The worst part is reading a PR and catching a reintroduced bug that was fixed a few commits ago. The first time, I almost lost my cool at work and said something negative to a coworker.
This would be my advice to juniors (and I mean, basically, devs who don't yet understand the underlying business/architecture): use the AI to explain how stuff works, maybe generate basic functions, but write the code logic/algorithms yourself until you are sure you understand what you're doing and why. Work and reflect on the data structures by yourself, even if they were generated by the AI, and ask for alternatives. Always ask for alternatives; it helps understanding. You might not see huge productivity gains from AI, but you will improve first, and then productivity will improve very fast: from your brain first, then from AI.
"In the METR study, developers predicted AI would make them 24% faster before starting. After finishing 19% slower, they still believed they'd been 20% faster."
I hadn't heard of this study before. It seems like it's been mentioned on HN before but hasn't gotten much traction.
I think for users this _feels_ incredibly powerful, however this also has its own pitfalls: Any topic which you're incompetent at is one which you're also unequipped to successfully review.
I think there are some other productivity pitfalls for LLMs:
- Employees use it to give their boss emails / summaries / etc. in the language and style their boss wants. This makes their boss happy, but it doesn't actually change productivity at all, since the exercise was a waste of time in the first place.
- Employees send more emails and summarize more emails. They look busier, but they're not actually writing the emails or really reading them. The email volume has increased, but the emails themselves were probably a waste of time in the first place.
- There is more work to review all around and much of it is of poor quality.
I think these issues play a smaller part than some of the general issues raised (e.g. poor-quality code, lack of code reviews, etc.), but they are still worth noting.
However, in my experience, the issue with AI is the potential hidden cost down the road. We either have to:
1. Code review the AI-generated code line by line, as it is generated, to ensure it's exactly what you'd have produced yourself, or
2. Pay an unknown amount of tech debt down the road when it inevitably wasn't what you'd have done yourself and isn't extensible, scalable, well-written code.
Garage Duo can out-compete corporate because there is less overhead. But Garage Duo can't possibly match corporate's sheer volume of output.
The job of anyone developing an application framework, whether that's off the shelf or in-house, is to reduce the amount of boilerplate any individual developer needs to write to an absolute bare minimum. The ultimate win isn't to get "AI to write all your boilerplate." It's to not need to write boilerplate at all.
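To make that concrete, here is a minimal sketch (plain Python, every name hypothetical and not tied to any particular framework) of the kind of helper a framework can provide so that each new model gets its standard handlers for free instead of hand-written boilerplate:

```python
# Hypothetical sketch: a tiny framework helper that derives standard CRUD
# handlers from an in-memory store, so no individual developer writes them.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class CrudRoutes:
    list: Callable
    get: Callable
    create: Callable

def crud_routes(store: Dict[int, dict]) -> CrudRoutes:
    """Build the standard handlers once, instead of repeating them per model."""
    def list_items():
        return list(store.values())

    def get_item(item_id: int):
        return store.get(item_id)

    def create_item(item: dict):
        item_id = max(store, default=0) + 1
        store[item_id] = {**item, "id": item_id}
        return store[item_id]

    return CrudRoutes(list=list_items, get=get_item, create=create_item)

# Usage: each new model gets its handlers for free.
users = crud_routes({})
users.create({"name": "Ada"})
print(users.list())  # [{'name': 'Ada', 'id': 1}]
```

The point of the sketch is the shape, not the details: once the framework owns the repetitive part, there is nothing left for either a developer or an AI to "write faster."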
AI won't give you much of a productivity boost if the problem you're facing is a human problem. That can happen to startups and enterprises alike.
- corporate
WHY CAN'T OUR DEVICES RUN TECHNOLOGIES??????
- also corporate
Force the LLM to follow a workflow: have it do TDD, use task lists, have it write implementation plans.
LLMs are great coders but subpar developers; help them be a good developer and you will see massive returns.
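As one illustration of the "have it do TDD" step (the slugify example below is hypothetical, not from the comment above): you ask the agent for the tests first, review them, confirm they fail, and only then let it write the code that makes them pass.

```python
# Minimal TDD sketch (pytest style). The tests are written and reviewed before
# any implementation exists; the implementation comes afterwards.
import re

def slugify(text: str) -> str:
    """Lowercase the text, drop punctuation, and join words with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

# These tests come first: run `pytest`, watch them fail while slugify is
# missing, then have the LLM implement slugify until they pass.
def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("Rock & Roll!") == "rock-roll"
```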
Expect to see more “replace rather than repair” projects springing up
However, TFA is absolutely correct that it takes a long time to master this technology.
A second, related point is that users have to adapt themselves to the technology to fully harness it. This is the hardest part. As an example, after writing OO code for my entire career, I use much more of a functional programming style these days, because that's what gets the best results from AI for me.
In fact, if you look at how the most effective users of AI agents do coding, it is nothing like what we are used to. It's more like a microcosm of all the activities that happen around coding -- planning, research, discussions, design, testing, review, etc -- rather than the coding itself. The closest analogy I can think of is the workstyle of senior / staff engineers working with junior team members.
Similarly, organizations will have to rethink their workflows and processes from the ground up to fully leverage AI. As a trivial example, tasks that used to take days and multiple meetings can now take minutes, but they require much more careful review. So we need support for the humans in the loop to do this efficiently and effectively, e.g. being able to quickly access all the inputs that went into the AI's work product and spot-check them or run custom validations. This kind of infrastructure is specific to each type of task; it doesn't exist yet and needs to be built.
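For instance (purely illustrative, not from the article): one of those task-specific validations for an AI-written summary might be a small script that flags any number the model states that can't be found in the inputs it was given, so the human reviewer knows exactly where to look.

```python
# Hypothetical spot-check: flag numbers in an AI summary that appear in none
# of its source inputs, so a reviewer can verify them by hand.
import re

def extract_numbers(text: str) -> set:
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def unsupported_numbers(summary: str, sources: list) -> set:
    """Numbers in the summary that appear in no input document."""
    source_numbers = set()
    for s in sources:
        source_numbers |= extract_numbers(s)
    return extract_numbers(summary) - source_numbers

sources = ["Q3 revenue was 412k, up from 371k in Q2."]
summary = "Revenue grew from 371k to 412k, a roughly 11% increase."
print(unsupported_numbers(summary, sources))  # {'11'} -- a derived figure to verify by hand
```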
Just foisting a chatbot on employees is not helpful at all, especially as a top-down mandate with no guidance, no training, no dedicated time to experiment, and no empowerment to shake things up. Without those, you will mostly get poor results and resentment against AI, which we are already seeing.
It's only 3 years since ChatGPT was released, so it is still very early days. Given how slow most organizations move, I'm actually surprised that any of them are reporting positive results this early.
One thing I am not sure about is the debt we are accumulating by allowing AI agents to write and maintain the code. In the short term, it is boosting our speed, but in the long run, we may suffer.
But the product works well, and our users are happy with the experience.
I have been a programmer for three long decades, so I have mixed feelings about this. But some days I see the writing on the wall.
Way too early to be jumping to any conclusions about this IMHO.
We have a lot of useless work being done, and AI is absolutely going to be a 10x speed up for this kind of work.
If you go the pure subjective route, I’ve found that people conflate “speed” or “productivity” with “ease.”
Interestingly, I've worked both ends of the spectrum simultaneously over the last year. I've spent most of my time on a (mostly) legacy system we're adding capabilities to, and I've spent some overtime working on an R&D project for my company. In the first, AI has been of limited use: mostly good for generating helper scripts and data generators, stuff where I don't care and just need a couple hundred lines of code. In the R&D project, on the other hand, we probably got a year's worth of work done in 2 months, but I can already see the problems. We are working in a space none of us are experts in and with a complex library we don't understand. AI got us to a demo of an MVP way quicker than we could have ourselves, but actually transitioning that into something useful is going to be a LOT of work.
Complex legacy refactoring + Systems with poor documentation or unusual patterns + Architectural decisions requiring deep context: These go hand in hand. LLMs are really good at pulling these older systems apart, documenting them, then refactoring them, tests and all. The problem is exacerbated by poor documentation of domain expectations. Get your experts in a room weekly and record their rambling ideas and the history of the system. Synthesize that with an LLM against the existing codebase (a rough sketch of that step follows below). You'll get to 80% system comprehension in a matter of months.
Novel problem-solving with high stakes: This is the true bottleneck, and where engineers can shine. Risk assessment and recombination of ideas, with rapid prototyping.
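On the legacy-comprehension point, here is a very rough sketch of the synthesis step, assuming the expert sessions have already been transcribed to disk. The paths, the build_context helper, and the final LLM call are placeholders for whatever transcription and agent tooling you actually use, not a specific product:

```python
# Hypothetical sketch: bundle weekly expert-session transcripts with code
# excerpts into one context blob for an LLM synthesis pass.
from pathlib import Path

def build_context(transcript_dir: str, code_dir: str, max_chars: int = 50_000) -> str:
    """Collect transcripts and source files into a single text bundle."""
    chunks = []
    for path in sorted(Path(transcript_dir).glob("*.txt")):
        chunks.append(f"## Expert session: {path.name}\n{path.read_text()}")
    for path in sorted(Path(code_dir).rglob("*.py")):
        chunks.append(f"## Source file: {path}\n{path.read_text()}")
    # Crude size cap; in practice you'd chunk and retrieve rather than truncate.
    return "\n\n".join(chunks)[:max_chars]

context = build_context("meetings/this_week", "legacy_app/")
# prompt = "Reconcile what the experts said with what the code actually does; list gaps and contradictions."
# system_doc = your_llm_call(prompt, context)  # hypothetical: whatever agent/API you use
```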
edit: a lot of articles like this have been popping up recently to say "LLMs aren't as good as we hyped them up to be, but they still increase developer productivity by 10-15%".
I think that is a big lie.
I do not think LLMs have been shown to increase developer productivity in any capacity.
Frankly, I think LLMs drastically degrade developer performance.
LLMs make people stupider.
In programming we've often embraced spending time to learn new tools. The AI tools are just another set of tools, and they're rapidly changing as well.
I've been experimenting seriously with the tools for ~3 years now, and I'm still learning a lot about their use. Just this past weekend I started using a whole new workflow, and it one-shotted building a PWA that implements a fully featured calorie-tracking app (social features, pre-populating foods from online databases, weight tracking and graphing, avatars); it's on par with many I've used in the past that cost $30+/year.
Someone just starting out at chat.openai.com isn't going to get close to this. You absolutely have to spend time learning the tooling for it to be at all effective.
I ran a three-month experiment with two of our projects, one Django and the other embedded C and ARM assembler. You start with "oh wow, that's cool!" and not too long after that you end up in hell. I used both ChatGPT and Cursor for this.
The only way to use LLMs effectively was to carefully select small chunks of code to work on, have the LLM write the code, and then manually integrate it into the codebase after carefully checking it and ensuring it didn't want to destroy 10 other files. In other words, use a very tight leash.
I'm about to run a six-month LLM experiment now. This time it will be Verilog FPGA code (starting with an existing project). We'll see how that goes.
My conclusion at this instant in time is that LLMs are useful if you are knowledgeable and capable in the domain they are being applied to. If you are not, the shit-show potential is high.