The worst part is reading a PR and catching a reintroduced bug that was fixed a few commits ago. The first time, I almost lost my cool at work and said something negative to a coworker.
This would be my advice to juniors (and I mean, basically, devs who don't yet understand the underlying business/architecture): use the AI to explain how stuff works, maybe generate basic functions, but write the code logic/algorithms yourself until you are sure you understand what you're doing and why. Work and reflect on the data structures by yourself, even if they were generated by the AI, and ask for alternatives. Always ask for alternatives; it helps understanding. You might not see huge productivity gains from AI, but you will improve first, and then productivity will improve very fast: from your brain first, then from AI.
"In the METR study, developers predicted AI would make them 24% faster before starting. After finishing 19% slower, they still believed they'd been 20% faster."
I hadn't heard of this study before. It seems like it's been mentioned on HN before but hasn't gotten much traction.
I think for users this _feels_ incredibly powerful, however this also has its own pitfalls: Any topic which you're incompetent at is one which you're also unequipped to successfully review.
I think there are some other productivity pitfalls for LLMs:
- Employees use it to give their boss emails / summaries / etc. in the language and style their boss wants. This makes their boss happy, but it doesn't actually change productivity at all, since the exercise was a waste of time in the first place.
- Employees send more emails and summarize more emails. They look busier, but they're not actually writing the emails or really reading them. The email volume has increased, but the emails themselves were probably a waste of time in the first place.
- There is more work to review all around and much of it is of poor quality.
I think these issues play a smaller part than some of the general issues raised (e.g. poor-quality code, lack of code reviews, etc.), but they are still worth noting.
However, in my experience, the issue with AI is the potential hidden cost down the road. We either have to:
1. Code review the AI-generated code line by line, as it is generated, to ensure it's exactly what you'd have produced yourself, or
2. Pay an unknown amount of tech debt down the road when it inevitably wasn't what you'd have done yourself and isn't extensible, scalable, well-written code.
Garage Duo can out-compete corporate because there is less overhead. But Garage Duo can't possibly match corporate's sheer volume of output.
The job of anyone developing an application framework, whether that's off the shelf or in-house, is to reduce the amount of boilerplate any individual developer needs to write to an absolute bare minimum. The ultimate win isn't to get "AI to write all your boilerplate." It's to not need to write boilerplate at all.
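To make that concrete, here is a minimal sketch (plain Python, every name hypothetical and not tied to any particular framework) of the kind of helper a framework can provide so that each new model gets its standard handlers for free instead of hand-written boilerplate:

```python
# Hypothetical sketch: a tiny framework helper that derives standard CRUD
# handlers from an in-memory store, so no individual developer writes them.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class CrudRoutes:
    list: Callable
    get: Callable
    create: Callable

def crud_routes(store: Dict[int, dict]) -> CrudRoutes:
    """Build the standard handlers once, instead of repeating them per model."""
    def list_items():
        return list(store.values())

    def get_item(item_id: int):
        return store.get(item_id)

    def create_item(item: dict):
        item_id = max(store, default=0) + 1
        store[item_id] = {**item, "id": item_id}
        return store[item_id]

    return CrudRoutes(list=list_items, get=get_item, create=create_item)

# Usage: each new model gets its handlers for free.
users = crud_routes({})
users.create({"name": "Ada"})
print(users.list())  # [{'name': 'Ada', 'id': 1}]
```

The point of the sketch is the shape, not the details: once the framework owns the repetitive part, there is nothing left for either a developer or an AI to "write faster."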
AI won't give you much of a productivity boost if the problem you're facing is a human problem. That can happen to startups and enterprises alike.
- corporate
WHY CAN'T OUR DEVICES RUN TECHNOLOGIES??????
- also corporate
Force the LLM to follow a workflow: have it do TDD, use task lists, have it write implementation plans.
LLMs are great coders but subpar developers; help them be a good developer and you will see massive returns.
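As one illustration of the "have it do TDD" step (the slugify example below is hypothetical, not from the comment above): you ask the agent for the tests first, review them, confirm they fail, and only then let it write the code that makes them pass.

```python
# Minimal TDD sketch (pytest style). The tests are written and reviewed before
# any implementation exists; the implementation comes afterwards.
import re

def slugify(text: str) -> str:
    """Lowercase the text, drop punctuation, and join words with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

# These tests come first: run `pytest`, watch them fail while slugify is
# missing, then have the LLM implement slugify until they pass.
def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("Rock & Roll!") == "rock-roll"
```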
Expect to see more “replace rather than repair” projects springing up
However, TFA is absolutely correct that it takes a long time to master this technology.
A second, related point is that users have to adapt themselves to the technology to fully harness it. This is the hardest part. As an example, after writing OO code for my entire career, I use much more of a functional programming style these days, because that's what gets the best results from AI for me.
In fact, if you look at how the most effective users of AI agents do coding, it is nothing like what we are used to. It's more like a microcosm of all the activities that happen around coding -- planning, research, discussions, design, testing, review, etc -- rather than the coding itself. The closest analogy I can think of is the workstyle of senior / staff engineers working with junior team members.
Similarly, organizations will have to rethink their workflows and processes from the ground up to fully leverage AI. As a trivial example, tasks that used to take days and multiple meetings can now take minutes, but they require much more careful review. So we need support for the humans in the loop to do this efficiently and effectively, e.g. being able to quickly access all the inputs that went into the AI's work product and spot-check them or run custom validations. This kind of infrastructure is specific to each type of task; it doesn't exist yet and needs to be built.
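For instance (purely illustrative, not from the article): one of those task-specific validations for an AI-written summary might be a small script that flags any number the model states that can't be found in the inputs it was given, so the human reviewer knows exactly where to look.

```python
# Hypothetical spot-check: flag numbers in an AI summary that appear in none
# of its source inputs, so a reviewer can verify them by hand.
import re

def extract_numbers(text: str) -> set:
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def unsupported_numbers(summary: str, sources: list) -> set:
    """Numbers in the summary that appear in no input document."""
    source_numbers = set()
    for s in sources:
        source_numbers |= extract_numbers(s)
    return extract_numbers(summary) - source_numbers

sources = ["Q3 revenue was 412k, up from 371k in Q2."]
summary = "Revenue grew from 371k to 412k, a roughly 11% increase."
print(unsupported_numbers(summary, sources))  # {'11'} -- a derived figure to verify by hand
```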
Just foisting a chatbot on employees is not helpful at all, especially as a top-down mandate with no guidance, no training, no dedicated time to experiment, and no empowerment to shake things up. Without those, you will mostly get poor results and resentment against AI, which we are already seeing.
It's only 3 years since ChatGPT was released, so it is still very early days. Given how slow most organizations move, I'm actually surprised that any of them are reporting positive results this early.
One thing I am not sure about is the debt we are accumulating by allowing AI agents to write and maintain the code. In the short term, it is boosting our speed, but in the long run, we may suffer.
But the product works well, and our users are happy with the experience.
I have been a programmer for three long decades, so I have mixed feelings about this. But some days I see the writing on the wall.
Way too early to be jumping to any conclusions about this IMHO.
We have a lot of useless work being done, and AI is absolutely going to be a 10x speed up for this kind of work.
If you go the pure subjective route, I’ve found that people conflate “speed” or “productivity” with “ease.”
Interestingly, I've worked both ends of the spectrum simultaneously over the last year. I've spent most of my time on a (mostly) legacy system we're adding capabilities to, and I've spent some overtime working on an R&D project for my company. In the first, AI has been of limited use: mostly good for generating helper scripts and data generators, stuff where I don't care and just need a couple hundred lines of code. In the R&D project, on the other hand, we probably got a year's worth of work done in 2 months, but I can already see the problems. We are working in a space none of us are experts in and with a complex library we don't understand. AI got us to a demo of an MVP way quicker than we could have ourselves, but actually transitioning that into something useful is going to be a LOT of work.
Complex legacy refactoring + Systems with poor documentation or unusual patterns + Architectural decisions requiring deep context: These go hand in hand. LLMs are really good at pulling these older systems apart, documenting them, then refactoring them, tests and all. The problem is exacerbated by poor documentation of domain expectations. Get your experts in a room weekly and record their rambling ideas and the history of the system. Synthesize that with an LLM against the existing codebase (a rough sketch of that step follows below). You'll get to 80% system comprehension in a matter of months.
Novel problem-solving with high stakes: This is the true bottleneck, and where engineers can shine. Risk assessment and recombination of ideas, with rapid prototyping.
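On the legacy-comprehension point, here is a very rough sketch of the synthesis step, assuming the expert sessions have already been transcribed to disk. The paths, the build_context helper, and the final LLM call are placeholders for whatever transcription and agent tooling you actually use, not a specific product:

```python
# Hypothetical sketch: bundle weekly expert-session transcripts with code
# excerpts into one context blob for an LLM synthesis pass.
from pathlib import Path

def build_context(transcript_dir: str, code_dir: str, max_chars: int = 50_000) -> str:
    """Collect transcripts and source files into a single text bundle."""
    chunks = []
    for path in sorted(Path(transcript_dir).glob("*.txt")):
        chunks.append(f"## Expert session: {path.name}\n{path.read_text()}")
    for path in sorted(Path(code_dir).rglob("*.py")):
        chunks.append(f"## Source file: {path}\n{path.read_text()}")
    # Crude size cap; in practice you'd chunk and retrieve rather than truncate.
    return "\n\n".join(chunks)[:max_chars]

context = build_context("meetings/this_week", "legacy_app/")
# prompt = "Reconcile what the experts said with what the code actually does; list gaps and contradictions."
# system_doc = your_llm_call(prompt, context)  # hypothetical: whatever agent/API you use
```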
edit: a lot of articles like this have been popping up recently to say "LLMs aren't as good as we hyped them up to be, but they still increase developer productivity by 10-15%".
I think that is a big lie.
I do not think LLMs have been shown to increase developer productivity in any capacity.
Frankly, I think LLMs drastically degrade developer performance.
LLMs make people stupider.
In programming we've often embraced spending time to learn new tools. The AI tools are just another set of tools, and they're rapidly changing as well.
I've been experimenting seriously with the tools for ~3 years now, and I'm still learning a lot about their use. Just this past weekend I started using a whole new workflow, and it one-shotted building a PWA that implements a fully featured calorie-tracking app (social features, pre-populating foods from online databases, weight tracking and graphing, avatars); it's on par with many I've used in the past that cost $30+/year.
Someone just starting out at chat.openai.com isn't going to get close to this. You absolutely have to spend time learning the tooling for it to be at all effective.
I ran a three-month experiment with two of our projects, one Django and the other embedded C and ARM assembler. You start with "oh wow, that's cool!" and not too long after that you end up in hell. I used both ChatGPT and Cursor for this.
The only way to use LLMs effectively was to carefully select small chunks of code to work on, have the LLM write the code, and then manually integrate it into the codebase after carefully checking it and ensuring it didn't want to destroy 10 other files. In other words, use a very tight leash.
I'm about to run a six-month LLM experiment now. This time it will be Verilog FPGA code (starting with an existing project). We'll see how that goes.
My conclusion at this instant in time is that LLMs are useful if you are knowledgeable and capable in the domain they are being applied to. If you are not, the shit-show potential is high.