The loss of competency seems pretty obvious, but it's good to have data. What's also interesting to me is that the AI-assisted group finished the task a bit faster, but the difference wasn't statistically significant. That seems to align with other findings that AI can make you 'feel' like you're working faster even when that perception isn't matched by reality. So you're trading learning and eroding competency for a productivity boost that isn't always there.
Now, imagine a scenario for a typical SWE today, or in a not-so-distant future: the agents build your software, you're simply a gatekeeper/prompt engineer, all tests pass, you're doing a production deployment at 12am, something breaks, and your agents are down. At that point, what do you do if you haven't built or even deployed the system yourself? You're basically L1 support at this point, pretty useless and clueless when it comes to fully understanding and supporting the application.
The models are too good now. One thing I've noticed recently is that I've stopped dreaming about tough problems, be it code or math. The greatest feeling in the world is pounding your head against a problem for a couple of days and waking up the next morning with the solution sketched out in your mind.
I don't think the solution is to go full natty, but to work more alongside the code in an editor rather than doing everything in a CLI.
Ouch.
See also: https://news.ycombinator.com/item?id=46820924
> On average, participants in the AI group finished about two minutes faster, although the difference was not statistically significant. There was, however, a significant difference in test scores: the AI group averaged 50% on the quiz, compared to 67% in the hand-coding group
A common example here is learning a language. Say you learn French or Spanish throughout your school years or on Duolingo. Unless you're lucky enough to be amazing with language skills, if you don't actually use it, you will hit a wall eventually. And similarly, if you stop using a language you already know, it will slowly degrade over time.
Personally, I’ve never been learning software development concepts faster—but that’s because I’ve been offloading actual development to other people for years.
Previous title: "Anthropic: AI Coding shows no productivity gains; impairs skill development"
The previous title oversimplified the claim to "all" developers. I found the previous title meaningful while submitting this post because most of the false AI claims that "software engineering is finished" have mostly affected junior `inexperienced` engineers. But I think `junior inexperienced` was implicit, which many people didn't pick up on.
The paper makes a more nuanced claim that AI Coding speeds up work for inexperienced developers, leading to some productivity gains at the cost of actual skill development.
I don't necessarily think that writing more code means you become a better coder. I automate nearly all my tests with AI and a large chunk of bugfixing as well. I will regularly ask AI to propose an architecture or introduce a new pattern if I don't have a goal in mind. But in those last two examples, I will always redesign the entire approach to be what I consider a better, cleaner interface. I don't recall AI ever getting that right, but I must admit I asked AI in the first place because I didn't know where to start.
If I had to summarize, I would say to let AI handle the implementation, but not API design/architecture. But at the same time, you can only get good at those by knowing what doesn't work and trying to find a better solution.
> Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.
The library in question was Python trio and the model they used was GPT-4o.
1. AI help produced a solution only ~2 minutes faster, and
2. AI help reduced skill retention by 17 percentage points (50% vs. 67% on the quiz)
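For context, here's a minimal sketch (mine, not one of the study's tasks) of what code using trio, the async library participants had to learn, looks like:

```python
import trio

async def fetch(name: str, delay: float) -> None:
    await trio.sleep(delay)  # stand-in for a network call
    print(f"{name} done after {delay}s")

async def main() -> None:
    # A nursery runs child tasks concurrently and waits for all of them.
    async with trio.open_nursery() as nursery:
        nursery.start_soon(fetch, "a", 0.1)
        nursery.start_soon(fetch, "b", 0.2)

trio.run(main)
```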
Also my mastery of code starts with design and implementation that results in deep, intuitive understanding. Then I can do good code reviews and fix bugs fast fast fast.
Now engineers leap from AI-assisted or even AI-dominated implementation straight to code reviews. That means lots of reading code without that deep level of mastery. With this approach I have less confidence in the humans who are in the loop.
For example, I wanted to add a rate limiter to an API call with proper HTTP status codes, etc. I asked the AI (in IntelliJ it used to be Claude by default, but they've since switched to Gemini as the default) to generate one for me. The first version was not good, so I asked it to do it again with some changes.
What would take me a couple of hours or more took less than 10 minutes.
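Roughly the shape of thing I had in mind, as a sketch in Python (not the code the AI actually produced; the framework, limits, and window size here are made up for illustration):

```python
import time
from collections import defaultdict
from flask import Flask, jsonify, request

app = Flask(__name__)

WINDOW_SECONDS = 60   # fixed window length
MAX_REQUESTS = 100    # allowed requests per client per window
_hits = defaultdict(list)  # client address -> timestamps of recent requests

@app.get("/api/resource")
def resource():
    now = time.monotonic()
    hits = _hits[request.remote_addr]
    # Drop timestamps that have fallen out of the current window.
    hits[:] = [t for t in hits if now - t < WINDOW_SECONDS]
    if len(hits) >= MAX_REQUESTS:
        retry_after = int(WINDOW_SECONDS - (now - hits[0])) + 1
        # 429 Too Many Requests, with a Retry-After hint.
        return jsonify(error="rate limit exceeded"), 429, {"Retry-After": str(retry_after)}
    hits.append(now)
    return jsonify(data="ok"), 200
```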
I'm wondering if we could have the best of IDE/editor features like LSP and LLMs working together. With an LSP, syntax errors are a solved problem, and if the language is statically typed I often find myself just checking the type signatures of library methods, which is simpler for me than asking an LLM. But I would love to have LLMs fixing your syntax and, with types available or not, giving suggestions on how best to use the libraries given the current context.
Cursor tab does that to some extent, but it's not foolproof and it still feels too "statistical".
I'd love to have something deeply integrated with LSPs and IDE features. For example, VSCode alone has the ability to suggest imports; Cursor tries to complete them statistically, but it often suggests the wrong import path. I'd like to have the two working together.
Another example is renaming identifiers with F2: it is reliable and predictable, and I can't say the same when asking an agent to do that. On the other hand, if the rename isn't a predictable 1-to-1 mapping, e.g. a migration where the tool needs to infer a pattern, LLMs are just great. So I'd love to have an F2 feature augmented with LLM capabilities.
> Importantly, using AI assistance didn’t guarantee a lower score. How someone used AI influenced how much information they retained. The participants who showed stronger mastery used AI assistance not just to produce code but to build comprehension while doing so—whether by asking follow-up questions, requesting explanations, or posing conceptual questions while coding independently.
This might be cynically taken as cope, but it matches my own experience. A poor analogy until I find a better one: I don't do arithmetic in my head anymore; it's enough for me to know that 12038 x 912 is in the neighborhood of 10M, and if the calculator gives me an answer much different from that, then I know something went wrong. In the same way, I'm not writing many for loops by hand anymore, but I know how the code works at a high level and how I want to change it.
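The sanity check, spelled out (my rounding):

```python
estimate = 12_000 * 900   # round both factors: about 10.8M, "the neighborhood of 10M"
exact = 12_038 * 912      # 10,978,656
assert abs(exact - estimate) / exact < 0.05   # the estimate lands within ~2% of the truth
```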
(We're building Brokk to nudge users in this direction and not a magic "Claude take the wheel" button; link in bio.)
This similarly indicates that reliance on LLMs correlates with degraded performance in critical problem-solving, coding, and debugging skills. On the bright side, using LLMs as a supplementary learning aid (e.g. clarifying doubts) showed no negative impact on critical skills.
This is why I'm skeptical of people excited about "AI native" junior employees coming in and revamping the workplace. I haven't yet seen any evidence that AI can be effectively harnessed without some domain expertise, and I'm seeing mounting evidence that relying too much on it hinders building that expertise.
I think those who wish to become experts in a domain would willingly eschew using AI in their chosen discipline until they've "built the muscles."
Like the architecture work and writing good-quality specs, working on the code myself has a guiding effect on the coding agents. So in a way, it also helps clarify items that may be more ambiguous in the spec. If I write some of the code myself, the agent makes fewer assumptions about my intent when it touches it (especially for things I didn't specify in the architecture or that are difficult to articulate in natural language).
In small iterations, the agent checks back for each task. Because I spend a lot of time on architecture, I already have a model in my mind of how small code snippets and features will connect.
Maybe my comfort with reviewing AI code comes from spending a large chunk of my life reverse engineering human code, understanding it to the extent that complex bugs and vulnerabilities emerge. I've spent a lot of time with different styles of code, from awful to "this programmer must have a permanent line to god to do this so elegantly". The models are trained on all of that, so I have a little cluster of neurons in my head that's shaped closely enough to follow the model's shape.
> This suggests that as companies transition to more AI code writing with human supervision, humans may not possess the necessary skills to validate and debug AI-written code if their skill formation was inhibited by using AI in the first place.
I'm reminded of "Kernighan's lever":
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?
AI writes code in the cleverest way possible, which introduces cognitive load for anyone who hasn't encountered those patterns before. And although one might say that AI would also assist in the debugging, you run the risk of adding further complexity in the process of 'fixing' the bugs, and before you know it you have a big stinking ball of mud.
I found that Claude wasn't too great at it at first and returned a lot of hallucinated methods, or methods that existed in Pandas but not Polars. I chalk this up to context blurring and to there probably being a lot less Polars code in the training corpus.
I found it most useful for quickly pointing me to the right documentation, where I'd learn the right implementation and then use it. It was terrible for the code, but helpful as a glorified doc search.
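To give a made-up flavor of the kind of mix-up I mean (this is my own illustration, not something the model actually produced): Pandas idioms don't carry over to a Polars frame, which has its own API.

```python
import polars as pl

df = pl.DataFrame({"price": [10, 20, 30], "qty": [1, 2, 3]})

# Pandas-style column assignment, the kind of thing that tends to get suggested,
# does not work on a Polars DataFrame:
#   df["total"] = df["price"] * df["qty"]
# The Polars equivalent is an expression passed to with_columns:
df = df.with_columns((pl.col("price") * pl.col("qty")).alias("total"))
print(df)
```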
I use a web UI to chat with AI and do research, and even then I sometimes have to give up and accept that it won't provide the best solution, one that I know exists and am just too lazy to flesh out on my own. And so to the official docs I go.
But the coding tools? I'm sorry, but they constantly disappoint me. Especially the agents. In fact, the agents fucking scare me. Thank god Copilot prompts me before running a terminal command. The other day I asked it about a Cypress test function and the agent asked if it could run some completely unrelated gibberish Python code in my terminal. That's just one of many weird things it's done.
My colleagues vibe code things because they don't have experience in the tech we use on our project, and it gets passed to me to review with "I hope you understand this". Our manager doesn't care because he's all in on AI and just wants the project to meet deadlines because he's scared for his job, and it's the same at each level up the org chart from him. If this is what software development is now, then I need to find another career, because it's pathetic, boring, and stressful for anyone with integrity.
I think being intentional about learning while using AI to be productive is where the sweet spot is, at least for folks earlier in their career. I touch on that in my post here as well: https://www.shayon.dev/post/2026/19/software-engineering-whe...
This is my experience exactly. I have never been learning as much as with AI.
It's interesting that the numbers show most users degrade, but I hate the general assumption that nobody can use it properly to learn faster as well.
[1] https://martinfowler.com/articles/llm-learning-loop.html
(thankfully, market dynamics and OSS alternatives will probably stop this, but it's not a guarantee; you usually need at least six or so viable firms before you see competitive behavior)
The three high-score patterns are interesting as well. "Conceptual Inquiry" actually takes less time and doesn't improve the score compared to the other two, which is quite surprising to me.
[1] plug: this is a video about the Patreon community I founded to do exactly that. Just want to make sure you’re aware that’s the pitch before you go ahead and watch.
Submission about the arXiv pre-print: https://news.ycombinator.com/item?id=46821360
If you use it with the express intent to learn, it is an amazing tool.
If you use it as a crutch, it results in "learning avoidance".
This is one reason I've been resistant to using it. I don't want my work to go to the companies providing the models. I don't trust them. Not only with my data in the first place, but also that they'll keep providing the service over the long term without totally enshittifying the experience.
I'll be so much more excited by this when local models catch up to (or even exceed) frontier-level quality. How close are we to this?
(In my case, I don't even care if it costs a boatload in hardware capital to deploy.)
Theuth: "This invention, O king, will make the Egyptians wiser and will improve their memories; for it is an elixir of memory and wisdom that I have discovered."
Thamus replied: "Most ingenious Theuth, one man has the ability to beget arts, but the ability to judge of their usefulness or harmfulness to their users belongs to another; and now you, who are the father of letters, have been led by your affection to ascribe to them a power the opposite of that which they really possess. For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them.
You have discovered an elixir not of memory but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise."
Which is to say: "All this has happened before, and will happen again."
I can start to see the dangers of AI now, whereas before it was more imaginary sci-fi stuff I couldn't pin down. On the other hand, a dystopian sci-fi world full of smart everything seems more possible now that code can be whipped up so easily; the ability for your smart-monocle to find and hack things in everyday life is also way more likely if the world around you is saturated with quick and insecure code.
Sure, it sounds good to call for more regulation, or admit that there are downsides to your product, but when you know these things are falling largely on deaf ears and you continue operating business as usual, I wonder how much of it is just theater.
TL;DR: it's not AI that makes you dumb, it's the wrong "Output style"; just choose the learning style.
This study is so bad: the sample size is n = 52, and in some conclusions it goes down to n = 2.
One of the things I worry about is people not even learning what they can ask the computer to do properly because they don't understand the underlying system well enough.
One of my little pet peeves, especially since I do a lot of work in the networking space, is code that works with strings instead of streams. For example, it is not that difficult (with proper languages and libraries) to write an HTTP POST handler that accepts a multi-gigabyte file and uploads it to an S3 bucket, perhaps gzip'ing it along the way, such that a file of any size can be uploaded without reference to the RAM on the machine. You stream it, rather than loading the entire file into a string on upload and then uploading that string to S3, which requires massive amounts of RAM in the middle. There's still a lot of people and code out in the world that works the string way. AIs are learning from all that code. The mass of not-very-well-written code can overwhelm the good stuff.
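Concretely, a minimal sketch of the streaming version, assuming Flask and boto3 (the bucket and route names are placeholders, and I've left the optional gzip step out):

```python
import boto3
from flask import Flask, request

app = Flask(__name__)
s3 = boto3.client("s3")

@app.post("/upload/<key>")
def upload(key):
    # request.stream is a file-like view over the raw request body;
    # upload_fileobj reads it chunk by chunk (multipart under the hood),
    # so memory use stays roughly constant regardless of file size.
    s3.upload_fileobj(request.stream, "my-upload-bucket", key)
    return {"status": "uploaded"}, 201
```

The string-based version would instead call something like request.get_data(), hold the whole body in memory, and only then hand it to S3.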
And that's just one example. A whole bunch of stuff that proliferates across a code base like that and you get yet another layer of sloppiness that chews through hardware and negates yet another few generations of hardware advances.
Another thing is that, at the moment, code that is good for an AI is also good for a human. They may not quite be 100% the same, but right now they're still largely in sync. (And if we are wise, we will work to keep it that way, which is another conversation, and we probably won't because we aren't going to be this wise at scale, which is yet another conversation.) I do a lot of little things like use little types to maintain invariants in my code [1]. This is good for humans, and good for AIs. The advantages of strong typing still work for AIs as well. Yet none of the AIs I've used seem to use this technique, even with a code base in context that uses this technique extensively, nor are they very good at it, at least in my experience. They almost never spontaneously realize they need a new type, and whenever they go to refactor one of these things they utterly annihilate all the utility of the type in the process, completely blind to the concept of invariants. Not only do they tend to code in typeless goo, they'll even turn well-typed code back into goo if you let them. And the AIs are not so amazing that they overcome the problems even so.
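A tiny illustration of the idea (mine, in Python for brevity; not an example from the linked post): once a value is wrapped in a type that validates its invariant at construction, downstream code can rely on it without re-checking.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    value: int

    def __post_init__(self) -> None:
        # The invariant is established once, at construction time.
        if not 0 < self.value < 65536:
            raise ValueError(f"invalid port: {self.value}")

def connect(host: str, port: Port) -> None:
    # Accepting Port rather than a bare int means the check has already
    # happened; a refactor that flattens Port back to int silently loses it.
    print(f"connecting to {host}:{port.value}")

connect("example.com", Port(8080))
```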
(The way these vibe coded code bases tend to become typeless formless goo as you scale your vibe coding up is one of the reasons why vibe coding doesn't scale up as well as it initially seems to. It's good goo, it's neat goo, it is no sarcasm really amazing that it can spew this goo at several lines per second, but it's still goo and if you need something stronger than goo you have problems. There are times when this is perfect; I'm just about to go spray some goo myself for doing some benchmarking where I just need some data generated. But not everything can be solved that way.)
And who is going to learn to shepherd them through writing better code, if nobody understands these principles anymore?
I started this post with an "if" statement, which wraps the whole rest of the body. Maybe AIs will advance to the point where they're really good at this, maybe better than humans, and it'll be OK that humans lose understanding of this. However, we remain a ways away from this. And even if we get there, it may yet be more years away than we'd like; 10, 15 years of accreting this sort of goo in our code bases and when the AIs that actually can clean this up get here they may have quite a hard time with what their predecessors left behind.
[1]: https://jerf.org/iri/post/2025/fp_lessons_types_as_assertion...
> We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.