Imagine someone in the 90s saying "if you don't master the web NOW you will be forever behind!" and yet 20 years later kids who weren't even born then are building web apps and frameworks.
Waiting for it to all shake out and "mastering" it then is still a strategy. The only thing you'll sacrifice is an AI funding lottery ticket.
He is brilliant no doubt, but not in that field.
I repeatedly rewrite prompts, restate the same constraints, and write detailed acceptance criteria, yet still end up with broken or non-functional code. It's very frustrating, to say the least. Yesterday alone I spent about $200 on generations that now require significant manual rewrites just to make them work.
At that point, the gains are questionable. My biggest success is having the model do the first design in my app and then taking it from there, but those hundreds if not thousands of lines of code it generates are so messy that it's insanely painful to refactor the mess afterwards.
AI code is the Canadian girlfriend of programming.
Edit: Corrected since/for. :-)
Looks like AI companies spend enough on marketing budgets to create the illusion that AI makes development better.
Let's wait one more year, and perhaps everyone who didn't fall victim to these "slimming pills" for developers' brains will be glad about the choice they made.
This agentic arms race by C-suite know-nothings feels less like leverage and more like denial. We took a stochastic text generator, noticed it lies confidently, wipes entire databases and hard drives, and responded by wrapping it in managers, sub-agents, memories, tools, permissions, workflows, and orchestration layers so we don't have to look directly at the fact that it still doesn't understand anything.
Now we’re expected to maintain a mental model not just of our system, but of a swarm of half-reliable interns talking to each other in a language that isn’t executable, reproducible, or stable.
Work now feels duller than dishwater, enough to have pushed me toward a career pivot for 2026.
I've yet to see examples of folks using this in a team of 4+ people working together in a production environment with users, just using AI for their regular development.
Claude Code's creator only using Claude Code doesn't count. That's more like dogfooding.
This sounds unbearable. It doesn't sound like software development, it sounds like spending a thousand hours tinkering with your vim config. It reminds me of the insane patchwork of sprawl you often get in DevOps - but now brought to your local machine.
I honestly don't see the upside, or how it's supposed to make any programmer worth their salt 10x better.
I've seen that these tools have different uses for different devs. I know on my current team, each of us devs works very differently from one another, and we make significant allowances to accommodate one another's different styles. Certain tasks always go to certain devs; one dev is like a steel trap, another is the chaos explorer, another's a beginner, another has great big-picture perspective, etc. (not sure why but there's even space for myself ;)
In the same way, different devs use these powerful tools in very different ways. So don't imagine you're falling behind, because the only useful benchmark is yourself. And don't imagine you can wait for consensus: you'll still need to identify your personal relationship to the tools.
Most of all, don't be discouraged. Even if you never embrace these tools, there will remain space for your skills and your style of approaching our shared work.
Give it another 10 years and I'm sure this will all become clearer...
This churn will continue until something moderately productive and easily adoptable comes out. FOMO will strike all of us from time to time. Some of us will even try out the latest and greatest and see if it sticks.
Some companies will mandate arbitrary code-generation standards because "it's the basis of their success", which will polarize their talent pool. Later, it will be impossible to determine whether they were (not) successful "in spite of" or "because of" such wild decisions.
No decades of research and massive allocation of resources over the last few years as well as very intentional decision making by tech leadership to develop this specific technology.
Nope, it just mysteriously dropped from the sky one day.
Sounds fever dreamish. Thank you sincerely (not) for creating it!
I haven't used agents much for coding, but I noticed that when I do have something created with the slightest complexity, it's never perfect and I have to go back and change it. This is mostly fine, but when large chunks of code are created, I don't have much context for editing things manually.
It's like waking up in a new house that you've never seen before. Sure I recognize the type of rooms, the furniture, the outlets, appliances, plumbing, and so on when I see them; but my sense of orientation is strained.
This is my main issue at the moment.
Using tools before their manual exists is the oldest human trick, not the newest.
Is there someone already mastering “agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering” ?
And do they have a blog?
I empathize with his sense that if we could just provide the right context and development harness to an AI model, we could be *that* much more productive, but it might just be misplaced hope. Claude Code and Cursor are probably not that far from the current frontier for LLM development environments.
Douglas Adams on age and relating to technology:
"1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works.
2. Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.
3. Anything invented after you’re thirty-five is against the natural order of things."
From 'The Salmon of Doubt' (2002)
All of the stuff he feels he is falling behind on? Almost completely irrelevant in our domain.
At first it kind of depressed me, but now I realise that actually writing code is only part of my day job; the rest is integrating infrastructure, managing people, and enabling them to do their jobs as well. If I can do the coding/integration part faster and give them better tools more quickly, that's a huge win.
This means I can spend more time at the beach and on my physical and mental well being as well. I was stubborn and skeptical a year ago, but now I'm just really enjoying the process of learning new things.
He is also great at explaining AI related concepts to the masses.
However, his takes on software engineering show someone who hasn't spent a significant amount of time doing production-grade software engineering, and that is perfectly fine and completely normal given his background.
But that also means that we should not take his software engineering opinions as gospel.
> Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI
Does anyone have a better way to do this other than spinning up a cloud VM to run goose or claude or whatever poorly isolated agent tool?
What are the productivity gains? Obviously, it must vary. The quality of the tool output varies based on numerous criteria, including what programming language is being used and what problem is being solved. The fact that person A gets a 10x productivity increase on their project does not mean that person B will also get a 10x productivity increase on their project, no matter how well they use the tool.
But again, tool usage itself is variable. Person A themselves might get a 10x boost one time, and 8x another time, and 4x another time, and 2x another time.
It's death, though, to be excessively reading tweets and blogs about this stuff; it will have you exhausted before you even try a real project, comparing yourself to other people's claims, which are sometimes lies, often delusional and ungrounded, and almost always self-serving. Insofar as someone is getting things done with any consistency, they are practicing basic PM: treating feelings of exhaustion, ungroundedness, and especially going in circles as a sign to regroup, slow down, and focus on the end you have in mind.
If the point really is to research tools, then what you do is break that work down into attainable chunks, the way you break down any other kind of work.
"Vibe programming" is less than a year old. What is programming going to look like in a few years?
The actually productive programmers, who wrote the stack that powers the economy before and after 2023, need not listen to these cheap commercials.
This confirms the AI bubble for me, and that it is now entirely FUD-driven. "Not fall behind" should only apply to technologies where you have to put in active effort to learn, because it takes years to hone and master the craft. AI is supposed to remove this "active effort" part, to get you up to speed with the latest and bridge the gap between those "who know" and those "who do not". The fact you need to say "roll up your sleeves to not fall behind" confirms we are not in that situation yet.
In other words, it is the same old learning curve that everyone has to cross, EXCEPT this time it is probabilistic instead of linear/exponential. It is quite literally a slightly-better-than-coin-toss situation as to whether you learn the right way or not.
For me personally, we are truly in that zone of zero active effort and total replacement only when AI can hit 100% on ALL METRICS consistently, every single time, even on fresh datasets with challenging questions NOT SEEN/TRAINED by the model. Even better if it can come up with novel discoveries to remove any doubts. The chances of achieving that with current tech are 0%.
Anyway:
> agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations,
gives me extreme Emacs 'setup' feelings. I was at a meetup in HK recently where someone was advocating this, and it was just depressing: spending hours on stuff that changes daily, while my vanilla Claude Code with the Playwright MCP runs circles around it, even after it has been set up. It is just not better at all, and won't be until someone can show that it is actually an improvement, with the caveat that when it is an improvement at t(1), it doesn't need a complete overhaul at t(n), where n is a few days or weeks, just because the hype machine says so. This should be measured against vanilla CC without any added tooling except maybe the Playwright MCP.
People just want to scam themselves into feeling useful: if the AI does the work, then you find some way of feeling busy by adding and fine-tuning stuff to feel useful.
In the backend, we're mostly just pushing data around from one place to another. Not much changes, there's only a few ways to really do that. Your data structures change, but ultimately the work is the same. You don't even really need an LLM at all, or super complex frameworks and ORMs, etc.
He knows the tools, he's efficient with them, and yet he's only now realizing how much he's unable to harness at this point, which makes him feel left behind.
Looking forward to seeing what comes out of him climbing that slope.
And a failure to clarify the project you're currently working on and the actual results feels decidedly like a propaganda issue.
Take all the digs at my skills you want. I'd rather not be a bald-faced liar.
Maybe I am too ignorant and don't see what I am missing. And I am still writing code and enjoying it.
Just the terminology of agents, vibe coding, prompt engineering, etc. is weirdly off-putting to me.
You're not doing it wrong, the tools just aren't all they're cracked up to be. They are annoyingly good enough to get you to waste a load of time trying to get them to do what it looks like they should be able to do.
Actually, even the post itself reads like a cognitive dissonance with a dash of the usual "if it's not working for you then you are using it wrong" defence.
Two years ago I was a human USB cable: copy, paste, pray. IDE <-> chat window, piece by piece. Now the loop is tighter. The distance is shorter.
There’s still hand-holding. Still judgment. Still cleanup. But the shift is real.
We’ve come a long way. And we’re not done.
1) These tools obviously improved significantly over the past 12 months. They can churn out code that makes sense in the context of the codebase, meaning there is more grounding to the codebase they are working on as opposed to codebases they have been trained on.
2) On the surface they are pretty good at solving known problems. You are not going to make them write a well-optimized renderer or an RL algorithm, but they can write run-of-the-mill business logic better _and_ faster than I can--if you optimize for both speed of production and quality.
3) Out of the box, their personality is to just solve the problem in front of them as quickly as possible and move on. This leads them to make suboptimal decisions (e.g. solving a deadlock by sleeping for 2 seconds, CC Opus 4.5 just last night). This personality can be altered with appropriate guidance. For example, a shortcut I use is to append "idiomatic" to my request--"come up with an idiomatic solution" or "is that the most idiomatic solution we can think of." Similarly, when writing tests or reviewing tests I use "intent of the function under test", which makes the model output better solutions or code.
4) These models, esp. Opus 4.5 and GPT 5.2, are remarkable bug hunters. I can point at a symptom and they come away with the bug. I then ask them to explain to me why the bug happens, and I follow the code to see if it's true. I have not come across a bad diagnosis yet. They can find deadlocks and starvations; you then have to guide them to a good fix (see #3).
5) Code quality is not sufficient to create product quality, but it is often necessary to sustain it. The sustainability window is shorter nowadays. Therefore, more than ever, the quality of the code matters. I can see Claude Code slowly degrading in quality every single day--and I use it every single day for many hours. As much as it pains me to say this, compared to Opencode, Amp, and Toad, I can feel the "slop" in Claude Code. I would love to study the codebases of these tools over time to measure their quality--I know it's possible for all but Claude Code.
6) I used to worry that I don't have a good mental model of the software I build. Much like journaling, I think there is something to be said for how the process of writing/making actually gives you a very precise mental model. However, I have been trying to let that go and use the model as a tool to query and develop the mental model post facto. It's not the same, but I think it is going to be the new norm. We need tooling in this space.
7) Whatever your own experiences with these tools, it is imperative that they be in your toolbox. If you have abstained from them thus far, perhaps the best way to incorporate them is to start using them to attend to your toil.
8) You can still handcraft code. There is too much fun, beauty, and pleasure in it to deny yourself doing it. Just don't expect this to be your job. This is your passion.
Slop-oriented programming
"AI" is literally models trained to make you think it's intelligent. That's it. It's like the ultimate "algorithm" or addiction machine. It's trained to make you think it's amazing and magical and therefore you think it's amazing and magical.
I understand we are all in different camps for a multitude of reasons:
- The jouissance of rote coding and abstraction
- The tree of knowledge specifically in programming, and which branches and nodes we each currently sit at in our understanding
- Technical paradigms that humans used to argue about have now shifted to obvious answers for agentic harnesses (think something like TDD; I, for one, barely used it as a style because I've mostly worked in startups building apps and found the cost of my labour not worth it, but agentic harness loops absolutely excel at it)
- The geography and size of the markets we work in
- The complexity of the subject matter / domain expertise
- The cost-prohibitive nature of token-based programming (not everyone can afford it, and the big fish seemingly have quite the advantage going forward)
- Agentic coding has proven it can build UIs very easily, and depending on experience, it can build a great many things easily. It excels when it has feedback loops such as linting or simple JavaScript errors, which are observability problems in my opinion. Once it can do full-stack observability (APM, system, network), its ability to reason and correct problems on the fly for any complex system seems, from my purview, like it will come easily.
- At the human-nature level, some individuals prefer to think in 0s and 1s, some in words, some in between, and so on. What type of communication do agentic setups prefer?
With some of the above intuition, which is easily up for debate, I've decided to lean 100% into agentic coding. I think it will be absolutely everywhere, obviously with humans in the loop, but I don't think humans will need to review the pull requests. I am personally treating it as an existential threat to my career after having seen enough of what it's capable of (with some imagination and a bit of a gambling spirit, as us mere mortals surely can't predict the future).
With my gambit, I'm not choosing to exit the tech scene; instead, I'm optimistically investing my mental prowess into figuring out where "humans in the loop" will be positioned. Currently I'm looking into CI-level tooling: the known parts being code quality and all the various forms of software-testing paradigms. Evals, in my mind, will keep evolving and will do a lot more than testing our ideas of model intelligence and chatbot responses.
---
A more practical rant: if you are building a recommendation engine for A and B, the engine could have some number of modules that each return a score which, when combined, make up the final decision between A and B. Forgive me, but let's just use dating as an example. A product manager would say we need a new module to calculate relevance between A and B based on their food preferences. An agentic harness can easily code that module and create the tests for it. The product manager could ask an LLM to make a list of 1000 reasons why two people might be suitable for dating. The agent could easily go away and code and test all those modules, and would probably maintain technical consistency but drift from the company's philosophical business model.

I am looking into building "semantic linting" for codebases: how can the agent maintain the code so it aligns with the company's business model? And if, for whatever reason, those 1000 modules need to be refactored, how can the agent keep them aligned? Essentially, I'm trying to make a feedback loop between the company's needs and the code itself, to stop the agent and the business from drifting in either direction, with automatic feedback loops for the agent to fix the drift. In short, I think there will be new tools invented that we humans will be mastering, as per Karpathy's point.
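To make the module-scoring shape concrete, here is a minimal sketch, with every name hypothetical and invented for illustration: each module scores a pair in [0, 1], and one weighted combiner makes the final decision, so the combination policy (the part closest to the "business model") lives in a single reviewable place an agent could be linted against.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical profile type; fields are illustrative only.
@dataclass
class Profile:
    food_prefs: set[str]
    interests: set[str]

# Each "module" scores a pair in [0, 1].
ScoreModule = Callable[[Profile, Profile], float]

def food_preference_score(a: Profile, b: Profile) -> float:
    """The PM's new module: Jaccard overlap of food preferences."""
    union = a.food_prefs | b.food_prefs
    return len(a.food_prefs & b.food_prefs) / len(union) if union else 0.0

def shared_interest_score(a: Profile, b: Profile) -> float:
    union = a.interests | b.interests
    return len(a.interests & b.interests) / len(union) if union else 0.0

# The weights encode the business model's priorities in one place,
# so a newly generated module can't silently change the final decision.
MODULES: list[tuple[ScoreModule, float]] = [
    (food_preference_score, 0.4),
    (shared_interest_score, 0.6),
]

def match_score(a: Profile, b: Profile) -> float:
    """Weighted average of all module scores; still in [0, 1]."""
    total_weight = sum(w for _, w in MODULES)
    return sum(w * m(a, b) for m, w in MODULES) / total_weight
```

An agent asked to add module number 1000 only touches the `MODULES` registry, and a "semantic lint" could check invariants such as every module returning values in [0, 1] and the weights reflecting the stated business priorities.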