There are literally thousands of retro emulators on GitHub. What I was trying to do had zero examples on GitHub. My takeaway is obvious by now: some stuff is easy, some not at all.
If, however, your code foundations are good, highly consistent, and never allow hacks, then the AI will maintain that clean style and it becomes shockingly good; in this case, the prompting barely even matters. The code foundation is everything.
But I understand why a lot of people are still having a poor experience. Most codebases are bad. They work (within very rigid constraints, in very specific environments), but they're unmaintainable and very difficult to extend, requiring hacks on top of hacks. Each new feature essentially requires a minor or major refactoring, with more and more scattered code changes as everything is interdependent (tight coupling, low cohesion). Productivity grinds to a crawl and you need 100 engineers to do what previously could have been done with just one. This is not a new effect. It's just much more obvious now with AI.
I've been saying this for years, but I think too few engineers have actually built complex projects on their own to understand this effect. There's a parallel with building architecture: you are constrained by the foundation of the building. If you designed the foundation for a regular single-storey house, you can't change your mind halfway through construction and build a 20-storey skyscraper. That said, if your foundation is good enough to support a 100-storey skyscraper, then you can build almost anything you want on top.
My perspective: if you want to empower people to vibe code, you need to give them really strong foundations to build on top of. There will still be limitations, but they'll be able to go much further.
My experience is: the more planning and intelligence that goes into the foundation, the less intelligence and planning is required for the actual construction.
Also re: "I spent longer arguing with the agent and recovering the file than I would have spent writing the test myself."
In my humble experience, arguing with an LLM is a waste of time, and no one should be spending time recovering files. Just make small changes one at a time, commit when you get something working, and discard your changes and try again when it doesn't.
I don't think AI is a panacea; it's a matter of knowing when it's the right tool for the job and when it isn't.
I keep seeing this sentiment repeated in discussions around LLM coding, and I'm baffled by it.
For the kind of function that takes me a morning to research and write, it takes me probably 10 or 15 minutes to read and review. It's obviously easier to verify that something is correct than to come up with the correct thing in the first place.
And obviously, if it took longer to read code than to write it, teams would be spending the majority of their time in code review, but they don't.
So where is this idea coming from?
But that aside, I don't agree with the premise. It doesn't make the hard parts harder if you ACTUALLY spend half the time you'd have ORIGINALLY spent on the hard problem carefully building context and using smart prompting strategies. If you try to vibe code a hard problem in one shot, you're either gonna have a bad time straight away or you're gonna have a bad time after you try to do subsequent prompting on the first codebase it spits out.
People are terrible observers of time. Something would have taken them a week to build, they try with AI for 2 hours, end up with a mess, and then claim either that it's not saving them any time or that it's making their code so bad it loses them time in the long run.
If instead they spent 8 hours slowly prompting bit by bit with loads of very specific requirements, technical specifications on exactly the code architecture it should follow (with examples), built very slowly feature by feature, made it write tests and carefully added their own tests, observed it from the ground up and built a SOLID foundation, and then spent day 2 slowly refining details and building features ONE BY ONE, they'd have the whole thing done in 2 days, and it'd be excellent quality.
But barely anyone does it this way. They vibe code it and complain that after 3 non-specific prompts the AI wasn't magically perfect.
After all these years of engineers complaining that their product manager or their boss is an idiot for giving vague instructions and then complaining when the result wasn't perfect despite not providing enough info, you'd think they'd be better at this given the chance. But no, in my experience coaching prompting, engineers are TERRIBLE at it. Even simple questions like "if I sent this prompt to you as an engineer, would you be able to do it based on the info here?" are things they don't ask themselves.
Next time you use ai, imagine being the ai. Imagine trying to deliver the work based on the info you’ve been given. Imagine a boss that stamped their foot if it wasn’t perfect first try. Then, stop writing bad prompts.
Hard problems are easier with AI, if you treat hard problems with the respect they deserve. Almost no one does.
/rant
Ha! Yesterday an agent deleted the plan file after I told it to "forget about it" (as in, leave it alone).
Current LLMs are best used to generate the string of text that is statistically most likely to form a coherent sentence. From the user's perspective, that makes them most useful as an alternative to a manual search engine for finding quick answers to simple questions, such as "how much soda is needed for baking X units of Y bread" or "how to print 'Hello World' 10 times in a loop in X programming language". Beyond this use case, the results can be unreliable, and that is to be expected.
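For what it's worth, that second query really is the level of task where the statistical template is guaranteed to exist; in Python it amounts to:

```python
# The kind of answer an LLM reliably gets right: this exact pattern
# appears verbatim in countless tutorials and repositories.
for _ in range(10):
    print("Hello World")
```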
Sure, it can also generate long code and even an entire fine-looking project, but it generates it by following a statistical template; that's it.
That's why "the easy part" is easy because the easy problem you try to solve is likely already been solved by someone else on GitHub, so the template is already there. But the hard, domain-specific problem, is less likely to have a publicly-available solution.
Yes. Another way to describe it is the valuable part.
AI tools are great at delineating high and low value work.
I asked ChatGPT to guide me through installing qBittorrent, Radarr (movies), Sonarr (TV series), and Jackett (credentials/login) without exposing my home IP, to get a solid home cinema setup using private trackers only.
Everything had to be automated via Ansible using the Proxmox "pct" CLI command, no copy and paste.
Everything had to run from a single Proxmox Debian container, aka an LXC.
Everything network-related had to go through WireGuard via Proton VPN; if the VPN goes down, the container has zero network access, everything must be killed.
Everything had to be automated: when a download finishes, the file structure is formatted for Jellyfin accordingly and Jellyfin picks up the new movies and TV shows.
It took me 3 nights to get everything up and running.
Many Ansible examples were either wrong or didn't follow what I asked to the letter, so I had to fix them. I am not a network expert and I hate iptables, haha; you need to know the basics of firewalls to understand what the ACLs are doing, so you can tell why it doesn't work. Then Proxmox folder mapping, and you name it.
It would have taken me ages reading doc after doc to get things working; the "Arr services" are a black hole.
For this example, it made the harder part easier. I wasn't just copy/pasting; it was providing the information I didn't know instead of me having to Google for it.
I know the core of what things are running on, and this is where we get Engineer A and Engineer Z.
Engineer A: I know what I am doing; I am using AI to make the boring part easier so I can have fun elsewhere.
Engineer Z: I have no idea what I am doing; I will just ask ChatGPT and we are done. That's 90-95% of engineers worldwide.
The hard part that becomes harder is not the technology. It’s the decision-making around it. When teams rush to integrate a model into core workflows without measuring outcomes or understanding user behavior, they end up with unpredictable results. For instance, we built an AI feature that looked great in demo, but in real usage it created confusion because users didn’t trust the auto-generated responses. The easy part (building it) was straightforward, but the hard part (framing it in a way people trusted and adopted) was surprisingly tough.
In real systems, success with AI comes not from the model itself, but from clear boundaries, human checkpoints, and real measurements of value over time.
Once the project crosses a couple of thousand lines of code, none of which you've written yourself, it becomes difficult to actually keep up with what's happening. Even reviewing can become challenging, since you get it all at once, and the LLM-esque coding style can at times be bloated and obnoxious.
I think in the end, with how things are right now, we're going to see the rise of disposable code and software. The models can churn out apps / software which will solve your specific problem, but that's about it. Probably a big risk to all the one-trick pony SaaS companies out there.
The article's easy/hard distinction is right, but the ceiling for "hard" is too low. The actually hard thing AI enables isn't better timezone-bug investigation, lol; it's working across disciplinary boundaries that no single human can straddle.
The article's point about AI code being "someone else's code" hits different when you realize neither of you built the context. I've been measuring what actually happens inside AI coding sessions; over 60% of what the model sees is file contents and command output, stuff you never look at. Nobody did the work of understanding by building / designing it. You're reviewing code that nobody understood while writing it, and the model is doing the same.
This is why the evaluation problem is so problematic. You skipped building context to save time, but now you need that context to know if the output is any good. The investigation you didn't do upfront is exactly what you need to review the AI's work.
Like, yeah, the AI won't know what you discussed in last week's meeting by default. But if you auto-transcribe your meetings (even in person, just open Zoom on one person's laptop), save the transcripts to a shared place, and have everyone make that accessible in their LLM's context, then it will know.
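A minimal sketch of what that could look like, assuming transcripts are dumped as plain-text files into a shared folder; the path and the `ask_llm` call are hypothetical stand-ins:

```python
from pathlib import Path

# Hypothetical shared folder where auto-generated meeting transcripts land.
TRANSCRIPTS = Path("/shared/meeting-transcripts")

def build_context(question: str, max_chars: int = 20_000) -> str:
    """Prepend recent meeting transcripts to the user's question."""
    parts = []
    used = 0
    # Newest transcripts first, trimmed to a rough character budget.
    for f in sorted(TRANSCRIPTS.glob("*.txt"), reverse=True):
        text = f.read_text(encoding="utf-8")
        if used + len(text) > max_chars:
            break
        parts.append(f"## {f.name}\n{text}")
        used += len(text)
    return "\n\n".join(parts) + f"\n\nQuestion: {question}"

# prompt = build_context("What did we decide about the rollout last week?")
# answer = ask_llm(prompt)  # ask_llm is a placeholder for whatever client you use
```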
This is very much a hot take, but I believe that Claude Code and its yolo peers are an expensive party trick that gives people who aren't deep into this stuff an artificially negative impression of tools that can absolutely be used in a responsible, hugely productive way.
Seriously, every time I hear anecdotes about CC doing the sorts of things the author describes, I wonder why the hell anyone is expecting more than quick prototypes from an LLM running in a loop with no intervention from an experienced human developer.
Vibe coding is riding your bike really fast with your hands off the handles. It's sort of fun and feels a bit rebellious. But nobody who is really good at cycling is talking about how they've fully transitioned to riding without touching the handles, because that would be completely stupid.
We should feel the same way about vibe coding.
Meanwhile, if you load up Cursor and break your application development into bite sized chunks, and then work through those chunks in a sane order using as many Plan -> Agent -> Debug conversations with Opus 4.5 (Thinking) as needed, you too will obtain the mythical productivity multipliers you keep accusing us of hallucinating.
Someone mentioned it is a force multiplier. I don't disagree with this; it is a force multiplier in the mundane and ordinary execution of tasks. Complex ones get harder and harder for it, because humans can visualize the final result where the AI can't. It is predicting from input, but it can't know the destination output if the destination isn't part of the input.
The first 3/4 of the article is "we must be responsible for every line of code in the application, so having the LLM write it is not helping".
The last 1/4 is "we had an urgent problem so we got the LLM to look at the code base and find the solution".
The situation we're moving to is that the LLM owns the code. We don't look at the code. We tell the LLM what is needed, and it writes the code. If there's a bug, we tell the LLM what the bug is, and the LLM fixes it. We're not responsible for every line of code in the application.
It's exactly the same as with a compiler. We don't look at the machine code that the compiler produces. We tell the compiler what we want, using a higher-level abstraction, and the compiler turns that into machine code. We trust compilers to do this error-free, because 50+ years of practice has proven to us that they do this error-free.
We're maybe ~1 year into coding agents. It's not surprising that we don't trust LLMs yet. But we will.
And it's going to be fascinating how this changes computer science. We have interpreted languages because compilers got so good. Presumably we'll get non-human-readable languages that only LLMs can use, and methods of defining systems to an LLM that are better than plain English.
If the easy stuff takes up 90% of the time, and the hard stuff 10%, then AI can be helpful. Personally, I can do "the easy stuff" with AI about 3-5x faster. So now I have a lot more free time for the hard stuff.
I don't let the AI near the hard stuff as it often gets confused and I don't save much time. I might still use it as a thought partner, but don't give it access to make changes.
Example: this morning I combined two codebases into one. I wrote both of them and had a good understanding of how everything worked. I had an opinion about some things I wanted to change while combining the two projects. I also had a strong opinion about how I wanted the two projects to interact with each other. I think it would have taken me about 2 workdays to get this done. Instead, with AI tooling, I got it done in 3 or so hours. I fired up another LLM to do the code review, and it found some stuff both I and the other LLM missed. This was valuable as a person developing things solo.
It freed up time for me to post on HN. :)
So I'm not sure this is a good rule of thumb. AI is better at doing some things than others, but the boundary is not that simple.
'AI makes everything easier, but it's a skill in itself, and learning that skill is just as hard as learning any other skill.'
For a more complete understanding, you also have to add: 'we're in the ENIAC era of AI. The equivalents of high-level languages and operating systems haven't been invented yet.'
I have no doubt the next few years will birth a "context engineering" academic field, and everything we're doing currently will seem hopelessly primitive.
My mind changed on this after attempting complex projects—with the right structure, the capabilities appear unbounded in practice.
But, of course, there is baked-in mean reversion. Doing the most popular and uncomplicated things is obviously easier. That's just the nature of these models.
That is to say, just like every headline-grabbing programming "innovation" of the last thirty years.
Meta-circularity is the real test.
After all, I can make new humans :)
Tried to move some Excel generation logic from the EPPlus library to ClosedXML.
ClosedXML has basically the same API, so the conversion was successful. Not a one-shot, but relatively easy with a few manual edits.
But ClosedXML has no batch operations (like applying a style to an entire column): the API is there, but the internal implementation works cell by cell. So if you have 10k rows and 50 columns, every style update is a slow operation.
Naturally, I told Codex 5.3 (max thinking level) all about this. The fucker still succumbed to range updates here and there.
Told it explicitly to make a style cache and reuse styles on cells on the same y axis.
5-6 attempts, and the fucker still tried ranges here and there. Because that is what is usually done.
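For the record, the pattern being asked for is roughly this; a minimal Python sketch of the idea rather than ClosedXML's actual C# API, with all names illustrative:

```python
# Illustrative only: build each column's style once, cache it, and apply it
# cell by cell, instead of going through range-wide style updates.
class Cell:
    def __init__(self, column):
        self.column = column
        self.style = None

def make_column_style(column):
    # Stand-in for an expensive style-construction step.
    return {"bold": True, "column": column}

def apply_column_styles(cells):
    style_cache = {}
    for cell in cells:
        if cell.column not in style_cache:
            style_cache[cell.column] = make_column_style(cell.column)
        cell.style = style_cache[cell.column]  # reuse, don't rebuild

# 10k rows x 50 columns: only 50 style objects ever get created.
grid = [Cell(col) for _ in range(10_000) for col in range(50)]
apply_column_styles(grid)
```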
Not here yet. Maybe in a year. Maybe never.
Needless to say, he was wrong and was gently corrected over the course of time. In his defense, his use cases for LLMs at the time were summarizing emails in his email client... so, eh, not exactly much to draw realistic experience from.
I hate to say it, but maybe Nvidia's CEO is actually right for once. We have a "new smart" coming to our world: the type of person who can move between the worlds of coding, management, projects, and CEOing with relative ease and translate between those worlds.
Sorry, but this is the whole point of software engineering in a company. The aim is to deliver value to customers at a consistent pace.
If a team cannot manage its own burnout or its expectations with its stakeholders, then that is a weak team.
It has nothing to do with using AI to make you go faster. AI does not cause this at all.
A lot of people are lying to themselves. Programming is in the middle of a structural shift, and anyone whose job is to write software is exposed to it. If your self-worth is tied to being good at this, the instinct to minimize what’s happening is understandable. It’s still denial.
The systems improve month to month. That’s observable. Most of the skepticism I see comes from shallow exposure, old models, or secondhand opinions. If your mental model is based on where things were a year ago, you’re arguing with a version that no longer exists.
This isn’t a hype wave. I’m a software engineer. I care about rigor, about taste, about the things engineers like to believe distinguish serious work. I don’t gain from this shift. If anything, it erodes the value of skills I spent years building. That doesn’t change the outcome.
The evidence isn’t online chatter. It’s sitting down and doing the work. Entire applications can be produced this way now. The role changes whether people are ready to admit it or not. Debating the reality of it at this point mostly signals distance from the practice itself.
Imagine if every function you see starts checking for null params. You ask yourself, "when can this be null?", right? It complicates your mental model of the data flow to the point that you lose track of what's actually real in your system. And once you lose track of that, it's impossible to reason about your system.
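A contrived Python example of the effect: once defensive None checks creep into every function, every caller has to wonder which of these branches can actually be taken.

```python
def total_price(order, discount=None):
    # Defensive checks like these make every reader ask: can `order`
    # really be None here? Can "items" be missing? Can an item be None?
    if order is None:
        return 0
    items = order.get("items")
    if items is None:
        return 0
    total = sum(i["price"] for i in items if i is not None)
    if discount is not None and discount > 0:
        total -= discount
    return total

print(total_price({"items": [{"price": 10}, None, {"price": 5}]}, discount=3))  # 12
```

Compare that with a version whose contract guarantees a valid order with a list of items: the data-flow question simply never comes up.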
For me AI has replaced searching on stack overflow, google and the 50+ github tabs in my browser. And it's able to answer questions about why some things don't work in the context of my code. Massive win! I am moving much faster because I no longer have to switch context between a browser and my code.
My personal belief is that the people who can harness the power of AI to synthesize loads of information while continuing to polish their engineering skills will be the ones who land on their feet after this storm is over. At the end of the day, AI is just another tool for us engineers to improve our productivity. If you think about what being an engineer looked like before AI even existed, more than 50% of our time was spent sifting through Google search results, Stack Overflow, GitHub issues, and other people's code. That's now gone and in your IDE, in natural language, with code snippets adapted to your specific needs.
The very first example, deleting 400+ lines from a test file: sure, I've seen those types of mistakes from time to time, but the vast majority of my experience is so different from that that I don't even know what to make of it.
I’m sure some people have that experience some of the time, but… that’s just not been my experience at all.
Source: Use AI across 7+ unrelated codebases daily for both personal and professional work.
No, it’s not a panacea, but we’re at the stage that when I find myself arguing with AI about whether a file existed; I’m usually wrong.
What I've found useful for complex tasks is to use it as a tool to explore the high-dimensional space that lies behind the task being solved. It can rarely be described as giving a prompt and coming back for a result. For me it's usually about having winding conversations, writing lists of invariants and partial designs, and feeding them back in a loop. Hallucinations and mistakes become a signal showing whether my understanding of the problem fits or not.
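One way to structure that kind of loop is to keep the invariants as an explicit artifact and feed them back with each revision; a rough sketch, where `ask_model` is a placeholder for whatever chat/agent client is actually in use:

```python
# Rough sketch of an invariant-driven refinement loop.
INVARIANTS = [
    "All timestamps are stored and compared in UTC.",
    "The public API never exposes internal IDs.",
]

def refine(design: str, ask_model, rounds: int = 3) -> str:
    for _ in range(rounds):
        prompt = (
            "Invariants that must hold:\n- " + "\n- ".join(INVARIANTS)
            + "\n\nCurrent partial design:\n" + design
            + "\n\nRevise the design and flag any invariant it might violate."
        )
        design = ask_model(prompt)
        # Hallucinations or violations in the reply are a signal that either
        # the design or my understanding of the invariants needs fixing.
    return design

# usage (hypothetical): final = refine("initial notes...", ask_model=my_client.ask)
```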
Gemini in Antigravity today is pretty interesting, to the point where it's worth experimenting with vague prompts just to see what it comes up with.
Coding agents are not going to just change coding. They make a lot of detailed product management work obsolete, and smaller team sizes will make it imperative to reread the agile manifesto and discard scrum dogma.