FRESH

Hacker News

Home

Your job is to deliver code you have proven to work

852 points by simonw

by endorphine

24 subcomments

> there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.
It's even worse than that: non-junior devs are doing it as well.

by robgibbons

10 subcomments

For what it's worth, writing good PRs applies in more cases than just AI generated contributions. In my PR descriptions, I usually start by describing how things currently work, then a summary of what needs to change, and why. Then I go on to describe what exactly is changing with the PR. This high level summary serves to educate the reviewer, and acts as a historical record in the git log for the benefit of those who come after you.
From there, I include explicit steps for how to test, including manual testing, and unit test/E2E test commands. If it's something visual, I try to include at least a screenshot, or sometimes even a brief screen capture demonstrating the feature.
Really go out of your way to make the reviewer's life easier. One benefit of doing all of this is that in most cases, the reviewer won't need to reach out to ask simple questions. This also helps to enable more asynchronous workflows, or distributed teams in different time zones.

by vladsh

3 subcomments

We should get back to the basic definition of the engineering job. An engineer understands requirements, translates them into logical flows that can be automated, communicates tradeoffs across the organization, and makes tradeoff calls on maintainability, extensibility, readability, and security. Most importantly, they’re accountable for the outcome, because many tradeoffs only reveal their cost once they hit reality
None of this is covered by code generation, nor by juniors submitting random PRs. Those are symptoms of juniors (not only) missing fundamentals. When we forget what the job actually is, we create misalignment with junior engineers and end up with weird ideas like "spec-driven development"
If anything, coding agents are a wake-up call that clarify what engineering profession is really about

by layer8

10 subcomments

I’d go further and say while testing is necessary, it is not sufficient. You have to understand the code and convince yourself that it is logically correct under all relevant circumstances, by reasoning over the code.
Testing only “proves” correctness for the specific state, environment, configuration, and inputs the code was tested with. In practice that only tests a tiny portion of possible circumstances, and omits all kinds of edge and non-edge cases.

by dfxm12

25 subcomments

there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.
Is anyone else seeing this in their orgs? I'm not...

by JoeAltmaier

2 subcomments

The job, in the modern world, is to close tickets. The code quality is negotiable, because the entire automated software process doesn't measure code quality, just statistics.
That's why I refuse to take part in it. But I'm an old-world craftsman by now, and I understand nobody wants to pay for working, well-thought-out code any more. They don't want a Chesterfield; they want plywood and glue.

by 0xbadcafebee

1 subcomments

Actually it's more specific than that. A company pays you not just to "write code", not just to "write code that works", but to write code that works in the real world. Not on your laptop. Not in CI tests. Not on some staging environment. But in the real world. It may work fine in a theoretical environment, but deflate like a popped balloon in production. This code has no value to the business; they don't pay you to ship popped balloons.
Therefore you must verify it works as intended in the real world. This means not shipping code and hoping for the best, but checking that it actually does the right thing in production. And on top of that, you have to verify that it hasn't caused a regression in something else in production.
You could try to do that with tests, but tests aren't always feasible. Therefore it's important to design fail-safes into your code that ALERT YOU to unexpected or erroneous conditions. It needs to do more than just log an error to some logging system you never check - you must actually be notified of it, and you should consider it a flaw in your work, like a defective pair of Nikes on an assembly line. Some kind of plumbing must exist to take these error logs (or metrics, traces, whatever) and send it to you. Otherwise you end up producing a defective product, but never know it, because there's nothing in place to tell you its flaws.
Every single day I run into somebody's broken webapp or mobile app. Not only do the authors have no idea (either because they aren't notified of the errors, or don't care about them), there is no way for me to even e-mail the devs to tell them. I try to go through customer support, a chat agent, anything, and even they don't have a way to send in bug reports. They've insulated themselves from the knowledge of their own failures.

by trevor-e

7 subcomments

> Your job is to deliver code you have proven to work.
Strong disagree here, your job is to deliver solutions that help the business solve a problem. In _most_ cases that means delivering code that you should be able to confidently prove satisfies the requirements like the OP mentioned, but I think this is an important nitpick distinction I didn't understand until later on in my career.

by andy99

1 subcomments

I think the problem is in what “proven” means. People that don’t know any better will just do that all with LLMs and still deliver the giant untested PRs but with some LLM written tests attached.
I vibe code a lot of stuff for myself, mostly for viewing data, when I don’t really need to care how it works. I’m coming around to the idea that outside of some specific circumstances where everyone has agreed they don’t need to care about or understand the code, team vibe coding is a bad practice.
If I’m paying an engineer, it’s for their work, unless explicitly agreed otherwise.
I think vibe coding is soon going to be seen the same way as “research” where you engage an offshore team (common e.g. in consulting) to give you a rundown on some topic and get back the first five google search results. Everyone knows how to do that, if it’s what they wanted they wouldn’t be hiring someone to do it.

by mapontosevenths

0 subcomment

I agree with this, except it glosses over security. Your job is to deliver SECURE code that you have proven to work.
Manual and automatic testing are still both required, but you must explicitly ensure that security considerations are included in those tests.
The LLM doesn't care. Caring is YOUR job.

by acituan

0 subcomment

First problem is turning engineers into accountability sinks. This was a problem before LLMs too, but now a much bigger and structural problem with democratization of the capacity to produce plausible looking dumb code. You will be forced to underwrite more and more of that, and expected to absorb the downsides.
The root cause is the second problem; short of formal verification you can never exhaustively prove that your code works. You can demonstrate and automate that demonstration for a sensible subset of inputs and states and hope for the state of the world approximately staying that way (spoiler: it won't). This is why 100% test coverage in most cases is something bad. This is why sensible is the key operative attitude, which LLM suck at right now.
The root cause of that one is the third problem; your job is to solve a business problem. If your code is not helping the business problem, it actually is not working in the literal sense of the work. It is an artifact that does a thing, but it is not doing work. And since you're downstream of all the self-contradicting, ever changing requirements in a biased framing of a chaotic world, you can never prove or demonstrate that your code solves a business problem and that is the end state.

by gaigalas

0 subcomment

> Make your coding agent prove it first
Agents love to cheat. That's an issue I don't see a horizon for change.
Here's Opus 4.5 trying to cheat its way out of properly implementing compatibility and cross-platform, despite the clear requirements:
https://gist.github.com/alganet/8531b935f53d842db98157e1b8c0...
> Should popen handles work with fgets/fread/fwrite? PHP supports this. Option A: Create a minimal pipe_io_stream device / Option B: Store FILE* in io_private with a flag / Option C: Only support pclose, require explicit stream wrapper for reads.
If I asked for compatibility, why give me options that won't fully achieve it?
It actually tried to "break check" my knowledge about the interpreter (test me if I knew enough to catch it), and proposed shortcuts all the way through the chat.
I don't want to have to pepper my chats with variations on "don't cheat". I mean, I can do it, but it seems like boilerplate.
I wish I had some similar testing-related chats to share. Agents do that all the time.
This is the major blocker right now for AI-assisted automated verification, and one of the reasons why this isn't well developed beyond general directions (give it screenshots, make it run the command, etc).

by doganugurlu

0 subcomment

I don't test my code because I think it's my duty. I test it because my personal motivation is to see it working! What's the point of writing code if I don't even get to see it run?!
If someone's not even interested and excited to see their code work, they are in the wrong profession.

by Swannie

0 subcomment

Posted down thread, but worth posting as a comment too.
I know Simon follows this "Issue First" style of work in his projects, with a strong requirement for passing tests to be included.
It's been a best practice for a long time. I really enjoyed this when I read it ~10 years ago, and it still stands the test of time:
https://rfc.zeromq.org/spec/42/#24-development-process
The rationale was articulated clearly in:
https://hintjens.gitbooks.io/social-architecture/content/cha...
If you have time, do yourself a favour and read the whole lot. And then liberally copy parts of C4 into your own process. I have advocated for many components of it, in many contexts, at $employer, and will continue to do so.

by agentultra

1 subcomments

There’s an anecdote from one of Djikstra’s essays that strikes at the heart of this phenomenon. I’ll paraphrase because I can’t remember the exact edw number off the top of my head.
A colleague was working on an important subsystem and would ask Djikstra for a review when he thought it was ready. Djikstra would have to stop what he was doing, analyze the code, and would find a grievous error or edge case. He would point it out to the colleague who would then get back to work. The colleague would submit his code for review again and this could carry on enough times that Djikstra got annoyed.
Djikstra proposed a solution. His colleague would have to submit with his code some form of proof or argument as to why it was correct and ready to merge. That way Djikstra could save time by only having to review the argument and not all of the code.
There’s a way of looking at LLM output as Djikstra’s colleague. It puts a lot of burden on the human using this tool to review all of the code. I like Doctorow’s mental model of a reverse centaur. The LLM cannot reason and so won’t provide you with a sound argument. It can probably tell you what it did and summarize the code changes it made… but it can’t decide to merge those changes. It needs a human, the bottom half of the centaur, to do the last bit of work here. Because that’s all we’re doing when we let these tools do most of the work for us: we’re here to take the blame.
And all it takes is an implementation of what we’re trying to build already, every open source library ever, all of SO, a GW of power from a methane power plant, an Olympic pool of water and all of your time reviewing the code it generates.
At the end of the day it’s on you to prove why your changes and contributions should be merged. That’s a lot of work! But there’s no shortcuts. Luckily you can reason while the LLMs struggle with that so use it while you can when choosing to use such tools.

by enraged_camel

2 subcomments

>> As software engineers we don’t just crank out code—in fact these days you could argue that’s what the LLMs are for. We need to deliver code that works—and we need to include proof that it works as well.
I would go a step further: we need to deliver code that belongs. This means following existing patterns and conventions in the codebase. Without explicit instruction, LLMs are really bad at this, and it's one of the things that make it incredibly obvious to reviews that a given piece of code has been generated by AI.

by dangus

0 subcomment

Your job isn’t to deliver code that works, it’s to successfully[1] operationalize business logic.
[1] I.e., it should work
That may seem pedantic but that’s a huge difference. Code is a means to an end. If no-code suddenly became better than code through some miracle, that would be your job.
This also means that if one day AI stops making mistakes, tossing AI requests over the wall may be a legitimate modus operandi.

by 0x500x79

0 subcomment

I think there are two other things missing: Security and Maintainability. Working code that can never be touched again by a developer or requires an excessive amount of time to maintain is not part of a developers job either.
Overall, this hits the nail on the head about not delivering broken code and providing automated tests. Thanks for putting your thoughts on paper.

by nzoschke

1 subcomments

Having the coding agent make screenshots is a big power up.
I’m experimenting with how to get these into a PR, and the “gh” CLI tool is helpful.
Does anyone have a recipe to get a coding agent to record video of webflows?

by toobulkeh

0 subcomment

Your job is also, arguably more important in certain scenarios, is to deliver maintainable code.
Remember, code does 2 things: 1. Tell the machine what to do 2. Tell the next developer what you were trying to do

by ChrisMarshallNY

1 subcomments

In my last job (engineering manager for a Japanese high-Quality hardware manufacturer), we were expected to deliver software that works.
In fact, if any bugs were found by the official "last step" QA Department, we (as a software development department) were dinged. If QA found bugs, they could stop the entire product release, so you did not want to be responsible for that.
This resulted in each software development department setting up their own, internal "QC team" of testers. If they found bugs, then individual programmers (or teams) would get dinged, but the main department would not.
Our software got a lot of testing.

by johnea

0 subcomment

I couldn't agree more with the sentiment.
If you, the development engineer, haven't demonstrated the product to work as expected, and preferably this testing is independently confirmed by a product test group, then you can't claim to be delivering a functional product.
I would add though, that management, specifically marketing management setting unreasonable demands and deadlines, are a bigger threat to testing than LLMs.
Of course the damage done by LLM generated code not being tested, is additive to the damage management is doing.
So this isn't any kind of apologism, the two sources are both making the problem worse.

by allcentury

1 subcomments

Manual testing as the first step… not very productive imo.
Outside in testing is great but I typically do automated outside in testing and only manual at the end. The loop process of testing needs to be repeatable and fast, manual is too slow

by vcarrico

0 subcomment

> the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.
I'm noticing something else very similar but involving not necessarily junior roles with long messages, when they use these AI writing assistants that resume stuff, creates follow-ups, etc. Putting this additional burden in whoever needs to read it. It makes me think of a quote that says: "I would have written a shorter letter, but I didn't have the time."

by onion2k

1 subcomments

I want to distill this post into some sort of liquid I can inject directly into my dev teams. It's absolutely spot on. Seeing a PR with a change that doesn't build is one of the most disappointing things.

by tete

0 subcomment

Who is "you"?
It's not my job, really. And given by the state of IT these days it's barely anyone's it seems.

by freedomben

0 subcomment

> Don’t be tempted to skip the manual test because you think the automated test has you covered already! Almost every time I’ve done this myself I’ve quickly regretted it.
Seriously, this cannot be emphasized enough. Before LLMS when we were writing tests completely manually, manual testing made sense to me as the second step. However after playing around a lot with coding agents and LLMs, I fully agree this has flipped. Test it manually first! When you generate the tests it is extremely wise to ensure that the tests fail without the new code, and pass with it. You definitely need to review the test though, because it's remarkably easy to have the agent put something in there that makes it not a good test.
Just a couple days ago for example, Claude made a test pass by skipping authentication and leaving a brief comment informing that the authentication made the test flaky. It even threw a quick variable in there that enabled running or disabling flaky tests, and flaky tests were disabled by default! Had I not been doing a good review, I definitely would have missed it because it was cleverly subtle. I've also seen it test the wrong endpoint!

by cyrialize

0 subcomment

When I start working on a ticket I actually start writing up a PR early on.
As I figure out my manual testing, I'll write out the steps that I took in my PR.
I've found that writing it out as I go does two things: 1) It makes it easier to have a detailed PR and 2) it acts as a form of rubber-ducking. As I'm updating my PR I'll realize steps I've missed in my testing.
Something that also helped out with my manual testing skill was working in a place that had ZERO automated testing. Every PR required a detailed testing plan that you did and that your reviewer could re-create.

by rmnclmnt

2 subcomments

> As software engineers we…
That’s the thing. People exposing such rude behavior usually are not, or haven’t been in a looong time…
As for the local testing part not being performed, this is a slippery slope I’m fighting everyday: more and more cloud based services and platforms are used to deploy software to run with specific shenanigans and running it locally requires some kind of deep craft and understanding. Vendor lock-in is coming back in style (e.g. Databricks)

by keeda

0 subcomment

The way I would phrase it is: software engineering is the craft of delivering the right code at the right time, where "right code" means it can be trusted to do the "right thing."
A bit clunky, but I think that can be scaled from individual lines of code to features or entire systems, whatever you are responsible for delivering, and encompasses all the processes that go around figuring what code is to be actually written and making sure it does what it's supposed to.
Trust and accountability are absolutely a critical aspect of software engineering and the code we deliver. Somehow that is missed in all the discussions around AI-based coding.
The whole phenomenon of AI "workslop" is not a problem with AI, it's a problem with lack of accountability. Ironically, blaming workslop on AI rather than organizational dysfunction is yet another instance of shirking accountability!

by visarga

3 subcomments

I agree with the author overall. Manual testing is what I call "vibe testing" and I think by itself is insufficient, no matter if you or the agent wrote the code. If you build your tests well, using the coding agent becomes smooth and efficient, and the agent is safe to do longer stretches of work. If you don't do testing, the whole thing is just a bomb ticking in your face.
My approach to coding agents is to prepare a spec at the start, as complete as possible, and develop a beefy battery of tests as we make progress. Yesterday there was a story "I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours". They had 9000+ tests. That was the secret juice.
So the future of AI coding as I see it ... it will be better than pre-2020, we will learn to spec and plan good tests, and the tests are actually our contract the code does what is supposed to do. You can throw away the code and keep the specs and tests and regenerate any time.

by holtkam2

0 subcomment

I’d go further: it’s not enough to be able to prove that your code works. It’s required that you also understand why it works.
Otherwise you’ll end up in situations where it passes all test cases yet fails for something unexpected in the real world, and you don’t know why, because you don’t even know what’s going on under the hood.

by ozim

0 subcomment

There is whole range of “proven to work” - regarding testing you cannot prove that there are no bugs.
Your job is to the deliver code up to specifications.
Not even checking the happy flow at least is of course gross negligence. But so is spending too much time on edge cases that no one will run into or person asking doesn’t want to pay for covering.

by llm_nerd

0 subcomment

"the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest."
Kind of depressing how it has become such a trope of blaming juniors for every ill or bad habit. In all likelihood the reader of this comment has a number of terrible habits, working on teams with terrible habits, and juniors play zero part in it.
And, I mean, on that theme developers have been doing this for as long as we've had large teams. I've worked at a large number of teams where there was the fundamental principal that QA / UA holds responsibility. That they are responsible for tests, and they are responsible for bad code making it through to the product / solution. Developers -- grizzled, excellent-CV devs -- would toss over garbage code and call it a day.

by casey2

0 subcomment

It comes out of the AI, that is proof enough. Why would I have prompted it and gave it to you if I didn't think that the AI could handle it? The real risk is closer to "people carry some preconceived notion about code that doesn't map to AI code." such as, for example, the person who contributed the code knows about the problem in enough detail to be accountable in the short term. Or at the very least be able to tell you why they made a PR at all
How to prove it has been subject to some debate for the past century, the answer is that it's context dependent to what degree you will or even can prove the program and exposed identifiers correct. Programming is a communication problem as well as a math problem, often an engineering problem too. Only the math portion can be proved, the a small by critical amount engineering portion tested.
Communication is the most important for velocity it's the difference between hand rolling machine code and sshing into a computer halfway across the world having every tool you expect. If you don't trust that webdevs know what they are doing then you can be the most amazing dev in the world you but your actual ability to contribute will be hampered. The same is true of vibe coding, if people aren't on the same page as to what is and isn't acceptable velocity starts to slow down.
Languages have not caught up to AI tools, since AI operates well above the function level, what level would be appropriate to be named and signed off on? pull request and link to the chat as a commit? (what is wrong with that that could be fixed at the name level)
Honest communication is the most important. Amazon telling investors that they use TLA+ is just signaling that they "for realz take uptime very seriously guize", "we know distributed systems" and engineering culture. The honest reality is that they could prove all their code and not IMprove their uptime one lick, because most of what they run isn't their code. It's a communication breakdown if effort gets spent on that outside a research department.

by WhyOhWhyQ

2 subcomments

Isn't this in contradiction to your blog post from yesterday though? It's impossible to prove a complex project made in 4.5 hours works. It might have passed 9000 tests, but surely there are always going to be edge cases. I personally wouldn't be comfortable claiming I've proved it works and saying the job is done even, if the LLM did the whole thing and all existing tests passed, until I played with it for several months. And even then I would assume I would need to rely on bug reports coming in because it's running on lots of different systems. I honestly don't know if software is ever really finished.
My takeaway from your blog post yesterday was that with a robust enough testing system the LLM can do the entire thing while I do Christmas with the family.
(Before all the AI fans come in here. I'm not criticizing AI.)

by zhyder

0 subcomment

"Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That’s no longer valuable. What’s valuable is contributing code that is proven to work."
I'd go further: what's valuable is code review. So review the AI agent's code yourself first, ensuring not only that it's proven to work, but also that it's good quality (across various dimensions but most importantly in maintainability in future). If you're already overwhelmed by that thousand-line patch, try to create a hundred-line patch that accomplishes the same task.
I expect code review tools to also rapidly change, as lines of code written per person dramatically increase. Any good new tools already?

by a24venka

0 subcomment

There is a heavy emphasis on testing the code as the way to provide guarantees that it works. While this is a helpful tool, I often find that the best engineers are ones who take a more first principles approach to the code and can reason about why the solution is comprehensive (covers all edge cases) and clean (easy for humans and LLMs to build on).
It often takes discipline to think and completely map out solutions before you build. This is where experience and knowing common patterns can also help.
When you have the experience of having manually written or read a lot of code it helps at the very least quickly understand what the LLMs are writing and reason about it later even if not at the beginning.

by imiric

0 subcomment

The job of a software developer is not just to prove that the software "works". The definition of "works" itself is often fuzzily defined and difficult to prove.
That is part of it, yes, but there are many others, such as ensuring that the new code is easy to understand and maintain by humans, makes the right tradeoffs, is reasonably efficient and secure, doesn't introduce a lot of technical debt, and so on.
These are things that LLMs often don't get right, and junior engineers need guidance with and mentoring from more experienced engineers to properly learn. Otherwise software that "works" today, will be much more difficult to make "work" tomorrow.

0 subcomment

by am17an

0 subcomment

Well a 1000 line PR is still not welcome. It puts too much of a burden on the maintainers. Small PRs are the way to go, tests are great too. If you have to submit a big PR, get buy in from a maintainer first that they will review your code.

by weatherlite

1 subcomments

> Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That’s no longer valuable. What’s valuable is contributing code that is proven to work.
That's really not a great development for us. If our main point is now reduced to accountability over the result with barely any involvement in the implementation - that's very little moat and doesn't command a high salary. Either we provide real value or we don't ...and from that essay I think it's not totally clear what the value is - it seems like every QA, junior SWE or even product manager can now do the job of prompting and checking the output.

by sowbug

0 subcomment

Since they’re robots, automated tests and manual tests are effectively the same thing.
I'd buttress this statement with a nuance. Automated tests typically run in their entirety, usually by a well-known command like cargo test or at least by the CI tools. Manual tests are often skipped because the test seems to be far away from the code being changed.
My all-time favorite team had a rule that your code didn't exist if it didn't have automated tests to "defend" it. If it didn't, it was OK, or at least not surprising, for someone else to break or refactor it out of existence (not maliciously, of course).

by geldedus

0 subcomment

Not only to work, but to not make the life of those coders who come after you a hell.

by rldjbpin

0 subcomment

i don't think this is as much as an AI issue as it is about path of least resistance in a velocity-driven environment.
call me the worst junior dev in the industry, but pre-coding agents, closing tickets was more important than upholding absolute quality. not everybody is dealing with a billion concurrent users with multi-geo deployments. most of the time, a few screenshots or test output for manual validation is enough to go ahead. when pressed with time and without the prerequisites in the infra side, doing the absolute best development and testing is a luxury only for daydreamers.
automated testing can be a double-edged sword. pre-LLM, even test coverage was a number that somehow needed to go up after each PR. this only resulted in shady tactics of pointless test cases that slowly bring up the metric. today it can be very dangerous if both code and its test suite are vibe coded. especially when it can give the appearance of that 90%+ code coverage.
on the other hand, some manual testing to make sure the core functionality works is the bare minimum one does before pushing out code. at least i would like to believe it is.

by golly_ned

0 subcomment

I don't think this quite captures the problem: even if the code is functional and proven to work, it can still be bad in many other ways.
The submitter should understand how it works and be able to 'own' and review modifications to it. That's cognitive work submitters ipso facto don't do by offloading the understanding to an LLM. That's the actual hard work reviewers and future programmers have to do instead.

by dekhn

1 subcomments

Prove is a strong word. There are few cases in real-world programming where you can prove anything.
I prefer to make this probabilistic: use testing to reduce the probability that your code isn't correct, for the situations in which it is expected to be deployed. In this sense, coding and testing is much like doing experimental physics: we never really prove a theory or disprove it, we just invalidate clearly wrong ones.

by kords

0 subcomment

I agree that tests and automation are probably the best things we can do to validate our code and author of the PR should be more responsible. However they can't prove that the code works. It's almost the opposite: If they pass and it's a good coverage, then code has better chances to work. If they fail, then they prove code doesn't work.

by SunshineTheCat

5 subcomments

I know this won't be popular, however, I think the idea of differentiating a "real developer" from one who relies mostly, or even solely on an LLM is coming to an end. Right now, I fully agree relying wholly upon an LLM and failing to test it is very irresponsible.
LLMs do make mistakes. They do a sloppy job at times.
But give it a year. Two years. five years. It seems unreasonable to assume they will hit a plateau that will prevent them from being able to build, test, and ship code better than any human on earth.
I say this because it's already happened.
It was thought impossible for a computer to reach the point of being able to beat a grandmaster at chess.
There was too much "art," experience, and nuance to the game that a computer could ever fully grasp or understand. Sure there was the "math" of it all, but it lacked the human intuition that many thought were essential to winning and could only be achieved through a lifetime of practice.
Many years following Deep Blue vs. Garry Kasparov, the best players in the world laugh at the idea of even getting close to beating Stockfish or any other even mediocre game engine.
I say all of this as a 15-year developer. This happens over and over again throughout history. Something comes along to disrupt an industry or profession and people scream about how dangerous or bad it is, but it never matters in the end. Technology is undefeated.

by softwaredoug

0 subcomment

A lot of AI coding changes coding to more of a declarative practice.
Claude, etc, works best with good tests that verify the system works. And so the code becomes in some ways the tests rather than the code that does the thing. If you're responsible for the thing, then 90% of your responsibility moves to verifying behavior and giving agents feedback.

by andai

2 subcomments

> Don’t be tempted to skip the manual test because you think the automated test has you covered already! Almost every time I’ve done this myself I’ve quickly regretted it.
How does this work? When expectations about the program's state vs its observable behavior diverge?

by nottorp

1 subcomments

Hmm. I've never been asked to do formal proofs for my code. Where does he work?

by funkattack

0 subcomment

Non-native speaker here. I’ve always loved that we say “commit” not “upload” or “save”.

by maerF0x0

3 subcomments

> The first is manual testing. If you haven’t seen the code do the right thing yourself, that code doesn’t work. If it does turn out to work, that’s honestly just pure chance.
Depending on exactly what the author meant here, I disagree. Our first and default tool should be some form of lightweight automated testing. It's explicit (serves a form of spec and docs how to use the software), it's repeatable (manual testing is done once and it's result is invalidated moments later), and it's cost per minute of effort is more or less the same (most companies have the engineers do the tests, they are expensive).
Yes. There will be exceptions and exceptional cases. This author is not talking about exceptions and neither am I. They're not an interesting addition to this conversation.

by naasking

0 subcomment

Almost nobody proves their code works. At best, they simply have high confidence it works. This confidence is also sometimes (often?) misplaced.

by psv2522

0 subcomment

I thought manual testing was mandatory mininum requirement after every AI change unless its very small typo or something?
How is this a issue, its genuinely common sense.

0 subcomment

by jstrebel

0 subcomment

In all fairness: human senior devs see AI-written source code with some disdain, as it usually does not match their stylistic and idiomatic preferences (although being correct and fully working). I don't think that untested code is the problem here - you can easily measure test coverage and of course. every CI/CD pipeline should run the existing unit and integration tests.

by newsoftheday

0 subcomment

"Your job is to deliver code you have proven to work"
And...code that has been 100% reviewed, even if it was fully LLM generated.

by DustinBrett

0 subcomment

Proving it works in edge cases is usually the hard part.

by yuedongze

0 subcomment

it's very similar to the verification engineering problem i wrote about on HN last week. AI is as good as we can prove their work is genuine. and we need humans in the loop to fill in the gaps between autonomous systems and ultimately be held accountable by human laws. it's kind of sad but the reality we are facing

by koinedad

0 subcomment

This is very helpful for a team and even though it takes a little time it actually speeds things up in the long run. Using PR templates can help. A general description of the problem including a screenshot or video go a long way.
I remember when I was working at a startup and a new engineer merged his code and it totally broke the service. I asked him if he ran his code locally first and he stared at me speechless.
Running the code locally is the easiest way to eliminate a whole series of silly bugs.
Like mentioned in the article adding a test and then reverting your change to make sure the test fails is really important, especially with LLMs writing tests. They are great at making things look like they work but completely don’t.

by ianberdin

0 subcomment

The solution is easy: responsibility.
The point is to hire people who can own code and codebase. “Someone will review” is dead end.

by rglover

1 subcomments

"Slow the f*ck down." - Oliver Reichenstein [1]
This only happens because the software industry has fallen into the Religion of Speed. I see it constantly: justified corner-cutting, rushing shit out the door, and always loading up another feature/project/whatever with absolutely zero self-awareness. AI is just an amplifier for bad behavior that was already causing chaos.
What's not being said here but should be: discipline matters. It's part of being a professional and always precedes someone who can ship code that "just works."
[1] https://ia.net/*

0 subcomment

by gorjusborg

0 subcomment

Your actual job is to produce positive outcomes for your stakeholders. Code can be part of that, but doesn't have to be.
If you are dumping AI slop on your team to sort through, you are creating drag on the entire team's efforts toward those positive outcomes.
As someone getting dumped upon, you probably should make the decision (in line with the objective to producing positive outcomes) to not waste your time weeding through that stuff.
Review everything else, make it clear that the mess is not reviewable, and communicate that upward if needed.

by acrophiliac

1 subcomments

Perhaps off-topic, but: "Testing doesn't show the absence of errors, it shows the presence of errors" Willison says we need to submit code we have proven to work but then argues for empirical testing, not actual correctness proofs.

by givemeethekeys

0 subcomment

Sorry that’s not what it says in my job description.

by webprofusion

0 subcomment

No my job is sitting eating this here donuts.

by mellosouls

2 subcomments

Thing is, this has always been the case. One of the problems with LLM-assisted coding is the idea that just because we're in a new era (we certainly are), the old rules can all be discarded.
The title doesn't go far enough - slop (AI or otherwise) can work and pass all the tests, and still be slop.

by lifeisstillgood

0 subcomment

I’m going to go with this as probably in the top three definitions of software developer …
along with
- the job was better titled as “Analyst Programmer” - you need both.
And
- you can make a changeset, but you have to also sell the change

by unsungNovelty

0 subcomment

Billions were spent in the last 5+ years saying AI can do coding. Increase speed. Reduce headcount. Remove processes. It's irritating to see that people are NOW changing the narrative of... You need to review the code... You need test it...
Devs already know this. Tell this to Managers, CEOs and non-engineers who believed billions worth of marketing BS. Cos devs don't have voice most of the time. They set the timelines. The want to push this end results to their team/company. So that is the constraints devs are working with. So to them, NOT to us Simon. WE KNOW! :)

by t1234s

0 subcomment

Bravo.. best headline I've read in a long time. This phrase should be a desktop background.

0 subcomment

by 6510

0 subcomment

I work alone, I have considerable amount of unfinished code laying around. Sometimes even multiple instances of a thing. I could see how it would be annoying in a team settings. The cause is not having the thing but how you organize it. Like with LLM slop it is wonderful to be able to scroll over something that shows what the solution might look like.

by nrhrjrjrjtntbt

0 subcomment

Always has been

by nish__

0 subcomment

Good framing.

by morning-coffee

0 subcomment

Amen

by emsign

0 subcomment

As if! :)

by annjose

0 subcomment

I came here to say
1) Amen 2) I wonder if this is isolated to junior dev only? Perhaps it seems like that because junior devs do more AI assisted coding than seniors?

by ekjhgkejhgk

0 subcomment

Oh look another "an opinionated X". Everything is opinionated these days, even opinions.

0 subcomment

by fjfaase

1 subcomments

One more reason to work without branches and PRs. The future for CI/CD is bright ;-).

by venturecruelty

0 subcomment

Lmao no, my job is to make the line go up and make my boss happy. It was ever thus.

by throwaway2027

0 subcomment

It works on my machine ¯\_(ツ)_/¯

by nolineshere

0 subcomment

[dead]

by sapphirebreeze

0 subcomment

[dead]

by TheSamFischer

0 subcomment

[dead]

by ekjhgkejhgk

0 subcomment

[flagged]

by koakuma-chan

0 subcomment

[flagged]

by alexgotoi

2 subcomments

[flagged]

by daedrdev

0 subcomment

Maybe in an ideal world

by webdev1234568

4 subcomments

Whole article seems very much all llm generated
Edit: I'm an idiot ignore me.

by 9rx

1 subcomments

> Your job is to deliver code you have proven to work.
Your job is to solve customer problems. Their problems may only be solvable with code that is proven to work, but it is equally likely (I dare say even more likely) that their problem isn't best solved with code at all, or even solved with code that doesn't work properly but works well enough.

by zkmon

1 subcomments

How about letting LLMs maintain a vast number of product versions all available at the same, which receive multiple versions of untested versions of the same patch, from LLMs, and then let the models elect a version of the software based on probabilistic or gradient methods? This elected version could change for different assessments. No human touches or looks at the code!
Just a wild thought, nothing serious.

by Rperry2174

10 subcomments

Im not fully convinced by "a computer can never be held accountable"
We already delegate accountability to non-humans all the time: - CI systems block merges - monitoring systems page people - test suites gate different things
In practice accountability is enforced by systems, not humans.. humans are defintiely "blamed" after the fact, but the day-to-day control loop is automated.
As agents get better at running code, inspecting ui state, correlating logs, screenshots, etc they're starting to operationally be "accountable" and preventing bad changes from shipping and producing evidence when something goes wrong .
At some point humans role shifts from "i personally verify this works" to "i trust this verification system and am accountable for configuring it correctly".
Thats still responsibility, but kind of different from whats described here. Taken to a logical extreme, the arguement here would suggest that CI shouldn't replace manual release checklists

by bluesnowmonkey

0 subcomment

> Your job is to deliver code you have proven to work.
First of all, no it’s not. Your job is to help the company succeed. If you write code that works but doesn’t help the company succeed, you failed. People do this all the time. Resume padding, for example.
Sometimes it’s better for the business to have two sloppy PRs than a single perfect one. You should be able to deliver that way when the situation demands.
Second, no one is out there proving anything. Like formal software correctness proofs? Yeah nobody does that. We use a variety of techniques like testing and code review to try to avoid shipping bugs, but there’s always a trade off between quality and speed/cost. You’re never actually 100% certain software works. You can buy more nines but they get expensive. We find bugs in 20+ year old software.

by just_once

2 subcomments

I don't know if there's a word for this but this reads to me as like, software virtue signaling or software patronizing. It's bizarre to me to tell an engineer what their job is as a matter of fact and to claim a particular usage of a tool as mandated (a tool that no one really asked for, mind you), leveraging duty of all things.
I guess to me, it's either the case that LLMs are just another tool, in which case the already existing teachings of best practice should cover them (and therefore the tone and some content of this article is unnecessary) or they're something totally new, in which case maybe some of the already existing teachings apply, but maybe not because it's so different that the old incentives can't reasonably take hold. Maybe we should focus a little bit more attention on that.
The article mentions rudeness, shifting burdens, wasting people's time, dereliction. Really loaded stuff and not a framing that I find necessary. The average person is just trying to get by, not topple a social contract. For that, look upwards.