FRESH

Hacker News

Home

When AI writes the software, who verifies it?

303 points by todsacerdoti

by roadbuster

21 subcomments

> The Claude C Compiler illustrates the other side: it optimizes for
> passing tests, not for correctness. It hard-codes values to satisfy
> the test suite. It will not generalize.
This is one of the pain points I am suffering at work: workers ask coding agents to generate some code, and then to generate test coverage for the code. The LLM happily churns out unit tests which are simply reinforcing the existing behaviour of the code. At no point does anyone stop and ask whether the generated code implements the desired functional behaviour for the system ("business logic").
The icing on the cake is that LLMs are producing so much code that humans are just rubber stamping all of it. Off to merge and build it goes.
I have no constructive recommendations; I feel the industry will keep their foot on the pedal until something catastrophic happens.

by madrox

5 subcomments

I encourage everyone to RTFA and not just respond to the headline. This really is a glimpse into where the future is going.
I've been saying "the last job to be automated will be QA" and it feels more true every day. It's one thing to be a product engineer in this era. It's another to be working at the level the author is, where code needs to be verifiable. However, once people stop vibing apps and start vibing kernels, it really does fundamentally change the game.
I also have another saying: "any sufficiently advanced agent is indistinguishable from a DSL." I hadn't considered Lean in this equation, but I put these two ideas together and I feel like we're approaching some world where Lean eats the entire agentic framework stack and the entire operating system disappears.
If you're thinking about building something today that will still be relevant in 10 years, this is insightful.

by wyum

3 subcomments

I believe there is a Verification Complexity Barrier
As you add components to a system, the time it takes to verify that the components work together increases superlinearly.
At a certain point, the verification complexity takes off. You literally run out of time to verify everything.
AI coding agents hit this barrier faster than ever, because of how quickly they can generate components (and how poorly they manage complexity).
I think verification is now the problem of agentic software engineering. I think formal methods will help, but I don't see how they will apply to messy situations like end-to-end UI testing or interactions between the system and the real world.
I posted more detailed thoughts on X: https://x.com/i/status/2027771813346820349

by muraiki

3 subcomments

The article says that AWS's Cedar authorization policy engine is written in Lean, but it's actually written in Dafny. Writing Dafny is a lot closer to writing "normal" code rather than the proofs you see in Lean. As a non-mathematician I gave up pretty early in the Lean tutorial, while in a recent prototype I learned enough Dafny to be semi-confident in reviewing Claude's Dafny code in about half a day.
The Dafny code formed a security kernel at the core of a service, enforcing invariants like that an audit log must always be written to prior to a mutating operation being performed. Of course I still had bugs, usually from specification problems (poor spec / design) or Claude not taking the proof far enough (proving only for one of a number of related types, which could also have been a specification problem on my part).
In the end I realized I'm writing a bunch of I/O bound glue code and plain 'ol test driven development was fine enough for my threat model. I can review Python code more quickly and accurately than Dafny (or the Go code it eventually had to link to), so I'm back to optimizing for humans again...

by maltalex

2 subcomments

Maybe I'm missing something, but isn't this the same as writing code, but with extra steps?
Currently, engineers work with loose specifications, which they translate into code. With the proposed approach, they would need to first convert those specifications into a formally verifiable form before using LLMs to generate the implementation.
But to be production-ready, that spec would have to cover all possible use-cases, edge cases, error handling, performance targets, security and privacy controls, etc. That sounds awfully close to being an actual implementation, only in a different language.

by neya

2 subcomments

The fundamental problem is the verification loop for the average developer is grounded not in tests, but with results. Write code, reload browser, check output. Does it work the way I want? Good. We're done here.
Not write code, write tests, ensure all test-cases are covered. Now, imagine such a flaky foundation is used to build on top of even more untested code. That's how bad quality software (that's usually unfixable without a major re-write) is born.
Also, most vibe-coders don't have enough experience/knowledge to figure out what is wrong with the code generated by the AI. For that, you need to know more than the AI and have a strong foundation in the domain you're working on. Here is an example: You ask the AI to write the code for a comment form. It generates the backend and frontend code for you (let's say React/Svelte/Vue/whatever). The vibe-coder sees the UI - most likely written in Tailwind CSS and thinks "wow, that looks really good!" and they click approve. However, an experienced person might notice the form does not have CSRF protection in place. The vibe-coder might not even be aware of the concept of CSRF (let alone the top 10 OSWAP security risks).
Hence, the fundamental problem is not knowing about the domain more than the AI to pick up the flaw. Unless this fundamental problem is solved - which I don't think it will anytime soon because everyone can generate code + UI these days, I don't see a solution to the verification problem.
However, this is good news for consultants and the like because it creates more work down the line to fix the vide-coded mess because they got hacked the very next day and we can charge them a rush fee on top of it, too. So, it's not all that bad.

by _pdp_

2 subcomments

I think the issue goes even deeper than verification. Verification is technically possible. You could, in theory, build a C compiler or a browser and use existing tests to confirm it works.
The harder problem is discovery: how do you build something entirely new, something that has no existing test suite to validate against?
Verification works because someone has already defined what "correct" looks like. There is possible a spec, or a reference implementation, or a set of expected behaviours. The system just has to match them.
But truly novel creation does not have ground truth to compare against and no predefined finish line. You are not just solving a problem. You are figuring out what the problem even is.

by voxleone

0 subcomment

What’s striking here is the convergence on a minimal axiomatic kernel (Lean) as the only scalable way to guarantee coherent reasoning. Some of us working on foundational physics are exploring the same methodological principle. In the “Functional Universe” framework[0], for example, we start from a small set of axioms and attempt to derive physical structure from that base.
The domains are different, but the strategy is similar: don’t rely on heuristics or empirical patching; define a small trusted core of axioms and generate coherent structure compositionally from there.
[0] https://voxleone.github.io/FunctionalUniverse

by randusername

0 subcomment

> Most people think of verification as a cost, a tax on development, justified only for safety-critical systems. That framing is outdated.
> The value is not in the verification workforce. It is in what verified delivery enables
These takes misrepresent safety-critical software verification. The verification workforce is indispensable. Whereas the software engineers are usually highly specialized, the verification team owns the domain and project knowledge to make sure the whole thing integrates. They consult on requirements, anticipate oversights, build infrastructure and validate it, and execute the verification.
So when the business needs a go / no-go call, they ask the verification lead.
Automation and proofs have their place, but I don't see how verification without human accountability is workable.

by phyzix5761

0 subcomment

I found my best use of AI for writing code is with a TDD approach. I write 1 test, watch it fail, let the AI implement the solution, and then if I like it I commit. Write the next test and repeat.
It keeps me in the loop, I'm testing actual functionality rather than code, and my code is always in a state where I can open a PR and merge it back to main.

by faitswulff

0 subcomment

Someone actually did fuzz the claude c compiler:
https://john.regehr.org/writing/claude_c_compiler.html

by SurvivorForge

0 subcomment

The uncomfortable truth is that most teams were already bad at verification before AI entered the picture. The difference is that AI-generated code comes in faster, so the verification bottleneck becomes painfully obvious. I think the real opportunity here is that AI forcing us to think harder about specifications and correctness might actually improve the quality of human-written code too — it's making us confront the fact that "it works on my machine" was never real verification.

by cagz

0 subcomment

> No one is formally verifying the result
This might be the case for a hobby project or a start-up MVP being created in a rush, but in reality, there are a few points we may want to take into account:
1. Software teams I work with are maintaining the usual review practices. Even if a feature is completely created by AI. It goes through the usual PR review process. The dev may choose "Accept All", although I am not saying this is a good practice, the change still gets reviewed by a human.
2. From my experience, sub-agents intended for code and security review do a good job. It is even possible to use another model to review the code, which can provide a different perspective.
3. A year ago, code written by AI was failing to run the first time, requiring a painful joint troubleshooting effort. Now it works 95% of the time, but perhaps it is not optimal. Given the speed at which it is improving, it is safe to expect that in 6-9 months' time, it will not only work but will also be written to a good quality.

by lateforwork

3 subcomments

The first thing you should have AI write is a comprehensive test suite. Then have it implement the main functionality. If the tests pass that is one level of verification.
In addition you can have one AI check another AI's code. I routinely copy/paste code from Claude to ChatGPT and Gemini have them check each other's code. This works very well. During the process I have my own eyes verify the code as well.

by thorn

1 subcomments

I don’t think many people are interested in producing 100% correct code. I saw this at big companies and small startups. It is all the same: ship feature asap before competitors. Writing correct code is almost always punished indirectly in the way they only praise the feature delivering heroes that got promoted. Nobody got promoted for preventing bugs.
Maybe in some other circles it is not like that, but I am sure that 90% of industry measures output in the amount of value produced and correct code is not the value you can show to the stockholders.
It is sad state of affairs dictated by profit seeking way of life (capitalism).

by vicchenai

0 subcomment

The verification problem scales poorly with AI complexity. Current approaches rely on test suites, but AI-generated code tends to optimize for passing existing tests rather than correctness in the general case.
What's interesting is this might be the forcing function that finally brings formal verification into mainstream use. Tools like Lean and Coq have been technically impressive but adoption-starved. If unverified AI code is too risky to deploy in critical systems, organizations may have no choice but to invest in formal specs. AI writes the software, proof assistants verify it.
The irony: AI-generated code may be what makes formal methods economically viable.

by toastal

0 subcomment

Note that Lean doesn’t have a monopoly on verification languages. Dafney, Rocq, Why3, ATS, Agda, Idris… these all can do it too. The fact that Lean is controlled by Microsoft should cause folks pause considering how they are trying to monopolize so many other spaces… & the author works on Lean. Rather than comparing Lean to performance like OCaml/Haskell like the article does, Why3 is a superset of OCaml, Agda can compile to Haskell, & ATS2 compiles to C—rather than needing to adopt an entirely new language.

by ddanv

0 subcomment

So you feed the LLM some C code like zlib and it rewrites and proves it.
The goal is to make the code write-only and replace it with spec declaration? ... math ppl still cant accept the "x = 1;" statement :)

by acedTrex

2 subcomments

No one does currently, and its going to take a few very painful and high profile failures of vital systems for this industry to RELEARN its lesson about the price of speed.
In fact it will probably need to happen a few times PER org for the dust to settle. It will take several years.

by oakpond

6 subcomments

You do. Even the latest models still frequently write really weird code. The problem is some developers now just submit code for review that they didn't bother to read. You can tell. Code review is more important than ever imho.

by chromaton

0 subcomment

TFA seems to be big on mathematical proof of correctness, but how do you ever know you're proving the right thing?

by globular-toast

0 subcomment

I'm starting to think of this LLM thing a bit like fossil fuels.
We've got fossil fuels that were deposited over millions of years, a timescale we are not even properly equipped to imagine. We've been tapping that reserve for a few decades and it's caused all kinds of problems. We've painted ourselves into a corner and can't get out.
Now we've got a few decades worth of software to tap. When you use an LLM you don't create anything new, you just recycle what's already there. How long until we find ourselves in a very similar corner?
The inability of people to think ahead really astounds me. Sustainability should be at the forefront of everyone's mind, but it's barely even an afterthought. Rather, people see a tap running and just drink from it without questioning once where the water is coming from. It's a real animal brain thing. It'll get you as far as reproducing, but that's about it.

by nemo44x

1 subcomments

I believe the old ways, which agile destroyed, will come back because the implementation isn’t the hardest part now. Agile recognized correctly that implementation was the hard part to predict and that specification through requirements docs, UML, waterfall, etc. were out of date by the time the code was cooked.
I don’t think we’ll get those exact things back but we will see more specification and design than we do today.

by boznz

2 subcomments

This is the biggest problem going forward. I wrote about the problem many times on my blog, in talks, and as premises in my sci-fi novels
Sitting in your cubical with your perfect set of test suites, code verification rules, SOP's and code reviews you wont want to hear this, but other companies will be gunning for your market; writing almost identical software to yours in the future from a series of prompts that generate the code they want fast, cheap, functionally identical, and quite possibly untested.
As AI gets more proficient and are given more autonomy (OpenClaw++) they will also generate directly executable binaries completely replacing the compiler, making it unreadable to a normal human, and may even do this without prompts.
The scenario is terrifying to professional software developers, but other people will do this regardless of what you think, and run it in production, and I expect we are months or just a few years away from this.
Source code of the future will be the complete series of prompts used to generate the software, another AI to verify it, and an extensive test suites.

by sublimefire

0 subcomment

At the end it mentions what the future engineers will do:
> Engineers spend more time writing specifications and models, designing systems at a higher level of abstraction, defining precisely what systems must do, what invariants they must maintain, what failures they must tolerate.
We do that already and the abstractions are very high. The other part is about knowing what the system is supposed to do way in advance, which is not how a lot of engineering is done because it is an exploratory problem. Very few of us write crypto or spend much time in a critical piece of code. And most importantly no user ever said if the software they buy is using proofs. Just like security these concerns are at the bottom of a barrel.

by bfung

1 subcomments

When humans write the software, who verifies it?
half sarcasm, half real-talk.
TDD is nice, but human coders barely do it. At least AI can do it more!

by slopinthebag

0 subcomment

LLM generated code combined with formal verification just feels like we're entering the most ridiculous timeline. We know formal verification doesn't work at scale, hence we don't use it. We might get fully vibe coded systems but we sure as hell won't be able to verify them.
The collapse of civilisation is real.

by dataviz1000

1 subcomments

100% of my innovation for the past month has been getting the coding agent to iterate with an OODA loop (I make it validate after act step) trying to figure out how to get it to not stop iterating.
For example, I have discovered there is a big difference between prompting 'there is a look ahead bias' and 'there is a [T+1] look ahead bias' where the later will cause it to not stop until it finds the [T+1] look ahead bias. It will start to write scripts that will `.shift(1)` all values and do statistical analysis on the result set trying to find the look ahead bias.
Now, I know there isn't look ahead bias, but the point is I was able to get it to iterate automatically trying different approaches to solve the problem.
The software is going to verify itself eventually, sooner than later.

by waterTanuki

0 subcomment

I've grown to hate using python in production since LLMs have been around. Python cannot enforce minimum standards like cleaning up unused variables, checking array access, and properly typing your functions. There's a number of tools built to do this but none of them can possibly replace a compiler.
Compiled languages like Go and Rust are my new default for projects on the backend, typescript with strict typing on for the frontend, and I foresee the popularity growing the more LLM use grows. The moment you let an LLM loose in a Javascript/Python codebase everything goes off the rails.

by holtkam2

1 subcomments

At the end of the day you need humans who understand the business critical (or safety critical) systems that underpin the enterprise.
Someone needs to be held accountable when things go wrong. Someone needs to be able to explain to the CEO why this or that is impossible.
If you want to have AI generate all the code for your business critical software, fine, but you better make sure you understand it well. Sometimes the fastest path to deep understanding is just coding things out yourself - so be it.
This is why the truly critical software doesn’t get developed much faster when AI tools are introduced. The bottleneck isn’t how fast the code can be created, it’s how fast humans can construct their understanding before they put their careers on the line by deploying it.
Ofc… this doesn’t apply to prototypes, hackathons, POCs, etc. for those “low stakes” projects, vibe code away, if you wish.

by raincole

0 subcomment

A little mental trick: every time you see a title with AI(or LLM or agent) and a question mark, replace 'AI' with 'humans' and ask the question again.
The answer is no one, for most of the time.

by MickeyShmueli

0 subcomment

the test generation loop is brutal. i've been burned by this exact thing, you ask the agent to write code, then ask it to write tests for that code, and surprise, they all pass because the tests are literally just "does the code do what the code does"
honestly think the answer isn't more tests, it's stricter contracts. like if your API has an OpenAPI spec, you can validate requests/responses against it automatically. the spec becomes the source of truth, not the tests, not the implementation
we've been doing this backwards for years. write code, write tests that match the code, realize six months later that both the code and tests were implementing the wrong behavior. but if you have a machine-readable contract (openapi, json schema, whatever), at least you can verify one dimension automatically
ngl this is why i'm skeptical of "AI will write all the code" takes. without formal specs, you're just getting really confident garbage that happens to pass its own tests. which tbh describes a lot of human-written code too lol

by ljlolel

0 subcomment

The users verify and fix it on the fly with the Claude VM JIT https://jperla.com/blog/claude-is-a-jit

by 50lo

0 subcomment

One thing that seems under-discussed in this space is the shift from verifying programs to verifying generation processes.
If a piece of code is produced by an agent loop (prompt -> tool calls -> edits -> tests), the real artifact isn’t just the final code but the trace/pipeline that produced it.
In that sense verification might look closer to: checking constraints on the generator (tests/specs/contracts), verifying the toolchain used by the agent, and replaying generation under controlled inputs.
That feels closer to build reproducibility or supply-chain verification than traditional program proofs.

by liampulles

0 subcomment

Attention human: I have formally verified a route for converting the desks in the office into paperclips. Proceeding autonomously.

by righthand

2 subcomments

No one really. Code is for humans to read and for machines to compile and execute. Llms are enabling people to just write the code and not have anyone read it. It’s solving a problem that didn’t really exist (we already had code generators before llms).
It’s such an intoxicating copyright-abuse slot machine that a buddy who is building an ocaml+htmx tree editor told me “I always get stuck and end up going to the llm to generate code. Usually when I get to the html part.” I asked if he used a debugger before that, he said “that’s a good idea”.

by shubhamintech

1 subcomments

Formal verification gets you to deploy with confidence but it's still a snapshot. What happens when real-world inputs drift from what you tested against? The subtler problem is runtime behavioral drift: an agent that's technically correct but consistently misunderstands a whole class of user queries is invisible to any pre-deploy check. Pre-deploy and post-deploy verification are genuinely different problems.

by pnathan

0 subcomment

I am experimenting at a very early stage with using Verus in Rust to generate proveably correct Rust. I let the AI bang on the proof and trust the proof assistant to confirm it.
There is another route with Lean where Rust generates the Lean and there is proof done there but I haven't chased that down fully.
I think formal verification is a big win in the LLM era.

by phantomathkg

0 subcomment

Human writes the requirements, contains flaw. Human or AI translate that to specifications, and eventually code.
It does not matter if the middle man is human or AI, or written in "traditional language" or "formal verification". Bugs will be there as human failed to defined a bullet proof requirements.

0 subcomment

by anonhacker199

0 subcomment

The biggest issue right now is most AI tools aren't hooked up appropriately to an environment they can test in (Chrome typically). Replit works extremely well because it has an integrated browser and testing strategy. AI works very well when it has the ability to check its own work.

by p0u4a

0 subcomment

Someone once told me, agentic coding may lead to something akin to a "software engineering union" forming, where a set of guidelines control code quality. Namely, at least one of writing, testing, and reviewing of code must be done by a human.

by 8note

0 subcomment

i think formal methods and math proofs will be useful like tests are for getting more feedback to the LLM to get to a working solution, but i dont think it at all solves the problem of "poisoned training data introduces specific bugs and vulnerabilities"
the bug will also be introduced in the formal spec, and people will still miss it by not looking.
i think fast response and fix time - anti-entropy - will win out against trying to increase the activation energy, to quote the various S3 talks. You need a cleanup method, rather than to prevent issues in the first place

by boristane007

0 subcomment

loving this! this is the proof of this love https://github.com/pravsingh/boris

by mkoubaa

0 subcomment

PMs have been asking the same question about software developers for decades

by duxup

1 subcomments

It is wild to me how things SEEM to be playing out. With AI I code faster …. AND read / test more.
I’m doing more verification than ever.

by amelius

0 subcomment

When AI writes the software, who is proud of it?

0 subcomment

by saltyoldman

0 subcomment

the same agent that wrote it. it's not great now, but getting better each month. Agents will likely start performing visual testing, checking the database, etc... i don't think people understand quite how powerful these agents already are - or could be. In the next month someone will likely unlock new visual check agents (I mean it already happened but few people are using it so far).

by mentalgear

0 subcomment

> Where this leads is clear. Layer by layer, the critical software stack will be reconstructed with mathematical proofs built in. The question is not whether this happens, but when.

by midtake

0 subcomment

I do. I verify it so hard I've begun to mistrust it lately, seeing Gemini make glaring conceptual mistakes and refusing to align to my corrections.

by turlockmike

0 subcomment

This is where you can use adverserial systems.
1. Agent writes a minimal test against a spec. 2. Another agent writes minimal implementation to make test pass only.
Repeat
This is ping pong pair programming.

by phyzome

0 subcomment

Verify? Seems like no one is even reviewing this stuff.

by js8

0 subcomment

I think intelligence in general means solving (and manipulating) constraint problems. So when you ask AI to, say, write a "snake game", it figures out what this means in terms of constraints to all the possible source codes that can be written (so it will have things like the program is a game, so it has a score, there is a simple game world and there is user input connected to the display of the game world and all sorts of constraints like this), and then it further refines these constraints until it picks a point (from the space of all possible programs) that satisfies those constraints, more or less.
One beautiful thing about current AI is that this process can handle fuzzy constraints. So you don't have to describe the requirements (constraints) exactly, but it can work with fuzzy sets and constraints (I am using "fuzzy" in the quite broad sense), such as "user can move snake head in 4 directions".
Now, because of this fuzzy reasoning, it can sometimes fail. So the wrong point (source code) can get picked from the fuzzy set that represents "snake game". For example, it can be something buggy or something less like a canonical snake game.
In that case of the failure, you can either sample another datapoint ("write another snake game"), or you can add additional constraints.
Now, the article argues in favor of formal verification, which essentially means, somehow convert all these fuzzy constraints into hard constraints, so then when we get our data point (source code of the snake game), we can verify that it indeeds belongs to the (now exact) set of all snake games.
So, while it can help with the sampling problem, the alignment problem still remains - how can we tell that the AI's (fuzzy) definition of a functional "snake game" is in line with our fuzzy definition? So that is something we don't know how to handle other than iteratively throwing AIs at many problems and slowly getting these definitions aligned with humans.
And I think the latter problem (alignment with humans on definitions) is the real elephant in the room, and so the article is IMHO focusing on the wrong problem by thinking the fuzzy nature of the constraints is the main issue.
Although I think it would definitely be useful if we had a better theoretical grasp on how AI handles fuzzy reasoning. As AI stands now, practicality has beaten theory. (You can formalize fuzzy logic in Lean, so in theory nothing prevents us from specifying fuzzy constraints in a formal way and then solving the resulting constraint problem formally, it just might be quite difficult, like solving an equation symbolically vs numerically.)

by bryanlarsen

0 subcomment

You can use AI to make a reviewers job much easier. Add documents, divide your MR into reviewable chunks, et cetera.
If reviewing is the expensive part now, optimize for reviewability.

by ozten

0 subcomment

This is really great and important progress, but Lean is still an island floating in space. Too hard to get actual work done building any real world system.

by KingOfCoders

1 subcomments

Most software the companies I worked for that was put into production, was not verified. There were spotty code reviews, mostly focusing on "I would have done it differently" and a limited amount of unit tests with low test coverage and Heisenberg E2E tests, often turned off because, Heisenberg. Sometimes overworked, bottle neck, testers.
There is hope that with AI we get to better tested, better written, better verified software.

by signa11

0 subcomment

> When AI writes the software, who verifies it?
oh thats quite simple: the dude / dudette who gets blamed is the one who verifies it.

by yoaviram

4 subcomments

I just finished writing a post about exactly this. Software development, as the act of manually producing code, is dying. A new discipline is being born. It is much closer to proper engineering.
Like an engineer overseeing the construction of a bridge, the job is not to lay bricks. It is to ensure the structure does not collapse.
The marginal cost of code is collapsing. That single fact changes everything.
https://nonstructured.com/zen-of-ai-coding/

by stpedgwdgfhgdd

0 subcomment

I don’t think this an AI specific problem, same can happen with humans.

by sslayer

0 subcomment

State Sponsored Hackers AI will verify it.

by reenorap

0 subcomment

> Google and Microsoft both report that 25–30% of their new code is AI-generated.
I think it's closer to 100%. I don't know anyone who isn't doing 100% ai-generated code. And we don't even code review it because why bother? If there's an error then we just regenerate the code or adjust the prompt.

by csense

2 subcomments

It seems like sound testing methodology to identify important theorems related to the code, prove them, and then verify the proof.
Verification gets sold as "bulletproof" but I'm skeptical for a couple reasons:
- How do you establish the relationship between the code and the theorem? Lean theorem can be applied to zlib implemented in Lean, what if you want to check zlib implemented in a normal programming language like C, JS, Zig, or whatever?
- How do you know the key properties mean what you think they mean? E.g. the theorem says "ZlibDecode.decompressSingle (ZlibEncode.compress data level) = .ok data" but it feels like it would be very easy to accidentally prove ∃ x s.t. decompress(compress(x)) == x while thinking you proved ∀ x, decompress(compress(x)) == x.
I've tried Lean and Coq and...I don't really like them. The proofs use specialized programming languages. And they seem deliberately designed to require you to use a context explorer to have any hope of understanding the proof at all. OTOH a normal unit test is written in a general purpose programming language (usually the same one as the program being tested), I'm much more comfortable checking that a Claude-written unit test does what I think it's doing than a Claude-written Lean proof of correctness.

by indymike

2 subcomments

Because of the scale of generated code, often it is the AI verifying the AI's work.

by testemailfordg2

0 subcomment

Time does...Microslop is an example

by drivebyhooting

0 subcomment

That was a prolix and meandering essay. Next time I’d rather just look at the prompts and hand edits that went into writing it rather than the final artifact; much like reviewing the documentation, spec, and proof over the generated code as extolled by the article.

by Nevermark

0 subcomment

> When AI writes the software, who verifies it?
The users. It's a start.

by geldedus

0 subcomment

The written specs and the acceptance tests.

by simonw

0 subcomment

The "Nearly half of AI-generated code fails basic security tests" link provided in this piece is not credible in my opinion. It's a very thinly backed vendor report from a company selling security scanning software.

by Ronsenshi

0 subcomment

A bit unrelated to the article, more of a commentary about how many engineers at this point barely write any code or even do code review.
It seems to me like a huge amount of engineers/developers in comments are turning into Tom Smykowski from The Office. Remember that guy?
His job was to be a liaison between customers and engineers because he had "people skills":
"I deal with the god damn customers so the engineers don't have to. I have people skills; I am good at dealing with people. Can't you understand that? What the hell is wrong with you people?"
Except now, based on comments here it, some engineers are passing instructions from customers to AI because they have "AI skills". While AI is doing coding, helps with spec clarification, reviewing code and writing tests.
That's scary and depressing. This field in a few years will be impossible to recognize.

by milind-soni

0 subcomment

At this point, its the users

by heftykoo

0 subcomment

Another AI, obviously. And then a third AI to monitor the first two for conflicts of interest. Jokes aside, this is exactly the era where formal verification (like TLA+ or Lean, seeing the other post on the front page) actually makes commercial sense. If the code is generated, the only human output of value is the spec. We are moving from writing logic to writing constraints.

by lenerdenator

0 subcomment

You verify it.
The value proposition of a software engineer is no longer creating code; it is making sure that the ultra-fast code generating capability of the LLM fits the needs of the business.
BDD/TDD, code reviews, deep dives on what the code does, and educating yourself on design patterns... all of these help.

by lgl

0 subcomment

I'm in the process of building v2.0 of my app using opus 4.6 and largely agree with this.
It's pretty awesome but still does a lot of basic idiotic stuff. I was implementing a feature that required a global keyboard shortcut and asked opus to define it, taking into account not to clash with common shortcuts. He built a field where only one modifier key was required. After mentioning that this was not safe since users could just define CTRL+C for the shortcut and we need more safeguards and require at least two modifier keys I got the usual "you're absolutely right" and proceeded to require two modifier keys. But then it also created a huge list of common shortcuts into a blacklist like copy, cut, paste, print, select all, etc.. basically a bunch of single modifier key shortcuts. Once I mentioned that since we're already forcing two modifier keys that's useless it said I'm right again and fixed it.
The counter point of this idiocy is that it's very good overall at a lot of what is (in my mind) much more complicated stuff. It's a .NET app and stuff like creating models, viewmodels, usercontrols, setting up the entire hosting DI with pretty much all best practices for .net it does it pretty awesomely.
tl;dr is that training wheels are still mandatory imho

by elif

0 subcomment

Users

by mootoday

0 subcomment

AI of course

by __mharrison__

0 subcomment

TDD...

by bitwize

0 subcomment

Also AI.

by kazinator

0 subcomment

> What would a verification platform for the AI era require? A small, trusted kernel: a few thousand lines of code that check every step of every proof mechanically.
The code is a stochastic parrot job produced with zero algorithmic understanding.
Out of what magic unicorn's ass are you going to get a matching proof for it, to feed to this trusted kernel?

by titaniumrain

0 subcomment

a dinosaur...

by flashbaby

0 subcomment

[flagged]

by aplomb1026

0 subcomment

[dead]

by bdcravens

0 subcomment

The same ones who verify it when I write it: my customers in production! /s (well, maybe /s)

by agenthustler

0 subcomment

[flagged]

by furyofantares

0 subcomment

(answering the title) The lusers

by foolfoolz

1 subcomments

no one wants to believe this but there will be a point soon when an ai code review meets your compliance requirements to go to production. is that 2026? no. but it will come

by rademaker

0 subcomment

In his latest essay, Leonardo de Moura makes a compelling case that if AI is going to write a significant portion of the world’s software, then verification must scale alongside generation. Testing and code review were never sufficient guarantees, even for human-written systems; with AI accelerating output, they become fundamentally inadequate. Leo argues that the only sustainable path forward is machine-checked formal verification — shifting effort from debugging to precise specification, and from informal reasoning to mathematical proof checked by a small, auditable kernel. This is precisely the vision behind Lean: a platform where programs and proofs coexist, enabling AI not just to generate code, but to generate code with correctness guarantees. Rather than slowing development, Lean-style verification enables trustworthy automation at scale.