Taming LLMs: Using Executable Oracles to Prevent Bad Code
40 points by mad44
by dktoao
4 subcomments
"Our goal should be to give an LLM coding agent zero degrees of freedom"
Wouldn't that just be called inventing a new language, with all the overhead of the languages we already have? Are we getting to the point where getting LLMs to be productive and also write good code requires so much overhead, additional procedure, and tooling that we might as well write the code ourselves? Hmmm...
by shubhamintech
0 subcomments
The oracle problem is tractable when the output is code: you can compile it, run tests, diff the output. For conversational AI it's much harder. We've seen teams use LLM-as-judge as their validation layer and it works until the judge starts missing the same failure modes as the generator.
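The compile/test/diff pipeline described above can be sketched as a small validation gate. This is a minimal illustration, not anything from the article: the `oracle` function and the `add` candidate are hypothetical, and a real setup would sandbox execution rather than `exec` untrusted output directly.

```python
# Minimal executable-oracle sketch: accept LLM-generated code only if it
# compiles and passes a fixed test suite. Names here are illustrative.

def oracle(candidate_source: str, func_name: str, tests: list[tuple[tuple, object]]) -> bool:
    """Return True only if the candidate compiles, defines func_name,
    and passes every (args, expected) test case."""
    try:
        code = compile(candidate_source, "<candidate>", "exec")  # syntax oracle
    except SyntaxError:
        return False
    namespace: dict = {}
    try:
        exec(code, namespace)                 # definition oracle (sandbox this in practice)
        func = namespace[func_name]
        return all(func(*args) == expected    # behavioral oracle
                   for args, expected in tests)
    except Exception:
        return False

# Hypothetical generator output for an `add` function:
candidate = "def add(a, b):\n    return a + b\n"
print(oracle(candidate, "add", [((1, 2), 3), ((-1, 1), 0)]))  # True
```

Unlike an LLM-as-judge, this gate can't drift: a candidate that fails a test is rejected deterministically, whatever the generator's failure mode.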
by JSR_FDED
0 subcomments
> JustHTML was effectively tested into existence using a large, existing test suite.
I love the phrase “tested into existence”.
by RS-232
4 subcomments
Has anyone had success using 2 agents, with one as the creator and one as an adversarial "reviewer"? Is the output usually better or worse?