Frankly, I have been quite concerned watching all the transformer hype (let's not call it AI, please) in here, when the gains people claim cannot be reliably replicated everywhere.
The financial incentives to sell transformer tech as working (even when it might not be cost-effective) deserve close attention, because to me it looks a bit too much like blockchain or big data.
The only sane way to use a tool like this is to give it a problem that fits in its context, evaluate the solution it churns out, and re-roll if it's wrong. Don't tell a language model to "think": it can't and won't, and asking it to is just a less efficient way of re-rolling the solution.
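To be concrete, the loop I mean is roughly this (a minimal Python sketch; generate and is_correct are placeholders for whatever model call and validation you actually use, not any particular library's API):

    import random
    from typing import Callable, Optional

    def solve_by_rerolling(
        prompt: str,
        generate: Callable[[str, int], str],  # placeholder: your model call, (prompt, seed) -> text
        is_correct: Callable[[str], bool],    # placeholder: your check (tests, type checker, your own eyes)
        max_attempts: int = 5,
    ) -> Optional[str]:
        # Fresh sample on each attempt; no "think harder" prompting, just re-roll.
        for _ in range(max_attempts):
            candidate = generate(prompt, random.randrange(2**32))
            if is_correct(candidate):
                return candidate
        return None  # no luck; the problem probably doesn't fit the context

The point of the sketch is that the check lives outside the model: you either verify the output yourself or you throw it away and sample again.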