Hacker News

The Coming Loop

225 points by ingve

by weego

5 subcomments

What does any of that mean in practice? It's just rambling about abstract concepts that seem designed to hint at a bigger picture, when it's just getting AI to write code for you.
Is this where it's going? Having to mystify our roles so it seems like we're still the thought leaders when actually we're just becoming pseudo-teachers that try and herd our group of AI idiots to the right conclusion for us so we don't have to, without ever giving away that it's just all techno-babble?

by mccoyb

1 subcomments

Loops work when you spend the proper amount of time to understand what you want ahead of time. The prerequisite is clarity — enough clarity that you could write a careful specification that you could hand off to a junior colleague.
Often, it takes 5-6 broken crappy versions of a thing until you understand that. There is no accelerating the 5-6 broken crappy versions - there’s no agent tech that’s going to help your meat brain avoid thinking time.
So most of my time is iterating between these two phases: I don’t understand what I want, I need to read and write and play with code, okay it’s been long enough I think I know what I want (it is extremely easy to deceive yourself) … okay now I do actually know what I want and I can write a loop.
Many people think they can jump ahead with agents. You cannot fake understanding or clarity. It is painfully obvious when someone skipped that meat brain understanding phase.

by stuartaxelowen

0 subcomment

This blog post points to the fact that you need information across scales to make really insightful products and software. You need to understand fundamental mechanisms, strengths, and risks of your software to know where to make bets next. You need to know about the “how” of your optimization system to know which customer asks to deny.
Using layers like the loops described here to abdicate your work is you decoupling from the joint market/engineering value you originally provided.

by inline_always

1 subcomments

The bottleneck has always been the 'verification' and 'trust', that's why we have senior engineers, same way you need a head architect sign-off on a blueprint, because when things go bad you need a human agent to be the responsible party. Even if we manage to teach a herd of dumb AIs to produce massive amount of code, who's going to trust that output with their life?

by mmillin

2 subcomments

>Yet even with a lot of manual steering, that type of code does not come out of LLMs naturally, and even if the code comes out naturally like that, they will still attempt to handle now impossible errors.
This is something I’ve struggled to fight against in many PR reviews. Especially once already written, convincing someone that their excessive null checking is harmful is an uphill battle. Short of better modeling (and languages that allow for sum types to enable it), I haven’t been able to come up with a universally convincing argument against this kind of “shotgun parsing.”
Maybe it really just isn’t that big of a deal? But when actually reading through and refactoring a codebase I’ve always found it frustrating to manage these unnecessary checks. Sometimes they’re nearly impossible to delete safely once present without first adding some kind of logging or broad investigation.
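One way to make the argument concrete is the "parse once at the boundary" pattern: validate raw input in a single place and hand the rest of the code a type that cannot hold the bad state. The sketch below is only an illustration; `Order`, `parse_order`, and `total_price` are hypothetical names, not taken from the article or any codebase discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical domain type: once constructed via parse_order,
    # both fields are known to be present and valid.
    customer_id: str
    quantity: int


def parse_order(raw: dict) -> Order:
    """Validate once at the boundary; raise instead of passing bad data on."""
    customer_id = raw.get("customer_id")
    quantity = raw.get("quantity")
    if not isinstance(customer_id, str) or not customer_id:
        raise ValueError("customer_id must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity <= 0:
        raise ValueError("quantity must be a positive integer")
    return Order(customer_id, quantity)


def total_price(order: Order, unit_price: int) -> int:
    # No None checks here: every Order that reaches this point was parsed.
    return order.quantity * unit_price
```

With this shape, a null check inside `total_price` is visibly dead code, which makes it easier to argue for deleting it in review.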

by wseqyrku

3 subcomments

For some reason pro-ai blog posts feel like paid ads, I might be wrong.

by Multicomp

0 subcomment

Code is part of a shared and built understanding of an information system.
If these loopers mean we all have to move at this continuous wave of software happening, then we get to the highest levels of logical information system design and it's all human judgement and balancing of business requirements to fit a given niche in a company or market. So all the programmers have to become business analysts/market researchers/businessmen...except the specific niches where AI tooling can't really clank well...or the end of the subsidized AI token era makes all this looping too expensive to continue. This feels like expert systems and Symbolics Lisp machines redux, where we briefly ran into the fact that it's not so much the code itself not being able to do stuff, it's that your company's org always gets shipped, so if you can't change your company org, your software only has so much flexibility.
Dataflow diagrams and domain knowledge / domain modeling / ubiquitous languages may become the metalanguage that we start to use and set the standards for quality, functional, and non-functional standards and conventions. We make the "looper clankers" ensure that they fulfill that data / behavior / performance contracts before saying what "done" is, because "done" is no longer just code that compiles, code that builds, code that deploys, or even code that sits in production; it's code that fulfills all of the user requirements, operator requirements, and maintainer requirements. So, the language used may be required to make us all turn into business analysts and software architects more than syntax knowers. The revenge of UML and the return of declarative / logical design / BDD triumphing?
(Typo scan by gemma4-12b but I didn't let it alter my message)

by dataviz1000

0 subcomment

I’m having awesome success working with recursive agents. I discussed my experience with them. [0]
> Claude's attention doesn't distinguish between "instructions I'm writing" and "instructions I'm following" -- they're both just tokens in context.
It takes a little human help in the first iterations but after a while it will start to iterate and improve unsupervised.
[0] https://github.com/adam-s/agent-tuning

by boscillator

6 subcomments

> the right fix is not "handle every malformed case." ... [LLMs] will still attempt to handle now impossible errors.
This is the number one code smell from LLMs and I don't know why they are so obsessed with it. In python, it often comes as `hasattr` checks on types that are defined to have that attribute, in a code base that is fully type-checked.
Why do they do that? Is it from pre-training or reinforcement? If the latter, can the labs please fix this?
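For readers who have not seen the pattern, here is a small illustration of the smell described above. The `User` class and both functions are made up for this example; the point is only that the `hasattr` branch can never fail for a value of the declared type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical type: `name` is always present and always a str.
    name: str
    email: str


def greeting_defensive(user: User) -> str:
    # Redundant check: every User has `name`, so the fallback below is
    # unreachable for well-typed callers, yet readers must still reason
    # about it.
    if hasattr(user, "name") and user.name is not None:
        return f"Hello, {user.name}"
    return "Hello, stranger"


def greeting_direct(user: User) -> str:
    # Trusts the declared type; in a type-checked codebase the checker
    # enforces this contract at every call site.
    return f"Hello, {user.name}"
```

Both functions return the same value for every valid `User`; the second simply has no dead branch to maintain.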

by yanis_t

3 subcomments

I keep thinking about at which point I should not force myself into the loop. As a developer I really like working on the code structure, making it clearer, thinking about good abstraction, breaking into modules, etc. I really take pleasure in it. At the same time I understand that at some point I am becoming the limiting factor.
If the point of the software is to benefit people, should I still care about how the code looks?
Right now, I still think that the answer is yes, but in 3 years? in 10 years?

by furyofantares

1 subcomments

I have had some success with /goal for long tasks that can be set up in a way that the agent can do good work for an extended period of time.
A lot of tasks aren't amenable to that, and the ones that are still need a lot of care to be set up correctly. The default vibe coded codebase won't be.
I've come to think of the activity of choosing the right technology, the right architecture, the right testing setup, the right context, and the right /goals to use as programming the agent.

by CraigJPerry

0 subcomment

> My current status is that I have not had much success with this way of working for code I deeply care about
If something is judgement heavy, "code i care deeply about", then i don't really agree with the direction of travel here. Don't try to delegate decisions you care deeply about.
I do like the framing of agent loop vs harness loop, but only delegate stuff that you can accurately specify in advance, that usually means stuff that's repeatable in my case ("hey go see how i did X, do that but for Y"), and that inherently means stuff that's predictable.
For stuff where lack of my judgement as input is just going to cause me to say "no", we're down to collaborating in the "agent loop" as Armin puts it. And that's totally fine. It's fast, but also safe.
Remember before AI coding assistants, sometimes you'd get an engineer join your team who was SUPER productive, your peers would be jealous "oh yeah but you guys only got all that done because you have X on your team!" - they didn't live the curse of having that kind of person around - if you don't have them PERFECTLY aligned, then they run off at break neck speed in the wrong direction.

by wolttam

0 subcomment

I am 100% for fully agentic loops... for tasks other than engineering.
I'm not willing to outsource the understanding how things work part of myself. That part of myself is what got me into computing in the first place.
If this work becomes simply a matter of describing intent to a machine (probably through an Issue, like a user), and going to check on the result when you get the 'done' notification: I'm done.
It's possible to use the tools to do awesome things without letting go of full system understanding of the parts that you look after.

by gcanyon

2 subcomments

I'm a software developer from way back, using tools and languages that coding agents are far less familiar with.
So when I use an agent to write code, it's in languages I'm less familiar with, and often using libraries I know nothing about.
All to say, my part of the process often ends up being:
1. "Here's what I'm looking for, in detail"
2. "That's not right. Here's one way it's not right, and a specific example. Please fix that."
3. Sometimes I give suggestions for how what is going wrong might be happening, or conceptually how to work around the issue.
4. And iterate on 2-3 until the result is close enough.
That's a loop I'd love to automate.
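The steps above map fairly directly onto a generate/check/feedback loop. The following is a rough sketch under heavy assumptions: `generate` stands in for whatever agent call is used and `check` for whatever automated judge replaces the human in step 2. Both are hypothetical callables supplied by the caller, not a real agent API.

```python
from typing import Callable, Optional


def run_loop(
    spec: str,
    generate: Callable[[str, list[str]], str],
    check: Callable[[str], Optional[str]],
    max_iterations: int = 5,
) -> tuple[Optional[str], list[str]]:
    """Iterate until `check` finds no problem or the budget runs out.

    `check` returns None when the candidate is acceptable, otherwise a
    description of one concrete problem (step 2 in the list above).
    """
    feedback: list[str] = []
    for _ in range(max_iterations):
        candidate = generate(spec, feedback)
        problem = check(candidate)
        if problem is None:
            return candidate, feedback
        feedback.append(problem)
    # Budget exhausted: report failure instead of returning a bad result.
    return None, feedback


# Stand-in callables so the sketch runs without any model behind it.
def fake_generate(spec: str, feedback: list[str]) -> str:
    return f"v{len(feedback)}"


def fake_check(candidate: str) -> Optional[str]:
    return None if candidate == "v2" else f"{candidate} is not right"


result, history = run_loop("what I'm looking for, in detail", fake_generate, fake_check)
```

The hard part, of course, is writing a `check` that captures "that's not right" as reliably as the person in the loop does; the loop itself is the easy bit.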

by piker

0 subcomment

We used a “loop” before it was called that to drive MS-DOC support into Tritium. Based on that experience, I take issue with this:
“There are already impressive examples of large automatic porting efforts, including the reported work around moving parts of Bun from Zig to Rust.” (Emphasis added.)
It will be impressive if/when the Bun team is able to pick up and continue extending and supporting Bun. For us, MS-DOC remains read-only and probably perpetually buggy until we reimplement with a better understanding. Until then, it’s definitely not “impressive”. Functional? Maybe. Impressive, no.

by contagiousflow

5 subcomments

> You Cannot Quite Opt Out
I am so over this. I cannot take anyone seriously who claims inevitability of their ideas, and how you must adopt them without "being left behind". If these tools are so good and so capable, the results should be able to speak for themselves rather than this FOMO-inducing, emotional language.

by sixhobbits

0 subcomment

I have huge respect for Armin but all of the concerns about agents producing more code with less competent supervision from senior engineers don't seem that different from the status quo to me. A vast majority of all software I've ever professionally worked with has been terribly structured, hard to work with, full of bugs, etc, produced by mediocre to bad engineers and run by semi technical product owners or managers who basically prompt the software into existence by making jira tickets on 2 week cycles to hold it together.
Yes it's awful, but it kind of works and has worked for a very long time. Agents are already improving a lot of open source software. Yes they're producing a lot of slop too, but having beautiful code, understanding how the system works and being able to delegate to a competent engineer you trust is reserved for the very few right now and I think we have all the systems and experience in place to deal with "bad" but working software so personally I am not concerned

by agumonkey

0 subcomment

I can't help but be tired of the LLM trend, where people bang at loops until they hope the model sculpts something. It feels so empty mentally to just have results without constructing them.
That said the idea of loop has always been there (iteration, V cycle etc) but I'd be glad to find people with more theory and less agents swinging blindly so to speak.

by CuriouslyC

0 subcomment

Part of the problem is that models don't have a strong sense of taste, part of the problem is that the context in which projects exist is incompletely represented in the LLM context, and part of the problem is that LLMs tend to be myopic.
The lack of taste can be mitigated to some degree by improved training, though taste is not a stationary distribution in humans (see trends/fads/etc), we can at least better track the cutting edge. I think this area still has low hanging fruit but frontier labs are more concerned with being able to solve problems than the style of the solution right now (for evidence of this just look at the Opus 4.5 -> 4.8 arc).
The problem of incomplete context is partly a human problem and partly a harness/interconnectivity problem.
LLM myopia is a harder problem to solve just by virtue of training models on question/answer pairs. Countering this requires emphasizing RL on solution paths rather than just prompt/response, which is doable but harder.

by ramon156

4 subcomments

Quoting the creator of CC holds little value in my opinion. I too call my product good.
> opting out of this fully machine-driven future may not be an option.
I am contemplating whether I want to stay inside this rat race.
I completely agree with the conclusion of this blog post, by the way. I feel uneasy, and I do not enjoy the work I deliver using LLMs. I think OP did a really good job on capturing at least my current state.

by artisin

1 subcomments

This.
> Present-day models tend to produce code that is too defensive, too complex, too local in its reasoning. They avoid strong invariants. They add fallbacks instead of making bad states impossible. They duplicate code, invent bad abstractions, and paper over unclear design with more machinery. Worse though: I so far see very little progress of this improving.
Context-smithing can help to a degree and cyclomatic-like complexity rules tend to make matters worse. So, you either roll up your sleeves or close your eyes and hope for the best. I've had limited success with the latter.

by jwpapi

0 subcomment

The issue is that whilst the loops will initially lead to good results, the results will get worse and worse as the context gets bigger and tougher to understand for both human and AI.
So it depends really on the size of your project.

by joenot443

3 subcomments

We've had great success with agents thus far at my job. A year into Clauding and all our dev metrics are up while our downtime has remained steady.
Being an iOS engineer, much of my engineering cycle these days is going from Figma/PRD → spec → code. After being handed off to QA, we handle the bugs and product slips as they come through, while we simultaneously build/spec the upcoming addition. This is basically the same agile style that's been popular for 20y, just super-powered with agents.
How might someone accomplish the same goals using loops instead?

by abathologist

2 subcomments

Generally interesting reflections here, yet I see the same kind of myopia and fatalism that is rampant in our (fashion) industry:
> yet I have no doubts that this looping future is going to be our future despite the fact that I presently resent it
Why would anyone conclude this? LLMs are just one kind of application of ML to software production. There is a vast solution space for automating parts of software production. The idea that slop loops are the inevitable future because they happen to be accelerating output at the moment just seems profoundly short-sighted and lacking in vision.

by jsw97

0 subcomment

In my own ham-fisted experiments with coding loops, one pathology I have noticed is that the LOC just spirals out of control. That's likely because of the layers of defensive fixes, etc., that get built. That inevitably causes context bloat (or at least navigational friction) and results in quality decline.
I wonder how many loop-related issues could be addressed by simply fixing a LOC budget, or assigning a cost in some way. Unclear how you would dial in the right numbers, though.
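A LOC budget of this kind could be enforced with a very small gate between loop iterations. This is only a sketch of the idea: `count_loc` and `within_budget` are hypothetical helpers, the counting rule (non-blank, non-comment lines) is an arbitrary choice, and the right `max_growth` value is exactly the open question raised above.

```python
def count_loc(source: str) -> int:
    """Count non-blank lines that are not pure `#` comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def within_budget(before: str, after: str, max_growth: int) -> bool:
    """Accept a change only if it adds at most `max_growth` lines of code."""
    return count_loc(after) - count_loc(before) <= max_growth


before = "a = 1\n"
after = "a = 1\n\n# defensive fallback\nif a is None:\n    a = 0\n"
accepted = within_budget(before, after, max_growth=1)
```

A harness could reject or flag any iteration where `within_budget` returns `False`, forcing the agent to delete or simplify before it is allowed to add more.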

by johnwheeler

1 subcomments

What is a loop in simple terms?

by wiseowise

0 subcomment

A friendly reminder to just do 9 to 5 and touch lots of grass. None of this shit represents industry trends, majority of people still use chat interfaces and copy blocks of code. There’s zero early adopter advantage here, only FOMO and lots of anxiety.

by sunir

0 subcomment

Dear Abby,
I am torn. I have fallen in love with vibe coding but I still am in love with the software I’ve used for decades that works reliably.
Vibe coding gives me what I need and want right now. It’s fast. Fun. Always makes me feel validated.
My older software never changes. It’s constantly telling me no. When it gets mad, it throws errors at me sometimes! But I can’t leave it. It runs my life and I know it will take care of me for years to come.
And the vibe code it’s so flaky… and expensive. It sucks up endless amount of my time, compute, and money and never gives anything back.
But it’s so fun. I tell all my friends about it and they’ve become so jealous they sought out their own vibe coder.
We’ve all found our vibe coders are a bit kinky. It’s become a social thing amongst my friends to talk about building cooler harnesses to control our vibe coders.
I don’t know what to do. My old software pays the bills but she keeps threatening to dump my ass on the curb and replace me with her own vibe coder.
I know she can’t really do it. She needs me too. And I need her.
Can we ever patch up our diffs?
— just some git with uncommitted changes

by duendefm

0 subcomment

I honestly wonder if this kind of stuff really brings something to the table. Like I have used Opus for some time and certainly I can put it to good use and optimize some parts of my day to day job (programmer). But it fails so hard in such simple tasks that it seems to me that putting it in a loop can't just magically make everything better, unassisted. Does anyone actually use agents and loops to create new software, new technology? Has anyone created with those systems software they couldn't produce otherwise, technology-wise? Or is it at best just an accelerator, cutting down on the building time?

by nurettin

0 subcomment

I just tell it "you have until morning to work on this, be careful not to use too much ram and don't burn the cpu"
and then it goes off to do its thing and hopefully rngesus is with us.

by mikgp

0 subcomment

Was everyone collectively lying over the past fifty years of software development when they repeatedly said more != better?
For specific use cases, performance and security and all sorts of tuning it could be truly amazing. But maybe loops should be like a tool we make a choice to use when optimal.
I just wonder if in the future we’ll come to realize that we don’t have to throw the baby out with the bath water. That you can take a beat to understand your code and do change management, and choose the right tool for the job, and curate and say no and have agency.
An observation might be: “you’re not Google” is something that gets thrown around in software shops all the time, because no one else writes code like Google. Why is it we all think we’re going to be writing code like Anthropic?

by hakanderyal

0 subcomment

I think this is a common sentiment among heavy users of AI who also still care about code quality.
I've built up a skill harness and review flow that makes Opus generate slop-free code 90% of the time. But the remaining 10% requires me to stay at the helm. Especially in the early stages.
I would love to use loops to automate more, but I couldn't do it with the current generation models.
And on the back of my mind I'm still evaluating the possible future where we are forced to API pricing. I'm currently paying $400 for Opus, and use around 1.5-2 billion tokens per day. This will cost around $20k/m with API pricing. And I don't want to even imagine the possible scenario of getting locked out of frontier models because of politics.
Will the models get better to cut me out of the loop completely? I believe so. Will the open source models catch up to SOTA models, and diversify from China-only? I hope so. Otherwise 2 superpowers will wield a soft power that can cripple the tech industries of all other countries.

by aabdi

0 subcomment

The post suggests fear about a surge of increasing amounts of code by loops and loops of agents.
I don’t know if I like the current world without it though.
80% of different teams’ code is poorly tested. The code doesn’t handle data consistency or asynchronous code properly because the engineers don’t know better (and frankly don’t care enough).
Dependency handling is poorly managed leading to low quality operations with improper dashboards, alarms, and ops.
Badly managed processes lead to people doing monkey work signing off checklists rather than automation.
Frankly… why is keeping any of that good? It really pisses me off seeing people accept any of that low quality but that standard is the default and not the outlier.

by galoisscobi

13 subcomments

As much as I like Claude Code, Boris has done a lot of harm by encouraging software engineering practices that lead to slopware. We have two camps of people at work, the first camp are the agent goes brrr. They don't understand the code they write. They have loops running, agent orchestrators or agent hype du jour. The second camp is people who are inundated with PRs, are holding the line on quality, and just exhausted. We've also had some management pressures where they think people are wasting time looking at code. Perhaps because some podcast they might be listening to, somebody says coding is largely solved.
> I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.
This is going to be a net negative on software quality for people who take this up, in my opinion.
I call out Boris but I also don't think he's being malicious. He's at the center of an important technological revolution and it would be hard not to get excited. I just wish he advocated for a more balanced and realistic perspective.

by trjordan

0 subcomment

I think there's 2 important, but separate, ideas in this post:
- Models are not good at or getting better at creating strong invariants, which is fundamental to good software
- It is unclear how to keep tabs on what the agent is doing, so you, a human, can intervene.
These are related, obviously: one of the highest-leverage things you can do is force your agent to use a strong, minimal set of types or data invariants or other constraints. They get much better when your codebase broadly supports this!
I do suspect they're separable, though.
If you had the right levers and visibility, you should be able to get the model to produce code that doesn't feel like slop. But every time I've had a model try to keep me in the loop, it inundates me with irrelevant decisions and busywork. Its inability to see what's structurally important still shows up, just differently.
[If the models get better at defining and respecting invariants, maybe there's a new flavor of slop, that's less obvious today.]

by camillomiller

2 subcomments

Show me the billion dollar solopreneur startup, or the profit increase for companies and at that point I’ll start thinking that this tasteless high level wanking might make sense in some way

by themgt

0 subcomment

This article very much resonates with my current line of thinking. With winking apologies to Douglas Hofstadter, I've been mentally shorthanding agentic software development as building via "strange loops"
Anyway, it does seem like time to start experimenting. With a large dose of humility regarding what the optimal process and stack will ultimately look like. https://z3os.ai/

by draginol

0 subcomment

This is really terrible advice right now for most people.
I've had to rip out a lot of pretty terrible code made by engineers who have tried this.
I don't disagree that eventually, "loops" when combined with unlimited tokens and amazing models in the hands of people who know how to set them up right will be amazing. But for the typical Claude Code user, it's a disaster.
The problem is not that loops write bad code once. Humans do that too. The problem is that loops apply local pressure repeatedly: add a fallback, add a guard, special-case the failing input, quiet the exception, satisfy the test. Over time that selects for code that is more survivable in the short term but less intelligible in the long term.

by mattchew

0 subcomment

This is the best essay on agentic coding I've read. Clear thinking and writing, pragmatic about the future of agent-led coding.
If you usually skip straight to the comments, you might want to actually read this one.

by sandrello

0 subcomment

This is a very fatalistic take. While I understand where it's coming from, I try not to share the same mindset: engineers getting increasingly distant from how things are getting built is not something that will "undoubtedly happen, whether we like it or not".
Also:
> Now there is obviously a question if this desire to understand the code is one that I will still have a few years from now.
I do not think we should be having doubts like this. Either you consider understanding the code you ship and allowing your future self to be able to work on the system you're building to be a value, or you don't. I, for one, do, and I do not think using LLMs and coding agents will affect my point of view on that.

by knivets

0 subcomment

These new AI trends are very tiresome, very similar to 2021 crypto mania - both trigger a lot of FOMO. If we have loops that write code and we don't need to verify anything, why are the devs still here? What's the point of even learning this new trick as a dev if you truly believe that this can be used without any intervention? If loops work then it follows that a loop of loops works too - why hire any people at all? Just run a bunch of loops and build a profitable business, but then what's your moat? Any person can now launch loops on top of loops.

by relaxing

0 subcomment

> Present-day models tend to produce code that is too defensive, too complex, too local in its reasoning. They avoid strong invariants. They add fallbacks instead of making bad states impossible. They duplicate code, invent bad abstractions, and paper over unclear design with more machinery. Worse though: I so far see very little progress of this improving.
It’s almost as though these models were trained on a vast corpus of largely mediocre code. They will never outperform the median Github user - it is all they know, it is all they can do.

by noodletheworld

1 subcomments

There's a deep insight in this post about the value of looping for throwaway code to explore a problem space, rather than brute-forcing a problem by just applying more tokens and hoping.
The more I play in this space, the more I’m drawn to the idea that some kind of backtracking constraint solver is a better solution than the current naive while loop / brute force approach here.
The results I see are similar to what you get from a greedy brute force constraint solver; solves trivial problems, sometimes solves harder problems after a long time, takes too long to solve really hard problems; solutions are increasingly non optimal on average as complexity goes up.
We have so much existing knowledge about building good constraint solvers, if we could just figure out how to apply it here somehow.
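As a toy illustration of the difference, not of how it would apply to agents, compare a greedy pass with a backtracking search on a tiny subset-sum problem. Everything here is a made-up example: the greedy version never revisits a choice, much like a loop that only ever patches forward, while the backtracking version can undo an early decision that turned out to block the solution. Inputs are assumed to be positive integers.

```python
from typing import Optional


def greedy(items: list[int], target: int) -> Optional[list[int]]:
    """Take each item if it still fits; never reconsider a choice."""
    chosen: list[int] = []
    total = 0
    for x in items:
        if total + x <= target:
            chosen.append(x)
            total += x
    return chosen if total == target else None


def backtrack(
    items: list[int], target: int, start: int = 0, chosen: tuple[int, ...] = ()
) -> Optional[list[int]]:
    """Depth-first search that abandons a partial choice when it dead-ends."""
    if target == 0:
        return list(chosen)
    for i in range(start, len(items)):
        if items[i] <= target:
            found = backtrack(items, target - items[i], i + 1, chosen + (items[i],))
            if found is not None:
                return found
    return None


greedy_result = greedy([4, 3, 3], 6)
backtrack_result = backtrack([4, 3, 3], 6)
```

On this input the greedy pass commits to `4` and fails, while the backtracking search drops the `4` and finds `3 + 3`. How to define "undo" and "constraint" for an agent editing a codebase is the unsolved part.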

by wartywhoa23

0 subcomment

Another piece of transhumanist trash on the Internet.

by simonreiff

0 subcomment

AI infrastructure/tools developer and researcher here (hic-ai.com). I fully agree with Armin's concerns.
I wrote an article recently (https://hic-ai.com/blog/tool-response-engineering) in which I argued that AI tool-engineering is the new frontier beyond prompts, and it talks about the agent loop and engineering loops, but boy I have a completely different perspective than Boris's. Rather than contending that prompts are no longer relevant because we can simply have AI think for us by having loops "prompt Claude and figuring out what to do", like what Boris claimed, I believe we have to think much harder now.
Why problems require human judgment and can't just be offloaded to an AI agent is simply this: AI agents lack durable, long-lasting, unique identities forged by real-world memories and experience, and they therefore lack judgment. There is no well-defined notion of having AI agents communicate to each other because they can't even tell the difference between talking to their future self or talking to another agent! They certainly don't reliably weigh whether a proposed fix to a failing unit test will subtly introduce new fallback logic that was never requested or otherwise alter the functionality of the system under test in some manner, or whether now is the time to refactor or now is the time not to refactor or whether to abstract more or less or anything like that. Most importantly, from a practical perspective: AI agents lack legal person status, and they therefore cannot own property or money, sue or be sued, be hired or fired, or otherwise be held financially responsible for their errors or other harmful acts they commit. Clearly, AI agents cannot even arguably qualify for legal personhood status in the future, unless and until they first are capable of assuming a unique and durable long-term identity, which in turn requires solving auth, memory, communications, and many other technical issues that are neither resolved nor standardized today. These facts combine to ensure that AI agents cannot be held liable and financially responsible when things go wrong, meaning that humans alone bear the costs of AI errors until further notice, and thus, human judgment remains the vital commodity that AI cannot replace. So many problems arise from abdicating judgment to AI agents in 2026!
Now, if all the above items were in existence, built, well-settled, etc., maybe I'd have to rethink things. But unless and until AI agents have unique identities and attain legal person status with bank accounts that can be sued and garnished in case of error, I don't think AI agents can seriously be trusted. Good human judgment and approval of all important decisions will remain the most important resource for any successful enterprise, for the foreseeable future. I think it's a very serious mistake to assume that human judgment can be swapped out safely by AI and certainly advise against taking Boris literally. Anyway, great article.

by topce

0 subcomment

my experimental looping build on top of pi and zx mostly pi deep seek and some skills ;-) https://github.com/topce/pizx

by JodieBenitez

1 subcomments

> For now I have not moved past the point of comprehension being important to me.
Ah! This is me too... at least for what I have to ship at work. Not so much for my toy/weekend projects. But it turns out agents are also good at explaining.

by m0llusk

0 subcomment

One of the biggest problems with LLMs has turned out to be the cost of actually running them and this strategy functions as a usage multiplier.

by intended

0 subcomment

I'm willing to be persuaded otherwise: Looping seems to (currently) be a side effect of token subsidies.
If token costs are nil, then you can afford to run verification and generation through the same models. If token costs are high, then you will go broke verifying code sprawl.
Currently costs are (mostly) absent from the conversation, even though costs are what decide the limits which shape experience.
Also: Firms can be held liable for the products they sell, so if code cannot be reviewed then that code is essentially a lawsuit waiting to happen. I believe this is what customers will be demanding in the future: someone to hold accountable when things go wrong.

by ilaksh

0 subcomment

Great article and good description of LLM code quality problems and problems that derive from that. And fair to not want a tidal wave of slop to displace your entire craft.
But this article is strangely lacking in foresight about rapidly evolving model capabilities and output. One visual way to see this is to compare successive SOTA video generation models. Look at outputs from Sora, to Veo, to Seedance 2.0, and now the just-released Seedance 2.5.
Or compare LLMs/VLMs as they have progressed: GPT-2, GPT-3, GPT-4, Opus, Fable/Mythos.
You can see the level of sloppiness or poor world understanding progress from comical nonsense to junior to senior with a few holes in their brain to an engineer you can actually almost trust to produce clean code if you mention the right guidelines in your instructions (such as avoiding overly local code).
As the model size/complexity increases, the intelligence increases, and so does code quality. We will also start specifically putting more high level code quality tasks into training datasets and training harnesses. I mean, Karpathy will probably see this article and make a huge dent in the issues without even larger models.
One thing people may not be aware of is that there is still a lot of room for hardware efficiency improvements and model size to grow. The compute-in-memory paradigm is just getting started in a way. Look at companies like Tensordyne and Mythic AI, but they are going to get blown out of the water by fully in-memory approaches.
For example look at the recent wurtzite ferroelectric nitrides breakthrough from the University of Michigan team (one of them tragically jumped from height after intense interrogation regarding national security concerns). The military is providing significant funding to move this towards development and scaling out of the lab.
That type or level of truly new paradigm system is going to boost efficiency by multiple orders of magnitude.
I know there are people who think Fable 5 was the end of the public LLM/VLM frontier moving, or that it is impossible to scale models further due to energy consumption. But there is zero chance that every high level VLM/LLM research team on the planet is going to stop publishing models or that the rapid progress in compute efficiency will stop.
Point being, within a year or two, the code coming out will be much cleaner. And within five or six years what you may see is that the leading models are 100+ trillion parameters and have sophisticated persistent context management etc. and they do not even produce application source code.
Instead, the database is in the context and is neurally rendered at 24 fps into whatever UI, schema and business logic you prompt it with in a broad way. The whole application is just precise thinking in an artificial brain ten times the complexity of an equivalent human brain.
And if you are disturbed by the current level of outsourcing for thinking to AI, it is just getting started. In a way it will be incredible, from another perspective horrific, but what I think we are seeing is the evolution of an ExoCortex. There will be an AI glasses stage where the integration is closer but still somewhat low bandwidth.
But sooner rather than later we are headed towards high-bandwidth brain-computer interfaces that make AI into an actual new cognitive layer.
So the waves of slop might make you feel sick, but that is nothing compared to the transhuman cyborgs powered by superhuman AI that are around the corner.

by sohilladhani

0 subcomment

[flagged]

by nfcampos

1 subcomments

My own thoughts on this, with examples https://github.com/nfcampos/loop-dev/blob/main/README.md

by rcarmo

1 subcomments

There's _way_ more than one way to do "loops". I just asked one of my supervisors/auditors to document how it's been working while monitoring a few other agents that have long-term goals:
https://gist.github.com/rcarmo/4922b550ab48bf0b4246c77e606a5...

by codeDruid

1 subcomments

Yeah, I don't know. Don't get me wrong, the article's points make sense. But sometimes I think that we're going to stay near this current point of productivity for a little while.
Currently my org of 8 people uses around 1000 euro worth of tokens per month. We recently had a discussion near the water cooler that if the cost climbs 5x-10x, it may just be more worthwhile to hire more developers (we're EU based). While the tools work and are definitely nice, even in our little org with our little budget, using Opus 4.8, we've noticed code quality going down.
If I had to bet money, I'd bet that the models will get 30-50% nicer and around 2x more expensive, and that we will settle into some mode where we use LLMs for some tasks and do others manually. We'll call places that focus on speed at any cost some funny name like "gulags, 996, sweatshops, etc." and collectively try to somewhat avoid them, so they will need to offer a premium to attract talent. Wishful thinking.