Like, if I'm not ready to jump on some AI-spiced-up special IDE, am I then going to just be left banging rocks together? It feels like some of these AI agent companies just decided "OK, we can't adopt this into the old IDEs, so we'll build a new special IDE"? Or did I just use the wrong tools? (I use Rider and VS, and I have only tried Copilot so far, but the "agent mode" of Copilot in those IDEs feels basically useless.)
Scaled GitHub stars to 20,000+
Built engaged communities across platforms (2.8K X, 5.4K LinkedIn, 700+ YouTube)
etc, etc.
No doubt impressive to marketing types, but maybe a pinch of salt is required when it comes to using AI agents in production.
Already a "no": the bottleneck is "drowning under your own slop". Ever noticed how fast agents seem to work at the beginning of a project, but the larger it grows, the slower they get at making good changes that don't break other things?
This is because you're missing the "engineering" part of software engineering, where someone has to think about the domain, the design, the tradeoffs, and how something will be used. That requires good judgement and wisdom about what constitutes a suitable design for what you're actually trying to do.
Lately (the last year or so), more of my client jobs have basically been: "Hey, so we have this project that someone made with LLMs, they basically don't know how it works, but now we have a ton of users, could you redo it properly?" In every case, the application was built with zero engineering and zero (human) regard for design and architecture.
I have not yet had any clients come to me and say "Hey, our current vibe-coders are all busy and don't have time, help us with X"; it's always "We've built hairball X, rescue us please?", and that makes it pretty obvious to me what the biggest bottleneck with this sort of coding is.
Moving slower is usually faster long-term, provided you think about the design, but it's obviously slower short-term, which makes it kind of counter-intuitive.
The pipe dream of agents handling GitHub Issue -> Pull Request -> Resolved Issue becomes a nightmare of fixing downstream regressions and other chaos unleashed by agents given too much privilege. I think people who are optimistic about agents are either naive or hype merchants grifting/shilling.
I can understand the grinning panic of the hype merchants because we've collectively shovelled so much capital into AI with very little to show for it so far. Not to say that AI is useless, far from it, but there's far more over-optimism than realistic assessment of the actual accuracy and capabilities.
So with the top performers, I think what's most effective is just stating clearly what you want the end result to be (with maybe some hints on how to verify the results, which is really just clarifying the intent further).
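One concrete way to provide those verification hints is to hand the agent an acceptance test along with the goal and ask it to make the test pass. A minimal, hypothetical sketch (the slugify module and names are made up for illustration, not taken from the thread):

    # test_slugify.py: handed to the agent together with the goal
    # "Write slugify(title) that turns a post title into a URL-safe slug."
    # The slugify module doesn't exist yet; producing it is the agent's task.
    from slugify import slugify

    def test_basic_punctuation_is_stripped():
        assert slugify("Hello, World!") == "hello-world"

    def test_whitespace_is_collapsed():
        assert slugify("  many   spaces ") == "many-spaces"

The test states the intent precisely and doubles as the verification step, without prescribing how the agent should implement it.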
In one week, I fine-tuned https://github.com/kstenerud/bonjson/ for maximum decoding efficiency and:
* Had Claude do a go version (https://github.com/kstenerud/go-bonjson), which outperforms the JSON codec.
* Had Claude do a Rust version (https://github.com/kstenerud/rs-bonjson), which outperforms the JSON codec.
* Had Claude do a Swift version (https://github.com/kstenerud/swift-bonjson), which outperforms the JSON codec (although this one took some time due to the Codable, Encoder, Decoder interfaces).
* Have Claude doing a Python version with Rust underpinnings (making this fast is proving challenging)
* Have Claude doing a Jackson version (in progress, seems to be not too bad)
In ONE week.
This would have taken me a year otherwise: getting the base library going, getting a test runner going for the universal tests, figuring out how good the SIMD support is and what intrinsics I can use, finding the best tooling for hot-path analysis, trying various approaches, etc, etc. And all of that times five, once per language.
Now all I do is give Claude a prompt, a spec, and some hand-holding for the optimization phase (admittedly, it starts off 10x slower, so you have to watch the algorithms it uses). But it's head and shoulders above what I could do with the last iteration of Claude.
I can experiment super quickly: "Try caching previously encountered keys and show me the performance change." 5 minutes, done. It would take me a LOT longer to retool the code just for a quick test. Experiments are dirt cheap now.
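For anyone wondering what "caching previously encountered keys" might look like inside a decoder, here's a minimal, hypothetical Python sketch (not taken from the bonjson repos): object keys repeat constantly in real payloads, so you decode each distinct raw key once and reuse the string.

    # Hypothetical sketch of key caching in a binary-JSON-style decoder.
    # Map raw key bytes -> decoded str so repeated keys skip the
    # UTF-8 decode and string allocation.
    class KeyCache:
        def __init__(self, max_entries: int = 1024):
            self._cache: dict[bytes, str] = {}
            self._max_entries = max_entries

        def decode_key(self, raw: bytes) -> str:
            key = self._cache.get(raw)
            if key is None:
                key = raw.decode("utf-8")
                if len(self._cache) < self._max_entries:
                    self._cache[raw] = key
            return key

Measuring the effect is then just a matter of running the existing benchmarks with and without the cache, which is exactly the kind of five-minute experiment described above.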
The biggest bottleneck right now is that I keep hitting my token limits 1-2 hours before each reset ;-)
- Generate a stable sequence of steps (a plan), then carry it out. Prevents malicious or unintended tool actions from altering the strategy mid-execution and improves reliability on complex tasks.
- Provide a clear goal and toolset. Let the agent determine the orchestration. Increases flexibility and scalability of autonomous workflows.
- Have the agent generate, self-critique, and refine results until a quality threshold is met (a rough sketch of this loop follows after the list).
- Provide mechanisms to interrupt and redirect the agent’s process before wasted effort or errors escalate. Effective systems blend agent autonomy with human oversight. Agents should signal confidence and make reasoning visible; humans should intervene or hand off control fluidly.
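Here is a rough sketch of how the first and third patterns above (freeze a plan up front, then generate/critique/refine) might compose. Everything in it is illustrative: call_model is a stand-in for whatever LLM client you actually use, not a real API.

    # Hypothetical sketch: plan-then-execute with a self-critique loop.
    def call_model(prompt: str) -> str:
        raise NotImplementedError("wire up your LLM client here")

    def run_task(goal: str, max_revisions: int = 3) -> str:
        # 1. Freeze the plan up front so later tool output can't rewrite the strategy.
        plan = call_model(f"Write a short numbered plan to achieve: {goal}")

        # 2. Execute against the fixed plan.
        result = call_model(f"Follow this plan exactly and produce the result:\n{plan}")

        # 3. Generate -> self-critique -> refine until a quality bar is met.
        for _ in range(max_revisions):
            critique = call_model(
                f"Critique this result against the goal '{goal}'. "
                f"Reply APPROVED if it meets the bar.\n\n{result}"
            )
            if "APPROVED" in critique:
                break
            result = call_model(
                f"Revise the result to address this critique:\n{critique}\n\nResult:\n{result}"
            )
        return result

The fourth pattern (interruption and oversight) then amounts to surfacing the plan and each critique to a human before the loop continues.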
If you've ever heard of "continuous improvement", now is the time to learn how that works, and hook that into your AI agents.

I've flagged it, that's what we should be doing with AI content.
But scrap that, it's better just thinking about agent patterns from scratch. It's a green field and, unless you consider yourself profoundly uncreative, the process of thinking through agent coordination is going to yield much greater benefit than eating ideas about patterns through a tube.
0: https://arxiv.org/search/?query=agent+architecture&searchtyp...
It literally gets "stuck" and becomes un-scrollable.
thanks for the share!