I tend to lean towards them being snake oil. A lot of process and ritual around using them, but for what?
I don't think the models themselves are a good fit for the way these frameworks are being used. It probably goes against their training.
Now we try to poison the context with lots of information that is useless for my actual task at hand, just so the model can conform to a superficial song-and-dance process? This seems backwards.
I would argue that we need less context poisoning with useless information. Give the model the most precise information for the actual work to be done and iterate on that. The song-and-dance process should happen outside of the context-constrained agent.
I see one mention of brownfield development. Has anyone with experience using these frameworks fired up Claude Code on enterprise software and gotten confident results? I have unrestricted access to Claude Code at work, and based on personal agentic coding I’m sure these frameworks do help. I get decent but not consistent results with my own “system” in our code base, at least until front-end UI components are involved, even with Playwright. But I’m curious: how much litter is left behind? How tolerant are your coworkers? How large are your pull requests? What is your inference cost? How do these frameworks handle parallel work?
The README documentation for many of these is a mix of fevered infomercial, system-specific jargon, emoji splatter, and someone’s dad’s very specific toolbox-organization approach that only he understands. Some feel like they’re setting the stage to sell something…trademarked!? Won’t Anthropic and others just incorporate the best of the bunch into their CLI tools in time?
Outside of work I’ve regularly used a reasoning model to produce a ten-page spec, wired my project with the strictest lint, type checking, formatting, and hooks, and instructed the agent to check off items as it goes and do red-green TDD. I can tell gpt-5 in Cursor to “go”, occasionally nudge it to stay on task with an “ok next”, and in time I end up with what I wanted, plus some gold plating. The last one was a CLI tool for my agent to invoke to track its own work. Anyone with the same tools can just roll their own.
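For anyone wanting to try the same thing, here is a minimal sketch of what that kind of agent-facing checklist CLI could look like. This is not the parent’s actual tool; the file name, subcommands, and JSON storage are all assumptions for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a task-tracking CLI an agent could shell out to.

Not the commenter's actual tool; the file name, commands, and storage
format are assumptions made up for this example.
"""
import argparse
import json
from pathlib import Path

TASKS_FILE = Path(".agent_tasks.json")  # assumed storage location


def load_tasks() -> list[dict]:
    if TASKS_FILE.exists():
        return json.loads(TASKS_FILE.read_text())
    return []


def save_tasks(tasks: list[dict]) -> None:
    TASKS_FILE.write_text(json.dumps(tasks, indent=2))


def main() -> None:
    parser = argparse.ArgumentParser(description="Minimal agent task tracker")
    sub = parser.add_subparsers(dest="command", required=True)

    add_p = sub.add_parser("add", help="register a task from the spec")
    add_p.add_argument("title")

    done_p = sub.add_parser("done", help="mark a task complete")
    done_p.add_argument("index", type=int)

    sub.add_parser("list", help="show remaining and completed tasks")

    args = parser.parse_args()
    tasks = load_tasks()

    if args.command == "add":
        tasks.append({"title": args.title, "done": False})
        save_tasks(tasks)
    elif args.command == "done":
        tasks[args.index]["done"] = True
        save_tasks(tasks)
    elif args.command == "list":
        for i, task in enumerate(tasks):
            marker = "x" if task["done"] else " "
            print(f"[{marker}] {i}: {task['title']}")


if __name__ == "__main__":
    main()
```

The point is only that the agent can shell out to `add`, `done`, and `list` between steps, so the checklist lives on disk instead of being restated in every turn.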
For my money it's by far the best Claude Code complement.
Why recycle the full history into every future turn until you run out of context window?
Perhaps if the agent manages its own context, knowing what an effective context looks like, the harm of going over the context limit, and how to make that tradeoff smartly, it can navigate the tasks better?
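To make that concrete, here is a minimal sketch of the kind of compaction an agent could apply to its own history, assuming a summarizer callable (e.g. a cheap model call) and a crude word count standing in for a real tokenizer; the budget and cutoff numbers are arbitrary.

```python
# Hedged sketch of letting the agent manage its own context instead of
# replaying the full history every turn. The summarizer, the word-count
# token estimate, and the budget numbers are all assumptions.
from typing import Callable


def rough_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def compact_history(
    turns: list[str],
    summarize: Callable[[list[str]], str],
    budget: int = 4000,
    keep_recent: int = 6,
) -> list[str]:
    """Keep the most recent turns verbatim; if the whole history would
    blow the budget, collapse the older turns into one summary turn
    produced by `summarize` (e.g. a cheap model call)."""
    recent = turns[-keep_recent:]
    older = turns[:-keep_recent]
    if sum(rough_tokens(t) for t in turns) <= budget or not older:
        return turns
    summary = summarize(older)
    return [f"[summary of {len(older)} earlier turns] {summary}"] + recent
```

Whether the compaction is triggered by the agent itself or by the harness around it is exactly the tradeoff being discussed here.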
---
link: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
One difference is that we have less control over the context, to add or remove things per task as necessary.
Let me stop you right there. Are you seriously talking about predictability when the subject is a non-deterministic black box over which you have no control?