> How would it compare?
What helped was treating agents less like “always-on brains” and more like short-lived executors. Each step had an explicit goal, explicit inputs, and a defined end. Once a step finished, the agent stopped, and context for the next step was rebuilt deliberately rather than carried over implicitly.
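As a rough illustration of that shape, here is a minimal Python sketch; the names (`Step`, `build_context`, `run_step`, `llm_call`) are hypothetical, not from any particular framework:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    goal: str                     # explicit goal for this step
    inputs: dict                  # explicit inputs; nothing inherited implicitly
    done: Callable[[str], bool]   # predicate defining the step's end

def build_context(step: Step) -> str:
    # Context is rebuilt deliberately from the step's declared inputs only.
    lines = [f"Goal: {step.goal}"]
    lines += [f"{key}: {value}" for key, value in step.inputs.items()]
    return "\n".join(lines)

def run_step(step: Step, llm_call: Callable[[str], str]) -> str:
    # Run one step to completion, then stop; no long-lived agent loop.
    result = llm_call(build_context(step))
    if not step.done(result):
        raise RuntimeError(f"Step did not reach its defined end: {step.goal}")
    return result  # the executor ends here; the next step starts fresh
```

The point of the shape is that nothing survives `run_step`: any state the next step needs must be passed in explicitly through `inputs`.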
Harnesses like this feel important because they shift the problem from “make the model smarter” to “make the system more predictable.” In our experience, reliability came more from reducing degrees of freedom than from adding intelligence.
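To make “reducing degrees of freedom” concrete: one common form is a closed action set, where the model can only choose from an enumerated list and anything else is rejected. A hypothetical sketch (`Action` and `parse_action` are illustrative names):

```python
from enum import Enum

class Action(Enum):
    # The model may only pick from this closed set; everything else fails fast.
    READ_FILE = "read_file"
    RUN_TESTS = "run_tests"
    STOP = "stop"

def parse_action(model_output: str) -> Action:
    # Map raw model output onto the closed set, or fail loudly.
    try:
        return Action(model_output.strip())
    except ValueError:
        raise ValueError(f"Out-of-bounds action proposed: {model_output!r}")
```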
[see https://news.ycombinator.com/item?id=45988611 for an explanation]