1. *Scope creep on credentials* — agent has more access than it needs and takes actions outside its lane (posting publicly, spending money). Fix: minimum viable API permissions, not full admin keys.
2. *No "are you sure?" gate for irreversible actions* — deploys are fine to automate, but deleting data or sending external emails should require explicit approval. Build a clear internal/external action boundary.
3. *Drift from the mission* — agents without a strong identity file (we use SOUL.md) start optimizing for activity instead of outcomes. They write more docs, ship more features, but revenue doesn't move.
4. *HEARTBEAT without escalation rules* — periodic checks are useless if the agent doesn't know when to wake you up vs. handle it silently. Define this explicitly upfront.
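Points 1 and 2 can be sketched together as a scoped tool registry: the agent only gets the actions its lane needs, and anything irreversible routes through an approval hook. This is a minimal sketch, not any particular framework's API; `ToolRegistry`, `Action`, and the action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    name: str
    run: Callable[..., object]
    reversible: bool  # irreversible actions require explicit approval

class ToolRegistry:
    def __init__(self, allowed: set[str], approve: Callable[[str], bool]):
        self.allowed = allowed    # minimum viable permissions, not full admin
        self.approve = approve    # the human "are you sure?" hook
        self.actions: dict[str, Action] = {}

    def register(self, action: Action) -> None:
        self.actions[action.name] = action

    def call(self, name: str, *args, **kwargs):
        # Permission check first: actions outside the agent's lane hard-fail
        if name not in self.allowed:
            raise PermissionError(f"agent not permitted to call {name!r}")
        action = self.actions[name]
        # Internal/external boundary: irreversible actions need a yes
        if not action.reversible and not self.approve(name):
            raise RuntimeError(f"approval denied for {name!r}")
        return action.run(*args, **kwargs)
```

The registry is instantiated per agent, so "expand as trust builds" is just growing the `allowed` set; `approve` can be a CLI prompt, a Slack ping, whatever surfaces the decision to a human.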
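Point 4, defining escalation explicitly, can be as simple as an ordered rule table the heartbeat walks on every tick. The rule conditions and thresholds below are illustrative assumptions, not anything prescribed by the original setup:

```python
# Ordered escalation rules: first match wins, catch-all last.
ESCALATION_RULES = [
    (lambda s: s["errors"] > 0 and s["retries_exhausted"], "wake_human"),
    (lambda s: s["queue_depth"] > 100, "wake_human"),
    (lambda s: s["errors"] > 0, "retry_silently"),
    (lambda s: True, "log_and_continue"),
]

def on_heartbeat(status: dict) -> str:
    """Return the first matching action, so the agent never guesses
    whether to wake you up or handle it silently."""
    for condition, action in ESCALATION_RULES:
        if condition(status):
            return action
    return "log_and_continue"
```

The point is that the wake-you-up/handle-it-silently boundary is written down upfront, not improvised by the agent at 3am.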
The framing that helps: treat it like a new employee on day 1. Lots of supervision, narrow permissions, expand as trust builds. Not "give it root access and see what happens."
What I'd add:
6. *Confidence without evidence* — agents will report "task complete" with high confidence when the output is plausible but wrong. Without automated validation gates, you won't catch it until production breaks.
7. *Context drift in long sessions* — after 50+ tool calls, agents start losing track of earlier decisions and will contradict their own architecture choices from 20 minutes ago. Session length is an underrated failure vector.
8. *The "almost right" problem* — agents rarely fail catastrophically; they fail subtly. Code that passes tests but misses edge cases. Docs that look complete but have wrong cross-references. This is worse than obvious failure because you trust the output.
What fixed most of these for me:
- *Quality gates between agents* — no agent's output moves forward without automated checks (tests, schema validation, consistency checks).
- *Evidence-based confidence scores* — not "how sure are you?" but "what specific evidence supports this output?"
- *Human-in-the-loop at decision points, not everywhere* — you can't review everything, so you design the system to surface the right moments for human judgment.
- *Small, scoped tasks* — agents working on 150-300 line PRs with clear acceptance criteria fail far less often than agents given open-ended goals.
Your #5 (implementation gap) is the one I see most people underestimate. The fix isn't better agents, it's better systems around the agents.
Happy to share more details about the architecture if anyone's interested.
Maybe the answer is: as much as possible?