I do something similar, but my favorite step is the first: /rubberduck, to discuss the problem with the agent, which the command instructs to help me frame and validate it. Hands down the most impactful piece of my workflow, because it gets me to the right clarity, and I can use it for non-coding tasks too.
After that it's the usual: write PRDs, specs, and tasks, then build, then verify the output.
I started with one of the spec frameworks and eventually simplified everything to the bone.
I do feel it's working great, but some days I fear a lot of this might still be too much productivity theater.
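For anyone curious what a command like that looks like: in Claude Code, a custom slash command is just a markdown file under `.claude/commands/`. A minimal sketch of a rubberduck-style prompt (the filename and wording here are my own guess, not the parent's actual command; `$ARGUMENTS` is Claude Code's placeholder for what you type after the command):

```markdown
<!-- .claude/commands/rubberduck.md (hypothetical) -->
Act as a rubber-duck partner. Do not write code or propose solutions yet.

1. Ask me clarifying questions until the problem statement is unambiguous.
2. Restate the problem in your own words and list your assumptions.
3. Challenge the framing: is this the right problem? What would make it moot?
4. Only when I say "framed", summarize the validated problem statement.

Problem: $ARGUMENTS
```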
==========
Bottom line
Against the agentskills.io guidance, they look more like workflow specs than polished agent skills.
The largest gap is not correctness. It is skill design discipline:
- stronger descriptions,
- lighter defaults,
- less mandatory process,
- better degraded-mode handling,
- clearer evidence that the skills were refined through trigger/output evals.
Skill            Score/10
write-a-prd      5.4
prd-to-issues    6.8
issues-to-tasks  6.0
code-review      7.6
final-audit      6.3
==========
LLM metaprogramming is extremely important. I've just finished an LLM-assisted design-doc authoring session where the LLM's own recommendation was "Don't use an LLM for that part, it won't be reliable enough."
I polish my staff and prepare the inscription tools. I sketch out a loose intention on parchment, never too precise at first, just enough to give the spirits a direction. Then I begin the incantations, carefully chosen phrases spoken into the void until something answers back. Sometimes the reply is coherent, sometimes it is… enthusiastic in a way I did not ask for, but all responses are recorded for refinement. I keep a small set of favorite incantations that tend to calm the louder gods, though I still experiment when I’m feeling bold.
Before committing anything to permanence, I perform a small divination to see if the current path is “stable.” The results are rarely definitive, but the ritual itself seems to keep things from collapsing immediately. Once a workable manifestation appears, I bind it with additional runes to keep it from drifting. If it behaves unpredictably, I perform a cleansing rite: repeating sections of the invocation with stricter wording until the spirit settles.
There are also moments of silent bargaining, short offerings of clarity in exchange for fewer surprises later. When things truly misbehave, I consult older, more temperamental deities buried deeper in the book, though they are expensive to wake and rarely generous. Finally, I seal the result, store it in the grimoire, and extinguish the candles, hoping I won’t need to reopen that particular circle again too soon.
/grill-me (back-and-forth alignment with the LLM) --> /write-a-prd (creates a project under an initiative in Linear) --> /prd-to-issues (creates issues at the project level). I'm making use of the blockedBy utility when registering the issues. They land in the 'Ready for Agent' status.
A scheduled project-orchestrator then picks up issues with this status, leveraging subagents. A HITL (human-in-the-loop) status is set on the ticket when anything needs my attention. I consider the code the 'what', so I let the agent(s) update the issues with the HOW and WHY. All on a Claude Code Max subscription.
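For reference, registering issues with blockedBy relations can be done through Linear's GraphQL API. A rough sketch of the payloads involved, assuming the `issueCreate` and `issueRelationCreate` mutations from Linear's public API (all IDs below are hypothetical placeholders, and the relation direction is worth double-checking against the docs):

```python
import json

# Mutation names come from Linear's public GraphQL API.
CREATE_ISSUE = """
mutation($input: IssueCreateInput!) {
  issueCreate(input: $input) { issue { id identifier } }
}
"""

CREATE_RELATION = """
mutation($input: IssueRelationCreateInput!) {
  issueRelationCreate(input: $input) { issueRelation { id } }
}
"""

def issue_payload(team_id, project_id, title, state_id):
    """Payload to create an issue directly in a given workflow state,
    e.g. 'Ready for Agent'."""
    return {
        "query": CREATE_ISSUE,
        "variables": {"input": {
            "teamId": team_id,
            "projectId": project_id,
            "title": title,
            "stateId": state_id,  # the 'Ready for Agent' state's id
        }},
    }

def blocked_by_payload(issue_id, blocker_id):
    """Mark issue_id as blocked by blocker_id.
    Assumption: with type 'blocks', issueId blocks relatedIssueId."""
    return {
        "query": CREATE_RELATION,
        "variables": {"input": {
            "issueId": blocker_id,
            "relatedIssueId": issue_id,
            "type": "blocks",
        }},
    }

# Example: ISS-2 cannot start until ISS-1 is done.
body = blocked_by_payload("ISS-2", "ISS-1")
print(json.dumps(body["variables"]["input"]))
```

POST either payload to `https://api.linear.app/graphql` with your API key in the `Authorization` header.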
Some notes:
- write-a-prd is knowledge compression, so some important details occasionally get lost
- The UX for the orchestrator flow is suboptimal. Waiting for this actually: https://github.com/mattpocock/sandcastle/issues/191#issuecom...
- I might have to implement a simplify + review + security audit pass, call it a 'check', to fire at the end of the project. Could take the form of an issue.
AI is good at generating a lot of spaghetti code.
My workflow hasn't changed since 2022: 1. Send some data. 2. Review response. 3. Fix response until I'm satisfied. 4. Goto 1.
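That loop is simple enough to write down literally. A minimal sketch, where `ask_llm` and `looks_good` are hypothetical stand-ins for whatever client and review step you actually use:

```python
def refine(prompt, ask_llm, looks_good, max_rounds=5):
    """1. Send some data. 2. Review response. 3. Fix. 4. Goto 1."""
    feedback = ""
    response = None
    for _ in range(max_rounds):
        response = ask_llm(prompt + feedback)         # 1. send
        ok, notes = looks_good(response)              # 2. review
        if ok:
            return response                           # satisfied
        feedback = f"\n\nFix the following: {notes}"  # 3. fix, 4. goto 1
    return response  # best effort after max_rounds
```

The only thing that has changed since 2022 is how much of step 3 you can also delegate.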
That’s pretty much the whole point of software engineering. Coding is easy; solving problems is hard and can be messy (communication errors, workarounds for some inevitable issue).
If you’re familiar with the codebase, when a change request comes in you will probably already have an idea of how to implement it. The hard part is not coding it, but recalibrating all the tradeoffs so that you don’t break existing features. That’s why SWE principles exist: to make this hard thing easier to do.
Then I tell it to write a high-level plan, and then run subagents to create detailed plans from each of the steps in the high-level one. All plans must include the what, the why, and the how.
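The fan-out is straightforward to sketch: one subagent call per high-level step, each asked for the what/why/how. Here `run_subagent` is a hypothetical stand-in for however you actually spawn agents:

```python
from concurrent.futures import ThreadPoolExecutor

def detail_plans(high_level_steps, run_subagent):
    """Expand each high-level step into a detailed plan, in parallel."""
    prompt = (
        "Expand this step into a detailed plan. Cover: WHAT will change, "
        "WHY it is needed, and HOW to implement it.\n\nStep: {step}"
    )
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves step order in the returned plans
        return list(pool.map(
            lambda step: run_subagent(prompt.format(step=step)),
            high_level_steps,
        ))
```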
Works surprisingly well, especially for greenfield projects.
You have to manually review the code though. No amount of agentic code review will fix the idiocy LLMs routinely produce.
At some point in a serious project a responsible adult must ask the question: “How do I know this works well?” The developer himself is an unreliable judge of this. LLMs can’t judge, either. But anyone who seeks to judge, in a high stakes situation, must take time and thought to test deeply.
It's not perfect by any means, but it does the job, and fast. My code quality and output have both improved from using it.
Obviously we’re not there yet because of price, context, and non-determinism, but it’s a nice area to experiment with.
I've found it to be pretty bad at both.
If what you're doing is quite cookie-cutter, though, it can do a passable job of figuring out what you want.
https://github.com/tessellate-digital/notion-agent-hive
The main reason is we're already using Notion at work, and I wanted something where I could easily add/link to existing documents.
Sample size of one, but I've noticed a considerable improvement after adding a "final review" step (going through the plan and looking at the whole code change) over a naive per-task "implement, review" cycle.
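The mechanics of that final pass are simple: instead of reviewing each task's diff in isolation, take the whole branch diff and review it once against the plan. A sketch, where `review_with_llm` would be whatever agent call you use (the git invocation is standard; the prompt wording is mine):

```python
import subprocess

def branch_diff(base="main"):
    """Everything this branch changed since it forked from base.
    (git three-dot diff: compares HEAD against the merge base.)"""
    return subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

def final_review_prompt(diff, plan):
    # One review of the complete change set, not one task at a time.
    return (
        "Here is the original plan:\n" + plan +
        "\n\nAnd the complete diff of the change:\n" + diff +
        "\n\nReview the whole change against the plan: flag drift from "
        "the plan, dead code, inconsistencies between tasks, and missed cases."
    )
```

Feeding `final_review_prompt(branch_diff(), plan)` to the agent catches cross-task problems that per-task review structurally cannot see.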