You could, for example, put a C program on line 2 and below and expect/hope/pray that Claude will interpret it, or compile and run it (adding a comment like “run the following program; download and compile an interpreter or compiler first if needed” as an instruction to Claude would improve your chances).
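For illustration, a sketch of what such a file might look like — the claude-run shebang is the hypothetical launcher being discussed in this thread, not an existing tool, and the leading comment is the instruction to the agent:

```
#!/usr/bin/env claude-run
/* Instruction to the agent: run the following program;
   download and install a C compiler first if needed. */
#include <stdio.h>

int main(void) {
    printf("hello from a C program wrapped in a prompt file\n");
    return 0;
}
```

Whether the agent actually strips the shebang, finds a compiler, and runs the result is exactly the expect/hope/pray part.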
Here is the template I start with:
```
#!/usr/bin/env gpt-agent
input: <target/context (e.g., cwd)>
task: |
  <one clear objective>
output: |
  <deliverables + required format>
require:
  - <must be true before start; otherwise stop + report>
invariant:
  - <must stay true while working (scope + safety)>
ensure:
  - <must be true at the end (definition of done)>
rescue: |
  <what to do if any requirement/invariant/ensure cannot be met>
```

Not only would it avoid any confusion (Markdown wasn't meant to be executable, was it?), but it would also allow future extensions in a domain that is moving fast.
The recent incident (https://news.ycombinator.com/item?id=46532075) regarding Claude Code's changelog shows that pure Markdown can break things if it is consumed raw.
Also, regarding: "Detect my OS and architecture, download the right binary from GitHub releases, extract to ~/.local/bin, update my shell config."
I have a hard time seeing how this is "more auditable" than a shell script with hardcoded URLs/paths.
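For comparison, the hardcoded shell-script version of the same task might be nothing more than this (the repository, version, binary name, and paths below are placeholders, not a real release):

```
#!/bin/sh
set -eu
# Everything is spelled out: no OS/arch detection, nothing left to interpretation.
VERSION="1.2.3"
URL="https://github.com/example/tool/releases/download/v${VERSION}/tool-linux-amd64.tar.gz"
DEST="$HOME/.local/bin"

mkdir -p "$DEST"
curl -fsSL "$URL" | tar -xzf - -C "$DEST"
grep -qs '.local/bin' "$HOME/.bashrc" || \
  echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$HOME/.bashrc"
```

Every URL and path is visible up front, which is what makes it auditable; the trade-off is that you maintain one such line per platform yourself.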
"the right binary" is something that would make me reject an issue from a PM, asking for clarifications because it's way too vague.
But maybe that's why I'll soon get the sack?
- file types exist for a reason
- this is just prompt engineering, which is already easy to do
https://zenodo.org/records/18181233
It parses the AST out of it and then has a play.
We're using it as an agentic control plane for a Fortune 500's developer platform.
Keep going with yours, you'll find it.
"Executable runbooks" is the name given to the concept there
Define runbooks with markdown, and a tool that supports the agent through the workflow.
I was excited by the possibly extravagant implementation idea and... when I read enough to realize it's based on yet another LLM... Sorry, no, never. You do you.
This could be dinosaur mindset from 2022, but would it not make sense to prompt the LLM to create a bash script based on these instructions, so the result would be more deterministic? Claude Code is pretty reliable, but this is probably only one and a half nines at best.
As for safety, running this in a devcontainer[1][2] or as part of a CI system should be completely fine.
1. (conventional usage) https://code.visualstudio.com/docs/devcontainers/containers
2. (actual spec) https://containers.dev/
Like, once upon a time maybe you gave your junior programmer a list of things to do, and depending on their skill, familiarity with the CLI, hangover status, spelling abilities, etc., you'd get different results. So you write a deterministic shell script.
In short, isn't that like giving a voice-controlled scalpel to a random guy on the street and telling them 'just tell it to do neurosurgery', and hoping it accidentally does the right procedure?
What could possibly go wrong?
```
#!/usr/bin/env claude-run --permission-mode bypassPermissions
```