OSS: The `engine` and `agent` crates are now fully open (Apache 2.0). Only the `gateway` and the various plugins are FSL with a 3-year clock. You can run the full state machine locally, self-hosted, with no cloud dependency. The end-to-end workflow now works out of the box with Ollama and any 13B+ model.
Multi-agent support: I validated and enhanced plugins for Codex CLI, Oh-My-Codex, and Pi alongside the existing Claude Code plugin. Same gateway, same workflows, different agent frontends. The Pi plugin in particular is interesting because Pi's extension API supports things Claude Code doesn't yet... programmatic model switching and per-state tool filtering, where the model literally never sees disallowed tools.
Interrupts: Reactive file triggers that force state transitions. Model edits a migration file? An interrupt fires and pulls it into a review state. It uses the History State pattern to return to where it was in the state machine afterward.
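To make the interrupt idea concrete, here is a minimal sketch of a file trigger combined with a history stack. It is not Statewright's implementation: the `Machine` class, the `triggers` table, and the state names are all hypothetical, and a real engine would presumably hook into the agent's file-edit events rather than take a path by hand.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Machine:
    state: str
    # Hypothetical trigger table: glob pattern -> state to interrupt into.
    triggers: dict[str, str]
    # History stack: the states that were interrupted, most recent last.
    history: list[str] = field(default_factory=list)

    def on_file_edit(self, path: str) -> None:
        """Fire an interrupt if the edited path matches a trigger."""
        for pattern, target in self.triggers.items():
            if fnmatch(path, pattern) and self.state != target:
                self.history.append(self.state)  # remember where we were
                self.state = target
                return

    def resume(self) -> None:
        """Leave the interrupt state and return to the saved state."""
        if self.history:
            self.state = self.history.pop()


m = Machine(state="implement", triggers={"migrations/*.sql": "review"})
m.on_file_edit("migrations/0042_add_index.sql")
print(m.state)  # review
m.resume()
print(m.state)  # implement
```

Keeping the history as a stack rather than a single slot means an interrupt that fires during another interrupt would still unwind in order.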
Fork/join: Parallel sub-agent execution. A planning state dispatches N implementation branches, each in its own worktree, and a join collects the results.
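A rough sketch of the fork/join shape, assuming a thread pool stands in for the sub-agents. The function names and the `task/...` branch naming are hypothetical; the sketch only builds the `git worktree add` command for each branch and does not execute it, so it runs without a repository.

```python
from concurrent.futures import ThreadPoolExecutor


def worktree_command(task_id: str) -> list[str]:
    """Command that would create an isolated worktree for one branch."""
    return ["git", "worktree", "add", "-b", f"task/{task_id}", f"../wt-{task_id}"]


def run_branch(task_id: str) -> dict:
    """Stand-in for one sub-agent; a real one would run inside its worktree."""
    return {"task": task_id, "setup": worktree_command(task_id), "status": "done"}


def fork_join(task_ids: list[str]) -> list[dict]:
    """Fork one branch per task, then join by collecting every result."""
    with ThreadPoolExecutor(max_workers=max(1, len(task_ids))) as pool:
        # map() yields results in input order, whichever branch finishes first.
        return list(pool.map(run_branch, task_ids))


results = fork_join(["api", "db", "ui"])
print([r["task"] for r in results])  # ['api', 'db', 'ui']
```

The point of the pattern is that the harness, not the model, decides that each branch gets its own worktree.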
Allowed_commands: Per-state bash command restrictions. A testing state can run pytest but not rm -rf. Enforced in the hook, not the prompt.
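Here is one way a hook-side check like this could look. It is a sketch under assumptions: the `ALLOWED_COMMANDS` table and `check_command` function are hypothetical names, and rejecting every shell operator outright is a deliberately blunt choice made here to keep chained commands from slipping past the allowlist.

```python
import shlex

# Hypothetical per-state allowlist of executables.
ALLOWED_COMMANDS = {
    "planning": {"ls", "cat", "git"},
    "testing": {"pytest", "git"},
}

# Reject anything that could chain, pipe, redirect, or substitute commands.
SHELL_OPERATORS = (";", "&", "|", "`", "$(", ">", "<", "\n")


def check_command(state: str, command: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a bash command proposed in `state`."""
    if any(op in command for op in SHELL_OPERATORS):
        return False, "shell operators are not allowed"
    try:
        argv = shlex.split(command)
    except ValueError:
        return False, "command could not be parsed"
    if not argv:
        return False, "empty command"
    if argv[0] not in ALLOWED_COMMANDS.get(state, set()):
        return False, f"'{argv[0]}' is not allowed in state '{state}'"
    return True, "ok"


print(check_command("testing", "pytest -q tests/"))
print(check_command("testing", "rm -rf build"))
print(check_command("testing", "pytest; rm -rf build"))
```

Because the check runs before the command executes, the model's prompt never has to be trusted to enforce it.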
Tangentially, the Forge post this week (https://news.ycombinator.com/item?id=48192383) validated the same thesis from a different angle... structural guardrails on small models outperform unconstrained frontier models. Three independent projects converging on "the harness is first-class infrastructure" in roughly two weeks.
Next on the agenda: per-state model routing. Use a local 12B for grunt work, route to Opus/GPT-5 for the one call that matters. The cost math is trending towards ~80% reduction on a 6-phase workflow.
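As a back-of-the-envelope illustration of the routing arithmetic, here is a sketch with made-up inputs. The phase names, the routing table, and the relative costs are all placeholders (every phase is assumed to be the same size, and the local model is treated as free), so this shows the shape of the calculation and is not a measurement.

```python
# Hypothetical routing table: state name -> model tier.
ROUTES = {
    "research": "local",
    "plan": "frontier",  # the one call that matters
    "implement": "local",
    "test": "local",
    "review": "local",
    "document": "local",
}

# Relative cost per phase in arbitrary units (placeholders, not real prices).
COST = {"local": 0.0, "frontier": 1.0}


def cost_reduction(routes: dict[str, str], cost: dict[str, float]) -> float:
    """Fraction saved versus sending every phase to the frontier model."""
    baseline = len(routes) * cost["frontier"]
    routed = sum(cost[tier] for tier in routes.values())
    return 1 - routed / baseline


print(f"{cost_reduction(ROUTES, COST):.0%}")  # 83%
```

With one frontier phase out of six and a free local model, the saving is 5/6; real numbers would depend on per-phase token counts and on what the local hardware costs to run.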
The research page (https://statewright.ai/research) mentions a patent and a "core engine":
> Provisional patent application filed: #64/054,240 (April 30, 2026). 35 claims covering state machine guardrail enforcement for LLM agent tool access. The core engine remains Apache 2.0 open source.
I'm not sure I understand what the "core engine" is if it's not the "state machine guardrail runtime", which is what the patent covers. Which parts exactly are open source?
I find the idea really interesting and was nodding along as I read what you wrote. It makes sense for both the human and the agent and seems like it would help, but the patent kind of makes me want to run away and not look into it too deeply.
In any case, I'll have to check out Statewright after work ;)
Second thought: enforcing tools is useful and I built myself a Pi extension to deny access to particular tools in some workflows.
But we somehow need to force agents to obey the rules.
For example, I have rules in Pi that ask the main agent to dispatch implementer agents in parallel using git worktrees. Sometimes it uses git worktrees, sometimes not.
Its reasoning goes something like this: "the user asked me to use git worktrees so let me start using git worktrees. But wait, the task is simple so maybe I don't need git worktrees..."
If I ask why it didn't follow the rules, it says something like: "The user is right, I should have followed the rules..."
In your GitHub, the JSON format shown for defining custom workflows is very simple. I wonder if that limits the detail in the state-related instructions and error messages you can send to a model.
For example, in state transitions, does your tool just tell the model something like "you are in 'act' mode and no longer in 'plan' mode, here are your new available tools"? It seems difficult to give it any more informative messages given how simple the workflow definitions are. The same goes for when the model attempts to use a tool that isn't allowed in the current phase.
If a state machine can help a local LLM produce better results, it's a welcome addition for tinkerers and solo devs.
What's the difference between a "transition" (purple line, not shown in the workflow) and a happy path / failure?
Is the editor/composer separate from the runtime?
If I build a workflow in the visual editor, can I use that same flow inside my own app just by using the runtime/engine? Or is it mainly tied to the Statewright platform and Claude Code plugin?
I’m wondering if the runtime can be used as a standalone piece to power apps I build.
nocodo is one of my product experiments. It currently uses a 120B model, but I have tested a few agents inside it with 20B models.
I create a bunch of agents, each with very specific goals. Like Project Manager, Backend Engineer, etc.
Each agent gets a very compact list of tools and access to only certain parts of the filesystem or commands.
The plan/implement/test workflow is very basic and represents the most common agentic use case. But the state machine pattern applies to any multi-step work where agents are useful but susceptible to death spirals, hallucinations, or other non-deterministic quirkiness. This also enables Claude Desktop and other non-coding agents to perform useful constrained work.
I've been building a content pipeline for tabletop publishing and tested it yesterday. A research phase gathers lore and game details from a compendium, and a drafting phase generates structured content, including schema-specific JSON validation (so my Lua+LaTeX templates work without iterating). A review gate has me editing content directly (a tmux+neovim dialog is great for this). The agent shapes the content, makes sure it conforms to JSON validation and content requirements, then I write it. Before I adapted the state machine to it, the agent tried to do everything all at once. Calling multiple agents is sometimes effective, but details get lost and you definitely lose visibility in the summarization. The state machine runs everyone serially (for now), but chaining and parallelization are on the roadmap.
While I was working with statewright on a different workflow over the weekend, Claude (as Claude does) attempted to write an intricate bash script to work around a guardrail... and statewright blocked it! I think that was when I knew there was some real power behind what's been built here. Enforcement has to be structural, not advisory.
Also, since this is generally useful for things besides coding, you can start to think about things like SOC 2 change management. Every change needs a plan, a human review gate, audited implementation, pull request, review, human approval, and then finally a human to approve a production deployment. Today teams enforce this with checklists and hope. An agent constrained by a workflow that won't let it deploy without all the prerequisite pieces is enterprise delivery with an auditable paper trail, with humans injected for approvals where they need to be, not managing each change's lifecycle.
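A small sketch of what "won't let it deploy without all the prerequisite pieces" could mean in code. The step names mirror the list above, but the `ChangeRecord` class, the `human:` actor prefix, and the choice of which steps count as human gates are assumptions made for illustration, not how statewright models this.

```python
from dataclasses import dataclass, field

# Steps required before a production deploy, in the order listed above.
PREREQUISITES = (
    "plan", "human_review", "implementation", "pull_request",
    "review", "human_approval", "deploy_approval",
)
# Assumption: these steps may only be completed by a human actor.
HUMAN_GATES = {"human_review", "human_approval", "deploy_approval"}


@dataclass
class ChangeRecord:
    change_id: str
    # Append-only audit trail of (step, actor) pairs.
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def complete(self, step: str, actor: str) -> None:
        """Record a finished step; human gates reject non-human actors."""
        if step not in PREREQUISITES:
            raise ValueError(f"unknown step: {step}")
        if step in HUMAN_GATES and not actor.startswith("human:"):
            raise PermissionError(f"{step} requires a human actor")
        self.audit_log.append((step, actor))

    def missing(self) -> list[str]:
        """Prerequisites that have not been recorded yet."""
        done = {step for step, _ in self.audit_log}
        return [s for s in PREREQUISITES if s not in done]

    def can_deploy(self) -> bool:
        return not self.missing()


change = ChangeRecord("CHG-1")
change.complete("plan", "agent:planner")
print(change.can_deploy(), change.missing()[0])  # False human_review
```

The audit log doubles as the paper trail: every step that unlocked the deploy is recorded with who completed it.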
The piece I'm most excited about is agent-generated workflows. You solve a problem once and maintain your context, then point the agent at the JSON schema and it creates and uploads a new workflow to statewright automatically that you can use immediately. No fine-tuning, no exhaustive prompt engineering, no dozens of agents... best-fit lightweight guardrails that agents help build themselves, compiling your intent into structure the models can't weasel their way out of. This is a fundamentally different reality than what the current state of the art is practicing. I think that's a big deal.