Hacker News

Show HN: Mysti – Claude, Codex, and Gemini debate your code, then synthesize

210 points by bahaAbunojaim

by d4rkp4ttern

7 subcomments

A workflow I find useful is to have multiple CLI agents running in different Tmux panes and have one consult/delegate to another using my Tmux-CLI [1] tool + skill. Advantage of this is that the agents’ work is fully visible and I can intervene as needed.
[1] https://github.com/pchalasani/claude-code-tools?tab=readme-o...

by csar

1 subcomments

Getting feedback on a plan or implementation is valuable because you get a fresh set of eyes. Using multiple models may help, though it always feels a bit silly to me (if nothing else you’re increasing non-determinism because you now have to understand two LLMs’ quirks).
But the “playing house” approach of experts is somewhere between pointless and actively harmful. It was all the rage in June and I thought people abandoned that later in the summer.
If you want the model to, e.g., review code instead of fixing things, or document code without suggesting improvements (for writing docs), that’s useful. But there’s no need for all these personas.

by tombert

1 subcomments

I so want to like these vibe coding agents, and sometimes I do, but it really does kind of suck the joy out of things.
What I was hoping was that I could effectively farm out work to my metaphorical AI intern while I get to focus on fun and/or interesting work. Sometimes that is what happens and it makes me very happy when it does. A lot of the time, however, it generates code that is wrong, or incomplete (while claiming it is complete), and so I end up having to babysit the code, either by further prompting or just editing the code.
And then it makes a lot of software engineering become "type prompt, sit and wait a minute, look at the code, repeat", which means I'm decidedly not focusing on the fun part of the project and instead I'm just larping as a manager who backseat codes.
A friend of mine said that he likes to do this backwards: he writes a lot of the code himself and then he uses Claude Code to debug and automate writing tedious stuff like unit tests, and I think that might make it a little less mind numbing.
Also, very tangential, and maybe my prompting game isn't completely on point here, but Codex seems decidedly bad at concurrent code [1]. I was working on some lock-free data store stuff, and Codex really wanted to add a bunch of lock files that were wholly unnecessary. Oh, and it kept trying to add mutexes to the Rust code, no matter how many times I told it I don't want locks and it should use one-shot channels instead. To be fair, when I went and fixed the functions myself in a few spots and then told it to use that as an example, it did get a little better.
[1] I think this particular case is because it's trained on example code from Github and most code involving concurrency uses locks (incorrectly or at least sub-optimally). I guess this particular problem may be more of the fault of American universities teaching concurrent programming incorrectly at the undergrad level.
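As a rough illustration of the "one-shot channel instead of a mutex" pattern mentioned above, here is a minimal sketch. It is not the commenter's code. The function names `compute_in_background` and `sum_of_squares` are made up for the example, and because the Rust standard library has no dedicated one-shot type, a bounded `std::sync::mpsc` channel that is used exactly once stands in for one (crates such as `tokio::sync::oneshot` provide a purpose-built version).

```rust
use std::sync::mpsc;
use std::thread;

/// Runs `work` on a worker thread and hands its single result back over a
/// channel that is used exactly once, so the caller and the worker never
/// share mutable state and no Mutex is needed in user code.
/// (Hypothetical helper, written for this example only.)
fn compute_in_background<T, F>(work: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    // Capacity 1 means the single send below never blocks the worker.
    let (tx, rx) = mpsc::sync_channel(1);
    thread::spawn(move || {
        // `tx` is moved into the thread and dropped after this one send.
        // A send error only means the receiver was dropped, so ignore it.
        let _ = tx.send(work());
    });
    rx
}

/// Stand-in workload: 1^2 + 2^2 + ... + n^2.
fn sum_of_squares(n: u64) -> u64 {
    (1..=n).map(|i| i * i).sum()
}
```

Calling `compute_in_background(|| sum_of_squares(10))` returns a receiver, and `recv()` on it blocks until the worker has sent its one value. Ownership of the result moves through the channel, which is why no lock is involved at the call site.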

by cheema33

3 subcomments

I created a simple skill in Claude Code CLI that collaborates with Codex CLI. It is just a prompt saved in the skill format. It uses subagents as well.
Honest question. How is Mysti better than a simple Claude skill that does the same work?

by mlrtime

3 subcomments

Why make it a vscode extension if the point of these 3 tools is a cli interface? Meaning most of the people I know use these tools without VSCode. Is VSC required?

by dwa3592

3 subcomments

>Claude Code (Anthropic), Codex (OpenAI), and Gemini (Google) have different training, different strengths, and different blind spots.
Do they?
There was a paper about HiveMind in LLMs. They all tend to produce similar outputs when they are asked open ended questions.

by danpalmer

1 subcomments

> Together they debate, challenge each other, and synthesize the best solution
Do they? How much better are multiple agents on your evals, and what sort of evals are you running? I've also seen research suggesting that adding more agents degrades the output after a point.

by spaceman_2020

5 subcomments

I’ve never seen a profession change so fast as coding right now

by nextaccountic

1 subcomments

> License: BSL 1.1, free for personal and educational use, converts to MIT in 2030 (would love input on this, does it make sense to just go MIT?)
I LOLd at that. Things in AI space become obsolete much faster. I'd say just go with GPL or AGPL if you don't want proprietary software to be built on top of your code

by tiku

6 subcomments

Does anyone know of something similar, but for the terminal?
Update:
I've already found a solution based on a comment, and modified it a bit.
Inside Claude Code I've made a new agent that uses the Gemini MCP through https://github.com/raine/consult-llm-mcp. This seems to work!
Claude code:
Now let me launch the Gemini MCP specialist to build the backend monitoring server:
gemini-mcp-specialist(Build monitoring backend server) ⎿ Running PreToolUse hook…

by MrDunham

2 subcomments

Website link on Github points to https://deepmyst.com/
But it is actually hosted on https://www.deepmyst.com/, with no forwarding from the apex domain to www, so it looks like the website is down.
Otherwise excited to deep dive into this, as this is a variant of how we do development and it seems to work great when the AIs fight each other.

by bahaAbunojaim

0 subcomment

UPDATE: Mysti 0.2.2 Release
Hey HN! Quick update on Mysti based on your feedback:
1- Mysti now supports GitHub Copilot CLI as a fourth provider. So you can now do Claude Code + Copilot (running GPT-5) in Brainstorm mode, or any combination of the 4 providers. Mix and match based on what catches different issues.
2- Mysti is now MIT Licensed. Switched from BSL 1.1 to MIT.
3- Better Auth UX: When a CLI isn't authenticated, you now get a friendly error with one-click "Open Terminal & Authenticate" instead of cryptic CLI errors.

by thomas_witt

1 subcomments

Codex CLI can run as MCP server ootb which you can call directly from Claude code. Together with a prompt to ask codex for a second opinion, that works very well for me, especially in code reviews.

by tacone

1 subcomments

Interesting, I was trying to implement this using AGENTS.md and the runSubagent tool in VS Code. VS Code does not yet have the ability to invoke different models as subagents, so I plan to fall back to instructing Copilot to use copilot-cli and gemini-cli. (I am quite angry about Copilot CLI offering only full-blown models and not the -mini versions, though.)

by Tarrosion

1 subcomments

> Is multi-agent collaboration actually useful or am I just solving my own niche problem?
I often write with Claude, and at work we have Gemini code reviews on GitHub; definitely these two catch different things. I'd be excited to have them working together in parallel in a nice interface.
If our ops team gives this a thumbs-up security wise I'll be excited to try it out when back at work.

by scrame

1 subcomments

> Mysti — Built by DeepMyst Inc
links to: https://deepmyst.com/ Site 404's.
> Made with Mysti
Ringing endorsement.

by danielfalbo

2 subcomments

How do we measure whether this is any better than just using one good model?

by deepsummer

1 subcomments

Great idea. Whether brainstorm mode is actually useful is hard to say without trying it out, but it sounds like an interesting approach. Maybe it would be a good idea to try running a SWE benchmark with it.
Personally, I wouldn't use the personas. Some people like to try out different modes and slash commands and whatnot - but I am quite happy using the defaults and would rather (let it) write more code than tinker with settings or personas.

by danr4

3 subcomments

licensing with BSL when basically every month the AI world is changing is not a smart decision.

by DenisM

2 subcomments

Multi agent collaboration is quite likely the future. All agents have blind spots, collaboration is how they are offset.
You may want to study [1] - this is the latest thinking on agent collaboration from Google.
[1] https://www.linkedin.com/posts/shubhamsaboo_we-just-ran-the-...

by GajendraSahu23

1 subcomments

This looks great! As someone just starting their coding journey, would using multiple agents (Claude/Gemini) help in learning best practices, or is it better suited for experienced developers for refactoring?

by prashantsengar

1 subcomments

This is very useful! I frequently copy the response of one model and ask another to review it and I have seen really good results with that approach.
Can you also include Cursor CLI for the brainstorming? This would let someone unlock brainstorming with just one CLI, since it supports multiple models.

by omarkoudsi

1 subcomments

I feel this is quite needed. I am a beginner vibe coder and have already felt the need for this. I constantly shift back and forth.

by altmanaltman

1 subcomments

> Would love feedback on the brainstorm mode. Is multi-agent collaboration actually useful or am I just solving my own niche problem?
If it's solving even your own niche problem, it is actually useful though right? Kind of a "yes or yes" question.

by adiga1005

1 subcomments

I have been using it for some time and it is getting better and better with time. In many cases it’s giving better output than other tools. The comparison is a great feature too. Keep up the good work!

by taf2

1 subcomments

For me when it’s front end I usually work with Claude and have codex review. Otherwise I just work with codex… Claude also if I’m being lazy and want a thing quickly

by sorokod

2 subcomments

Have you tried executing multiple agents on a single model with modified prompts and have them try to reach consensus?
That may solve the original problem of paying for three different models.
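One way to picture the single-model idea is a simple vote: send the same task to one model under several prompt variants and accept an answer only when a strict majority agrees. The sketch below assumes that reading and is not how Mysti works. It does a single voting round rather than a multi-round debate, and `consensus`, its parameters, and the `model` closure (a stand-in for whatever call reaches the model) are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Asks the same model once per prompt variant and returns the answer that a
/// strict majority of variants agree on, or None when there is no majority.
/// `model` is a placeholder for a real model call (an assumption of this sketch).
fn consensus<F>(model: F, task: &str, personas: &[&str]) -> Option<String>
where
    F: Fn(&str) -> String,
{
    let mut votes: HashMap<String, usize> = HashMap::new();
    for persona in personas {
        // Same model, different framing for each "agent".
        let prompt = format!("{persona}\n\nTask: {task}");
        // Normalize so that " Approve " and "approve" count as the same vote.
        let answer = model(&prompt).trim().to_lowercase();
        *votes.entry(answer).or_insert(0) += 1;
    }
    // At most one answer can hold a strict majority, so the result is
    // independent of the map's iteration order.
    votes
        .into_iter()
        .find(|(_, count)| *count * 2 > personas.len())
        .map(|(answer, _)| answer)
}
```

Exact-match voting only makes sense for short, constrained answers (approve/reject, pick option A or B). Free-form code suggestions would need a separate judging or synthesis step, which is outside this sketch.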

by justatdotin

1 subcomments

Multi-agent collaboration on planning is definitely really valuable. I lean into Gemini's long context and have it set up as a long-term observer that I consult about overall direction, project philosophy, patterns in failure and success, and prioritisation. This gives a different perspective from which to assess other agents' plans.

by dunkmaster

1 subcomments

Any benchmarks? For example vs a single model?

by tomsmithtld

1 subcomments

the "full" mode where agents critique each other seems more interesting than quick synthesis. curious whether you've seen cases where the debate produces something neither model would've suggested alone?

by RobotToaster

1 subcomments

That sounds like it could get expensive?

by bahaAbunojaim

0 subcomment

UPDATE: License is now MIT! Super excited to see your contributions and feedback!

by ekropotin

1 subcomments

How is it different from PAL MCP (ex ZEN MCP)?

by Alifatisk

1 subcomments

This reminds me a lot of eye2.ai, but outside of coding

by p1esk

1 subcomments

Why limit to 2 agents? I typically use all 3.

by matt3210

1 subcomments

For only 3x the cost

by NicoJuicy

1 subcomments

Sounds very similar to LLM council
https://github.com/karpathy/llm-council

by nickphx

1 subcomments

how would using multiple services that are incapable of performing the work correctly result in better work?