llm install llm-mistral
llm mistral refresh
llm -m mistral/devstral-2512 "Generate an SVG of a pelican riding a bicycle"
https://tools.simonwillison.net/svg-render#%3Csvg%20xmlns%3D...
Pretty good for a 123B model!
(That said I'm not 100% certain I guessed the correct model ID, I asked Mistral here: https://x.com/simonw/status/1998435424847675429)
- Less than a year behind the SOTA, faster, and cheaper. I think Mistral is mounting a good recovery. I would not use it yet since it is not the best along any dimension that matters to me (I'm not EU-bound) but it is catching up. I think its closed source competitors are Haiku 4.5 and Gemini 3 Pro Fast (TBA) and whatever ridiculously-named light model OpenAI offers today (GPT 5.1 Codex Max Extra High Fast?)
by InsideOutSanta
2 subcomments
- I gave Devstral 2 in their CLI a shot and let it run over one of my smaller private projects, about 500 KB of code. I asked it to review the codebase, understand the application's functionality, identify issues, and fix them.
It spent about half an hour, correctly identified what the program did, found two small bugs, fixed them, made some minor improvements, and added two new, small but nice features.
It introduced one new bug, but then fixed it on the first try when I pointed it out.
The changes it made to the code were minimal and localized; unlike some more "creative" models, it didn't randomly rewrite stuff it didn't have to.
It's too early to form a conclusion, but so far, it's looking quite competent.
- So I tested the bigger model with my usual standard test queries, which are neither too tough nor too easy. They're also ones you wouldn't find extensive training data for. Finally, I've already used them to get answers from gpt-5.1, sonnet 4.5 and gemini 3 ....
Here is what I think about the bigger model: it sits between sonnet 4 and sonnet 4.5. Something like "sonnet 4.3". The response speed was pretty good.
Overall, I can see myself shifting to this for regular day-to-day coding if they can offer it at competitive pricing.
I'll still use sonnet 4.5 or gemini 3 for complex queries, but, for everything else code related, this seems to be pretty good.
Congrats Mistral. You've most probably caught up to the big guys. Not quite there yet, but not far now.
by embedding-shape
9 subcomments
- Looks interesting, eager to play around with it! Devstral was a neat model when it was released and one of the better ones to run locally for agentic coding. Nowadays I mostly use GPT-OSS-120b for this, so it'll be interesting to see if Devstral 2 can replace it.
I'm a bit saddened by the name of the CLI tool, which to me implies the intended usage. "Vibe coding" is a fun exercise for discovering where models go wrong, but for professional work where you need tight control over quality, you obviously can't vibe your way to excellence; hard reviews are required. That's the opposite of "vibe coding", which is all about unreviewed code and going with whatever the LLM outputs.
But regardless, it seems like everyone and their mother is aiming to fuel the vibe-coding frenzy. Where are the professional tools for people who don't want to vibe-code, but do want to be heavily assisted by LLMs? Something meant to augment the human intellect, not replace it? All the agents seem to focus on offloading work to vibe-coding agents, while what I want is something even more tightly integrated with my tools, so I can keep delivering high-quality code I know and control. Where are those tools? None of the existing coding agents apparently aim for this...
by pluralmonad
5 subcomments
- I'm sure I'm not the only one who thinks "Vibe CLI" sounds like an unserious tool. I use Claude Code a lot, and little of it is what I would consider vibe coding.
by princehonest
3 subcomments
- Let's say you had a hardware budget of $5,000. What machine would you buy or build to run Devstral Small 2? The HuggingFace page claims it can run on a Mac with 32 GB of memory or an RTX 4090. What kind of tokens per second would you get on each? What about DGX Spark? What about RTX 5090 or Pro series? What about external GPUs on Oculink with a mini PC?
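For a rough sense of why the 32 GB Mac / RTX 4090 claim is plausible: weight memory is just parameter count times bytes per parameter. Here's a back-of-the-envelope sketch (the ~24B figure is Devstral Small 2's reported size; the quantization levels are assumptions, and KV cache and activations need headroom on top):

```python
def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Memory needed just to hold the weights, ignoring KV cache/activations."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# Devstral Small 2 is a ~24B dense model.
for bits, name in [(16, "fp16/bf16"), (8, "int8"), (4, "q4")]:
    print(f"{name:10s} ~{weight_footprint_gb(24, bits):5.1f} GB")
```

So fp16 (~48 GB) is out on both a 32 GB Mac and a 24 GB 4090, int8 (~24 GB) just fits the Mac, and 4-bit (~12 GB) fits either with room for context. Tokens/sec is a different question: it depends mostly on memory bandwidth, so the 4090 should be markedly faster than the Mac at the same quantization.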
- I'm glad it's not another LLM CLI that uses React.
Vibe-cli seems to be built with https://github.com/textualize/textual/
- I'm so glad Mistral never sold out. We're really lucky to have them in the EU at the time when we're so focused on mil-tech etc.
- Just added it to our inventory. For those of you using Nix:
nix run github:numtide/llm-agents.nix#mistral-vibe
The repo is updated daily.
- 10x cheaper price per token than Claude, am I reading it right?
As long as it doesn't mean 10x worse performance, that's a good selling point.
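The break-even logic is easy to sketch with hypothetical numbers (neither price nor token count below is a real quote): a 10x lower per-token price only holds up if the cheaper model doesn't need proportionally more attempts to reach a correct solution.

```python
def cost_per_solution(price_per_mtok: float, tokens_per_attempt: int,
                      attempts: int) -> float:
    """Total cost to reach an accepted solution, in the same currency as the price."""
    return price_per_mtok * tokens_per_attempt * attempts / 1e6

expensive = cost_per_solution(15.0, 50_000, 1)   # pricier model, one-shot
cheap = cost_per_solution(1.5, 50_000, 10)       # 10x cheaper, but 10 tries
print(expensive, cheap)  # 0.75 0.75 -- the 10x saving is gone after 10 retries
```

In other words, the per-token discount is the ceiling on the real saving; the floor depends entirely on how many attempts (and how much of your time) each model burns.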
by SyneRyder
1 subcomment
- I was briefly excited when Mistral Vibe launched and mentions "0 MCP Servers" in its startup screen... but I can't find how to configure any MCP servers. It doesn't respond to the /mcp command, and asking Devstral 2 for help, it thinks MCP is "Model Context Preservation". I'd really like to be able to run my local MCP tools that I wrote in Golang.
I'm team Anthropic with Claude Max & Claude Code, but I'm still excited to see Mistral trying this. Mistral has occasionally saved the day for me when Claude refused an innocuous request, and it's good to have alternatives... even if Mistral / Devstral seems to be far behind the quality of Claude.
- Ah, finally! I was checking just a few days ago if they had a Claude Code-like tool as I would much rather give money to a European effort. I'll stop my Pro subscription at Anthropic and switch over and test it out.
- This is great! I just made an AUR package for it: https://aur.archlinux.org/packages/mistral-vibe
by alexmorley
0 subcomments
- Does anyone know where their SWE-bench Verified results are from? I can't find matching results on the leaderboards for their models or the Claude models and they don't provide any links.
- The interesting bit in the blog isn’t the 72.2% SWE-Bench Verified number, it’s their own human eval: Devstral 2 beats DeepSeek V3.2 in Cline-style workflows but still loses clearly to Claude Sonnet 4.5. That’s a nice reminder that “open SOTA” on a single benchmark doesn’t mean “best tool for the job” once you’re doing multi-step edits across a messy real repo.
What is a big deal here is the combination of licensing and packaging. A 123B dense code model under a permissive license plus an open-source CLI agent (Vibe) that already speaks ACP is basically a reference stack for “bring your own infra + agents” instead of renting someone else’s SaaS IDE. If that ecosystem hardens (Cline, Kilo, Vibe, etc.), the moat shifts from “we have the only good code model” to “we own the best workflows and integrations”, and that’s a game open models can realistically win.
by mentalgear
0 subcomments
- Just tried it out via their free API and the Roo Code VSCode extension, and it's impressive. It walked through a data analytics and transformation problem (150,000 dataset entries) that I had been debugging for the past two hours.
by joostdevries
0 subcomments
- Very nice that there's finally a coding CLI. I have a Mistral Pro account and hope it will be included; it's the main reason to have a Pro account, tbh.
- The system prompt and tool prompts for their open source (Apache 2 licensed) Python+Textual+Pydantic CLI tool are fun to read:
core/prompts/cli.md
https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
core/prompts/compact.md
https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/bash.md
https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/grep.md
https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/read_file.md
https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/write_file.md
https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/search_replace.md
https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/todo.md
https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
by weitendorf
0 subcomments
- Open sourcing the TUI is pretty big news actually. Unless I missed something, I had to dig a bit to find it, but I think this is it: https://github.com/mistralai/mistral-vibe
Going to start hacking on this ASAP
- Really good stats! Sadly, their being dense models will make them slow compared to something like Qwen3. Honestly, that's my main reason not to use Mistral models when I need something on-prem with limited hardware (think Nvidia L4 GPUs).
- > Mistral Code is available with enterprise deployments.
> Contact our team to get started.
The competition is much smoother. Where are the subscriptions that would give users the coding agent and the chat for a flat fee, working out of the box?
by badsectoracula
4 subcomments
- > Devstral 2 ships under a modified MIT license, while Devstral Small 2 uses Apache 2.0. Both are open-source and permissively licensed to accelerate distributed intelligence.
Uh, the "Modified MIT license" here[0] for Devstral 2 doesn't look particularly permissively licensed (or open-source):
> 2. You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company (or that of your employer) exceeds $20 million (or its equivalent in another currency) for the preceding month. This restriction in (b) applies to the Model and any derivatives, modifications, or combined works based on it, whether provided by Mistral AI or by a third party. You may contact Mistral AI (sales@mistral.ai) to request a commercial license, which Mistral AI may grant you at its sole discretion, or choose to use the Model on Mistral AI's hosted services available at https://mistral.ai/.
[0] https://huggingface.co/mistralai/Devstral-2-123B-Instruct-25...
by syntaxing
1 subcomment
- Extremely happy with this release. The previous Devstral was great, but training it for OpenHands crippled its usefulness. Having their own CLI dev tool will hopefully work out better.
by lgrapenthin
0 subcomments
- I tried this on a small Clojure codebase and asked it to write some tests. It couldn't get its parentheses balanced. After 10 attempts or so it tried to write a smaller test file first, but again failed.
Regardless of the parentheses, the test code it came up with was quite basic and arbitrary. It didn't try to come up with interesting edge cases or anything.
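For what it's worth, unbalanced delimiters are cheap to catch before even evaluating the generated code. A naive first-pass check one could run on a model's Clojure output (it ignores delimiters inside strings and comments, so it's only a filter, not a real reader):

```python
# Closing delimiter -> the opener it must match.
PAIRS = {')': '(', ']': '[', '}': '{'}

def balanced(src: str) -> bool:
    """True if (), [], {} nest correctly in src (naive: no string/comment handling)."""
    stack = []
    for ch in src:
        if ch in '([{':
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # leftovers mean an unclosed opener

print(balanced("(deftest adds (is (= 4 (+ 2 2))))"))  # True
print(balanced("(deftest adds (is (= 4 (+ 2 2)))"))   # False: one ')' short
```

Feeding only the failure back ("delimiters unbalanced at the end of the file") is usually a tighter loop than asking the model to re-read its whole attempt.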
by moffkalast
0 subcomments
- Looks like another DeepSeek distill, like the new Ministrals. For every other use case that would be an insult, but for coding it's a great approach, given how much of a lead in coding performance Qwen and DeepSeek have on Mistral's internal datasets. The Small 24B seems to have a decent edge on 30B-A3B, though it'll be comparatively extremely slow to run.
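The "extremely slow" part follows directly from the dense-vs-MoE distinction: per-token compute scales roughly with *active* parameters (using the standard ~2 x params FLOPs-per-token approximation). A sketch with the sizes mentioned above, where ~3B active for the 30B-A3B MoE is the figure its name implies:

```python
def flops_per_token(active_params_billion: float) -> float:
    """Rough forward-pass FLOPs per generated token: ~2 * active parameters."""
    return 2 * active_params_billion * 1e9

dense_24b = flops_per_token(24)  # dense model: every parameter is active
moe_a3b = flops_per_token(3)     # "30B-A3B": ~3B of 30B params active per token
print(f"~{dense_24b / moe_a3b:.0f}x the compute per token")
```

So even at equal quality per token, the 24B dense model does on the order of 8x the work per token; the MoE's real-world speed advantage is then bounded by memory bandwidth, since all 30B weights still have to be resident.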
by therealmarv
1 subcomment
- Off-topic, but it hurts my eyes: I dislike their font choice and the "cool looks" in their graphics.
The only surprising and good part: everything, including the graphics, gets fixed when I click the "speedreader" button in Brave. So that "cool look" is done purely with CSS.
- I gave it the job of modifying a fairly simple regex replacement, and it took a while, over 5 minutes. Claude failed on the same prompt (which surprised me); Codex did a similar job but faster. So, all in all, not bad!
by da_grift_shift
1 subcomment
- Can Vibe CLI help me vibe code PRs for when I vibe on the https://github.com/buttplugio/buttplug repo?
by whimsicalism
0 subcomments
- > Model Size (B tokens)
How is that a measure of model size? It should either be parameter size, activated parameters, or cost per output token.
Looks like a typo because the models line up with reported param sizes.
- Think I found a bug? After an hour of light use on a small project, the TUI started to lag quite heavily and became less and less responsive over time.
- Finally, we can use a European model to replace Claude Code.
- Somehow it writes bad React code and misses the linting checks in my prompts half the time. But surprisingly, the Python coding was great!
- Will definitely try Mistral Vibe with gpt-oss-20b.
- They offer an extension for Zed at launch, fantastic! Did not spot that when first skimming through the page.
- Wonder why Gemini 3 Pro and Sonnet 4.5 are in this comparison but Opus 4.5 is not?
- Let's see which company becomes the first to sell "coding appliances": hardware with a model good enough for normal coding.
If Mistral is so permissive they could be the first ones, provided that hardware is then fast/cheap/efficient enough to create a small box that can be placed in an office.
Maybe in 5 years.
- Did anyone test how up to date its knowledge is?
After querying the model about .NET, it seems that its knowledge comes from around June 2024.
- I am very disappointed that they don't have a coding subscription equivalent to the 200 EUR ChatGPT or Claude one, and that it is only available for enterprise deployments.
The only thing I found is a pay-as-you-go API, but I wonder if it is any good (and cost-effective) vs Claude et al.
- In a figure: Model size (B tokens)?
by justinclift
0 subcomments
- Interesting. It sounds like using local LLMs (via vllm, ollama, etc) with decent agentic capability might be starting to become a reality.
Next step, just need a shitload of vram. ;)
Maybe those Intel Battlematrix 48GB cards might be useful after all... :)
https://www.storagereview.com/review/intel-arc-pro-b60-battl...
by patrick4urcloud
0 subcomments
- Their CLI is awesome, nicer than others :)
- Yet another CLI.
Why does every AI provider need to have its own tool, instead of contributing to existing tools like Roo Code or Opencode?
by a_state_full
0 subcomments
- [dead]
- Modified MIT?????
Just call it Mistral License & flush it down
- PSA: 10X savings when you have to prompt it 10 times to get the correct solution is not actually faster.