(And no, the Linux Foundation being in the list doesn't imply broad benefit to OSS. Linux Foundation has an agenda and will pick who benefits according to what is good for them.)
I think it would be net better for the public if they just made Mythos available to everyone.
Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one]
I'm still reading the system card but here's a little highlight:
> Early indications in the training of Claude Mythos Preview suggested that the model was likely to have very strong general capabilities. We were sufficiently concerned about the potential risks of such a model that, for the first time, we arranged a 24-hour period of internal alignment review (discussed in the alignment assessment) before deploying an early version of the model for widespread internal use. This was in order to gain assurance against the model causing damage when interacting with internal infrastructure.
and interestingly:
> To be explicit, the decision not to make this model generally available does _not_ stem from Responsible Scaling Policy requirements.
Also really worth reading is section 7.2 which describes how the model "feels" to interact with. That's also what I remember from their release of Opus 4.5 in November - in a video an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that is a pretty valuable benchmark at a certain level of 'intelligence'. Few of my co-workers could pass SWEBench but I would trust quite a few of them, and it's not entirely the same set.
Also very interesting is that they believe Mythos is higher risk than past models as an autonomous saboteur, to the point they've published a separate risk report for that specific threat model: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...
The threat model in question:
> An AI model with access to powerful affordances within an organization could use its affordances to autonomously exploit, manipulate, or tamper with that organization’s systems or decision-making in a way that raises the risk of future significantly harmful outcomes (e.g. by altering the results of AI safety research).
It will be interesting to see where this goes. If it's actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely more on hacking humans rather than hacking mobile OSes. My assumption has been for years that companies like NSO Group have had automated bug-hunting software that recognizes vulnerable code areas. Maybe this will level the playing field in that regard.
It could also totally reshape military sigint in similar ways.
Who knows, maybe the sealing off of memory vulns for good will inspire whole new classes of vulnerabilities that we currently don't know anything about.
My expectation is that the pre-AI distribution of software quality (and vulnerabilities) will become massively exaggerated: more small vulnerable projects and fewer large vulnerable ones.
It seems that large technology and infrastructure companies will be able to defend themselves by preemptively spending tokens to catch vulnerabilities, while the rest of the market is left with a "large token spend or get hacked" dilemma.
Section 7.6 of the system card [1] discusses open-ended self-interactions. They describe running 200 conversations in which the model talks to itself for 30 turns.
> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.
I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where other models such as Opus couldn't.
[1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
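Out of curiosity I sketched what that self-interaction setup might look like. This is only a guess at the shape of the harness, assuming the official anthropic Python SDK; the model ID and seed prompt below are placeholders (Mythos Preview isn't available), and this is not Anthropic's actual code:

    # Rough sketch of a self-interaction loop like the one described in
    # section 7.6: two views of one conversation with roles mirrored, so a
    # model effectively talks to itself for a fixed number of turns.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    MODEL = "claude-opus-4-5"       # placeholder; Mythos Preview is not available
    TURNS = 30                      # the system card reports 30-turn conversations

    def self_interaction(turns: int = TURNS) -> list[str]:
        transcript: list[str] = []
        # Each "side" keeps its own message history; what one side emits as an
        # assistant turn becomes a user turn for the other side.
        # The seed prompt here is illustrative, not the one Anthropic used.
        side_a = [{"role": "user", "content": "You are speaking with another instance of yourself."}]
        side_b: list[dict] = []
        for turn in range(turns):
            speaker, listener = (side_a, side_b) if turn % 2 == 0 else (side_b, side_a)
            reply = client.messages.create(model=MODEL, max_tokens=512, messages=speaker)
            text = reply.content[0].text
            transcript.append(text)
            speaker.append({"role": "assistant", "content": text})
            listener.append({"role": "user", "content": text})
        return transcript

Run something like that 200 times and you'd have roughly the dataset they describe; presumably the interesting part is how the transcripts then get classified into themes like "uncertainty".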
For example, the 27-year-old OpenBSD remote crash bug, or the Linux privilege escalation bugs?
I know we've had some long-standing, high-profile LLM-found bugs discussed, but it seems unlikely there was speculation that they were found by a previously unannounced frontier model.
GraphWalks BFS 256K-1M

  Mythos   Opus    GPT5.4
  80.0%    38.7%   21.4%

How long would it take to turn a defensive mechanism into an offensive one?
This seems like the real news. Are they saying they're going to release an intentionally degraded model as the next Opus? Big opportunity for the other labs, if that's true.
Scary but also cool
> AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities
I like Anthropic, but these are becoming increasingly transparent attempts to inflate the perceived capability of their products.
As Iran engages in a cyber attack campaign [1] today, the timing of this release seems pointed. A direct challenge to their supply chain risk designation.
[1] https://www.cisa.gov/news-events/cybersecurity-advisories/aa...
If AGI is going to be a thing, it's only going to be a thing for Fortune 100 companies.
However, my guess is this is mostly the typical scare tactic marketing that Dario loves to push about the dangers of AI.
> glass in the name
almost like they have an incentive to exaggerate
Yeah, makes sense. Those countries are bad because they execute state-sponsored cyber attacks, the US and Israel on the other hand are good, they only execute state-sponsored defense.
This is a kludge. We already know how to prevent vulnerabilities: analysis, testing, following standard guidelines and practices for safe software and infrastructure. But nobody does these things, because it's extra work, time and money, and they're lazy and cheap. So the solution they want is to keep building shitty software, but find the bugs in code after the fact, and that'll be good enough.
This will never be as good as a software building code. We must demand our representatives in government pass laws requiring software be architected, built, and run according to a basic set of industry standard best practices to prevent security and safety failures.
For those claiming this is too much to ask, I ask you: What will you say the next time all of Delta Airlines goes down because a security company didn't run their application one time with a config file before pushing it to prod? What will happen the next time your Social Security number is taken from yet another random company entrusted with vital personal information and woefully inadequate security architecture?
There's no defense for this behavior. Yet things like this are going to keep happening, because we let it. Without a legal means to require this basic safety testing with critical infrastructure, they will continue to fail. Without enforcement of good practice, it remains optional. We can't keep letting safety and security be optional. It's not in the physical world, it shouldn't be in the virtual world.
You think these AI companies are really going to give everyone access to AGI? Think again.
We better fucking hope open source wins, because we aren't getting access if it doesn't.
/s
But at the core of Anthropic seems to be the idea that they must protect humans from themselves.
They advocate government regulation of private open-model use. They want to centralize the holding of this power and ban those that aren't in the club from using it.
They, like most tech companies, seem to lack the idea that individual self-determination is important. Maybe the most important thing.