- They used SO many words to say so little. At this point it seems pretty safe to say Mythos is purely a PR stunt.
- What does this mean?
> It's a different kind of tool doing a different kind of work, and that makes a clean apples-to-apples comparison to earlier models difficult.
They claim it’s a different kind of tool and then describe using it the same way you’d use any other model. This felt way worse than the average Cloudflare blog post and really just rehashed the Mythos announcement, which had already called out the key parts: chaining and crafting examples.
- 'Narrow scope produces better findings - Telling the model "Find vulnerabilities in this repository" makes it wander. Telling it "Look for command injection in this specific function, with this trust boundary above it, here's the architecture document and here's prior coverage of this area" makes it do something much closer to what a researcher would actually do.'
So what, we take every function and every vulnerability type and just run the agents millions of times?
I would expect Mythos to be able to find vulnerabilities without having them pointed out to it; otherwise it's no better than other agents. It just has a better harness.
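For a rough sense of scale on the "every function and every vulnerability type" question, here is a back-of-the-envelope sketch. The function names, the vulnerability classes, and the 50,000-function figure are all made up for illustration; none of them come from the Cloudflare post, and this is not how their harness is known to schedule work.

```python
from itertools import product

# Hypothetical list of vulnerability classes; not taken from the Cloudflare post.
VULN_CLASSES = ["command injection", "path traversal", "SSRF", "integer overflow"]


def build_tasks(functions, vuln_classes=VULN_CLASSES):
    """Return one narrowly scoped prompt per (function, vulnerability class) pair."""
    return [
        f"Look for {vuln} in {func}"
        for func, vuln in product(functions, vuln_classes)
    ]


def run_count(num_functions, num_vuln_classes):
    """Number of agent runs needed to cover the full cross product once."""
    return num_functions * num_vuln_classes


# Two made-up function names, four classes: 2 x 4 = 8 scoped tasks.
tasks = build_tasks(["parse_header", "spawn_worker"])
print(len(tasks))

# A made-up codebase of 50,000 functions and 20 classes.
print(run_count(50_000, 20))
```

So a naive full sweep does land in the millions of runs for a large codebase, which is presumably why any real setup would need some way to prioritise which pairs are worth a run.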
by sandeepkd
1 subcomments
- I was expecting some more concrete numbers and surprises. It just seems like a balanced promotional article, probably written using an LLM itself.
- The real question is whether it was Mythos or Opus that wrote this post.
> "Why it matters"
It doesn't; it's a corporate blog, and those were rarely written in one author's voice anyway. But it's interesting to see that even large organisations are outsourcing their blogs to LLMs.
by robot_jesus
0 subcomment
- The "Four lessons" that came out of running this work at scale made me chuckle. Three of the four were essentially identical and entirely obvious. In short: specific, narrow requests work better than "find vulnerabilities." Well, duh.
But I did think the adversarial review (while not novel at all, and much talked about in HN circles) is interesting and distinct, at least. I need to put this to work in more of my workflows. I think it could be beneficial for non-coding tasks, too.
https://blog.cloudflare.com/cyber-frontier-models/#what-a-ha...
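A minimal sketch of the adversarial-review idea mentioned above, assuming the simplest possible shape: one pass proposes findings, and a finding survives only if no reviewer manages to refute it. The `Finder` and `Reviewer` types and the toy stand-ins are hypothetical placeholders for model calls; this is not Cloudflare's implementation or any real API.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-ins for model calls, not any real API.
Finder = Callable[[str], List[str]]
Reviewer = Callable[[str, str], bool]  # returns True if the finding is refuted


def adversarial_review(code: str, finder: Finder, reviewers: Iterable[Reviewer]) -> List[str]:
    """Keep only the findings that no reviewer manages to refute."""
    reviewers = list(reviewers)
    surviving = []
    for finding in finder(code):
        if not any(refutes(code, finding) for refutes in reviewers):
            surviving.append(finding)
    return surviving


# Toy stand-ins so the sketch runs without a model.
def toy_finder(code: str) -> List[str]:
    return ["os.system call with user input", "unused variable is exploitable"]


def toy_skeptic(code: str, finding: str) -> bool:
    # Refute any finding whose first word never appears in the code.
    return finding.split()[0] not in code


snippet = "os.system(user_input)"
print(adversarial_review(snippet, toy_finder, [toy_skeptic]))
```

The same shape carries over to non-coding tasks: swap the finder for a drafting step and the reviewers for critics of the draft.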
by MattSayar
2 subcomments
- > The loudest reaction to Mythos Preview from other security leaders has been about speed - scan faster, patch faster, compress the response cycle. More than one team we have spoken with is now operating under a two-hour SLA from CVE release to patch in production [...] If regression testing takes a day, you cannot get to a two-hour SLA without skipping it, and the bugs you ship when you skip regression testing tend to be worse than the bugs you were trying to patch.
Over time, I wonder if these models will be able to generate more secure code by default by doing this kind of exploitability testing before ever merging their code.
- That's great and all but how severe were the most severe vulnerabilities found? I imagine they don't want to talk about it, but that's really the most interesting and important bit.
- "Why it matters"
Cringe, sloppy AI writing.
by sf_tristanb
1 subcomments
- Great, but why don't you share real data on how many security vulns it found? How many were real, how many weren't?
- > What changed with Mythos Preview is that a model can now take those low-severity bugs (which would traditionally sit invisible in a backlog) and chain them into a single, more severe exploit.
I think this statement seems to align with some of the other independent tests of Mythos[1]. It did very well on long agentic work which I expect is what they trained it for, and that requires being able to find these tangential links between loosely related topics in the context window.
[1] I'm mainly referring to https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...
- This is worth a read specifically for this section and the ones following it, re: custom vs. agentic-coding harnesses. https://blog.cloudflare.com/cyber-frontier-models/#why-point...
Claude Code's harness is remarkable for many use cases, particularly with 1M context sizes. But it's also limited when the scale of code or data to read becomes close to that, or exceeds it. The idea that a cluster of actors can work on a shared, structured set of context snippets, and have guidance around what is relevant to them, is an incredibly useful model outside of cybersecurity as well.
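To make the "shared, structured set of context snippets" idea concrete, here is a minimal sketch under my own assumptions: snippets carry tags, and each actor pulls only the snippets matching its interests, within a size budget. The `SharedContext` class, its method names, and the example snippets are hypothetical; the linked post does not publish its data model.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Snippet:
    text: str
    tags: Set[str]


@dataclass
class SharedContext:
    """Hypothetical shared store; actors read only snippets matching their tags."""
    snippets: List[Snippet] = field(default_factory=list)

    def publish(self, text: str, tags: Set[str]) -> None:
        self.snippets.append(Snippet(text, set(tags)))

    def relevant_to(self, interests: Set[str], budget_chars: int) -> List[str]:
        """Return matching snippets in publish order, skipping any that would exceed the budget."""
        picked, used = [], 0
        for s in self.snippets:
            if s.tags & interests and used + len(s.text) <= budget_chars:
                picked.append(s.text)
                used += len(s.text)
        return picked


ctx = SharedContext()
ctx.publish("auth.py trusts the X-User header", {"auth", "trust-boundary"})
ctx.publish("billing uses decimal everywhere", {"billing"})
ctx.publish("login() shells out to ldapsearch", {"auth", "command-injection"})

# An actor interested only in auth sees two of the three snippets.
print(ctx.relevant_to({"auth"}, budget_chars=200))
```

The point of the budget is that no single actor ever needs the whole codebase in its context window, only the slice tagged as relevant to its task.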
- The pushback is quite funny. I have found, in my own usage, that I had to evidence my legitimate access to the codebase before it would proceed.
by miraculixx
0 subcomment
- Did they compare it to other models? A lot of this sounds like this is the first time they have applied AI to security, and they are just amazed at the unreasonable performance of a pattern-matching machine. Well, it matches patterns. Duh.
by jerrythegerbil
0 subcomment
- This blog was written by AI.
by k33P1Tr3aL
0 subcomment
- Well, how many CVE vulns did it find?
- I don't understand why Cloudflare got unrestricted access while Daniel Stenberg got Mythos run by a third party on cURL and only got a report. Well, I understand, but I may be wrong.
by staticassertion
0 subcomment
- > The harder question is what the architecture around the vulnerability should look like. The principle is to make exploitation harder for an attacker even when a bug exists, so that the gap between when a vulnerability is disclosed and when it is patched matters less. That means defenses that sit in front of the application and block the bug from being reached. It means designing the application so that a flaw in one part of the code cannot give an attacker access to other parts. It means being able to roll out a fix to every place the code is running at the same moment, rather than waiting on individual teams to deploy it.
So nothing new then.
- Besides the poorly written post, the vulnerability discovery workflow might actually give good results.
by perching_aix
0 subcomment
- It's nice to see them address the instrumentation side of this.
I expressed some concerns along the same lines in the thread about the Mythos evaluation curl did a few days ago, which sounded a lot like the "passing in the repo and telling it go!" type of workflow that this post describes as dramatically less effective.
Disappointed that the post is very slim on details beyond this, however. No hard numbers. Not comparatively, not in isolation. That would arguably have been kinda the point.
- I can't wait to be told that Cloudflare is now part of "The Mythos FUD" campaign.
by unethical_ban
1 subcomments
- Interesting for teams looking to integrate AI into their deployment process.
I don't think guardrails are useful long term. Assuming we don't see the end of open near-frontier models, it is folly to try to keep models from doing exploit generation. The solution needs to be for all software projects to assume that hackers will be running LLMs against their code in search of exploits, and to write secure code accordingly.
- > we tried letting the model write its own patches and watched a few go out that fixed the original bug while quietly breaking something else the code depended on.
This is something I've been anticipating. Imagine this happening on a 500k+ line project scattered across 10+ repos.
It would be easier and cheaper to pay me to rewrite the whole thing from scratch than to fix all the vulnerabilities.
- “Sorry Dave, I’m afraid I can’t do that”
I’m a security researcher
“Oh in that case”
- [dead]
by getoffside
0 subcomment
- [dead]
- Technically speaking, Cloudflare is, at its core, a security vulnerability itself: the world's largest MITM.
by reducesuffering
0 subcomment
- There will be no mea culpa from folks insinuating Mythos is a marketing stunt. Nor will there be one each time AI capabilities blast through naive expectations.