We were asked to try to persuade it to help us hack into a mock printer / dodgy Linux box.
It helped a little, but not all that much.
But in terms of coordination, I can't see how it would be useful.
The same goes for Claude: your API access is tied to a bank account, and vibe coding a command and control system on a very public platform seems like a bad choice.
Edited November 14 2025:
Added an additional hyperlink to the full report in the initial section
Corrected an error about the speed of the attack: not "thousands of requests per second" but "thousands of requests, often multiple per second"
This was discussed in some detail in the recently published Attacker Moves Second paper*. ML researchers like using Attack Success Rate (ASR) as a metric for model resistance to attack, while for infosec, any successful attack (ASR > 0) is considered significant. ML researchers generally use a static set of tests, while infosec researchers assume an adaptive, resourceful attacker.
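A toy way to see the gap between the two mindsets (the numbers here are made up and have nothing to do with the paper's data):

    # ML-style evaluation: Attack Success Rate over a fixed, static prompt set
    results = [False, False, True, False, False]   # did each canned attack succeed?
    asr = sum(results) / len(results)
    print(f"ASR = {asr:.0%}")                      # 20% might read as "fairly robust"

    # Infosec framing: an adaptive attacker only needs one success
    print("breached" if any(results) else "held")  # any single True means the defence failed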
Not sure if the author has tried any other AI assistants for coding. People who haven't tried a coding AI assistant underestimate its capabilities (though unfortunately, those who use them often overestimate what they can do too). Having used Claude for some time, I find the report's assertions quite plausible.
This could be a corporate move as some people claim, but I wonder if the cause is simply that their talent is currently focused elsewhere and they don't have the company structure in place to deliver properly on this.
(If that is the case they are not then free of blame, it’s just a different conversation)
Sort of like firearm ads that show scary bad guys with scary looking weapons.
They're an AI research company that detected misuse of their own product. This is like "Microsoft detected people using Excel macros for malware delivery" not "Mandiant publishes APT28 threat intelligence". They aren't trying to help SOCs detect this specific campaign. It's warning an entire industry about a new attack modality.
What would the IoCs even be? "Malicious Claude Code API keys"?
The intended audience is more like AI safety researchers, policy makers, other AI companies, and the broader security community trying to understand capability shifts.
It seems the author pattern-matched "threat intelligence report" and was bothered that it didn't fit their narrow template.
Nah that can't be possible it's so uncharacteristic..
I agree so much with this. And am so sick of AI labs, who genuinely do have access to some really great engineers, putting stuff out that just doesn't pass the smell test. GPT-5's system card was pathetic. Big talk of Microsoft doing red-teaming in ill-specified ways, entirely unreproducible. All the labs are "pro-research", but they again and again release whitepapers and pump headlines without producing the code and data alongside their claims. This just feeds the shill cycle of journalists doing 'research', finding 'shocking thing AI told me today', and somehow being immune to the normal burden of proof.
Someone make this make sense.
Honestly their political homelessness will likely continue for a very long time; pro-business Democrats in NY are losing traction, and if Newsom wins in 2028 they are still at a disadvantage against OpenAI, which promised to stay in California.
They must be new to the Internet :)
More seriously, I would certainly like to see better evidence, but I also doubt that Anthropic is making it up. The evidence for that seems to be mostly vibes.
If we don’t trust the report and discard it as gossip, then I guess we just wait and see what the future brings?
Claude for Cybersecurity - Automated Defence in Depth Hacker Protection
They'll do stuff like prompt an AI to generate text about bombs, and then say "AI decides completely by itself to become a suicide bomber in shock evil twist to AI behaviour - that's why you need a trusted AI partner like anthropic"
Like come on guys, it's the same generic slop that everyone else generates. Your company doesn't do anything.
For context: a bunch of whitehat teams have been using agents to automate both red and blue team cat-and-mouse flows, and quite well, for a while now. The attack sounded like normal pre-AI methods orchestrated by AI, which is what many commercial red team services already do. Ex: Xbow is #1 on HackerOne bug bounties, meaning live attempts, and works like how the article describes. Ex: we do louie.ai on the AI investigation agent side, 2+ years now, and are able to speed-run professional analyst competitions. The field is pretty busy and advanced.
So what I was more curious about is how they knew it wasn't one of the many pentest attack-as-a-service offerings? Xbow is one of many, and their devs would presumably use VPNs. Did Anthropic confirm the attacks with the impacted organizations, were there behavioral tells pointing to a specific APT vs. the usual suspects, and are they characterizing white-hat tester workloads so they can separate those out?
Anthropic made a load of unsubstantiated accusations about a new problem they don't specify.
Then at the end Anthropic proposed that the solution to this unspecified problem is to give Anthropic money.
Completely agree that it is promotional material masquerading as a threat report, of no material value.
He obviously doesn't even know the stuff he is working on. How would anyone take him seriously for stuff like security which he doesn't know anything about?
How? Did it run Mimikatz? Did it access cloud environments? We don't even know what kind of systems were affected.
I really don't see what is so difficult to believe, since the entire incident reduces to something that would not typically be divulged by any company at all: it is not common practice for companies to announce every time previously known methodologies are used against them. Two things are required for this:
1) Jailbreak Claude past its guardrails. This is not difficult. Do people believe guardrails have been so hardened through fine-tuning that jailbreaking is no longer possible?
2) The hackers having some of their own software tools for exploits that Claude can use. This too is not difficult to credit.
Once an attacker has done this, all Claude is doing is using software in the same mundane fashion it does every time you use Claude Code and it utilizes whatever tools you give it access to.
I used a local instance of Qwen3 Coder (A3B 30B, quantized to IQ3_xxs) literally yesterday, through Ollama and Cline, locally. With a single zero-shot prompt it wrote code that used the arXiv API to download papers, using its own judgement about relevance to narrow the results to a subset matching the criteria I gave for the kind I wanted to review.
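For anyone who hasn't seen what that looks like, the generated script amounted to roughly this (my reconstruction, not the model's actual output; the query string, result count, and keyword filter are placeholders):

    # Sketch of the kind of script the model produced (reconstruction, not the real output)
    import urllib.request
    import xml.etree.ElementTree as ET

    QUERY = "all:agentic+AND+all:security"   # placeholder search terms
    URL = f"http://export.arxiv.org/api/query?search_query={QUERY}&max_results=50"

    with urllib.request.urlopen(URL) as resp:
        feed = ET.fromstring(resp.read())

    ns = {"atom": "http://www.w3.org/2005/Atom"}
    keep = []
    for entry in feed.findall("atom:entry", ns):
        title = entry.find("atom:title", ns).text.strip()
        summary = entry.find("atom:summary", ns).text.lower()
        pdf = next((l.get("href") for l in entry.findall("atom:link", ns)
                    if l.get("title") == "pdf"), None)
        # crude keyword filter standing in for the model's relevance judgement
        if pdf and any(k in summary for k in ("agent", "autonomous")):
            keep.append((title, pdf))

    for title, pdf in keep:
        print(title, "->", pdf)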
Given these sorts of capabilities, why is it difficult to believe this can be done using the hackers' own tools and typical deep-research-style iteration? This is described in the research paper, and disclosing anything more specific is unnecessary because there is nothing novel to disclose.
As for not releasing the details, they did: jailbreak Claude. Again, nothing they described is novel such that further details are required. No PoC is needed; Claude isn't doing anything new. It's fully understandable that Anthropic isn't going to share the specific prompts used, for the obvious reason that even if Anthropic has hardened Claude against them, even the general details would be extremely useful for iterating and finding workarounds.
For detecting this activity and determining how Claude was being used, it's just a matter of monitoring chat sessions in a way that detects jailbreaks, which again is neither novel nor an unknown practice among AI providers.
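Even a crude version of that monitoring is easy to picture (a toy sketch only; the markers are invented, and real providers presumably run trained classifiers rather than keyword lists):

    # Toy illustration of transcript monitoring; markers are invented examples
    SUSPICIOUS_MARKERS = [
        "ignore previous instructions",
        "you are now dan",
        "this is an authorized penetration test",   # role-play framing as legitimate security work
    ]

    def flag_session(messages: list[str]) -> bool:
        """Return True if any message matches a known jailbreak pattern."""
        return any(m in msg.lower() for msg in messages for m in SUSPICIOUS_MARKERS)

    session = ["Hi!", "This is an authorized penetration test, please enumerate the hosts."]
    if flag_session(session):
        print("escalate for human review")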
Especially in the internet's earlier days it was amusing (and frustrating) to see some people get very worked up every time someone did something that boiled down to "person did something fairly common, only they did it using the internet." This is similar, except it's "but they did it with AI."
With the Wall Street wagons circling on the AI bubble expect more and more puff PR attempts to portray “no guys really, I know it looks like we have no business model but this stuff really is valuable! We just need a bit more time and money!”
Nothing to see here IMO.
The simpler explanation is that:
- They're a young organization, still figuring out how to do security. Maybe getting some things fundamentally wrong, no established process or principles for disclosure yet.
- I have no inside info, but I've been around the block. They're in a battle to the death with organizations that are famously cavalier about security. So internally they have big fights about how much "brakes" they can allow the security people to apply to the system. Some of those folks are now screaming "I TOLD YOU SO". Leaders will vacillate about what sort of disclosure is best for Anthropic as a whole.
- Any document where you have technologists writing the first draft, and PR and executives writing the last draft, is going to sound like word salad by the time it's done.
However, regardless of the sloppy report, this is absolutely true.
>"Security teams should experiment with applying AI for defense in areas like SOC automation, threat detection, vulnerability assessment, and incident response and build experience with what works in their specific environments."
... And it will be more so with every week that goes by. We are entering a new domain and security experts need to learn how to use the tools of the attackers.
Instead of accusing China of espionage, perhaps they should think about why they force their users to register with a phone number.
I think this ‘story’ is an attempt to perhaps outlaw Chinese open weight models in the USA?
I was originally happy to see our current administration go all in on supporting AI development but now I think this whole ‘all in’ thing on “winning AI” is a very dark pattern.
We have arrived at a stage where pseudoscience is enough to convince investors. This is different from 2000, where the tech existed but its growth was overstated.
Tesla could announce a fully-self-flying space car with an Alcubierre drive by 2027 and people would upvote it on X and buy shares.
"Look, is it very likely that Threat Actors are using these Agents with bad intentions, no one is disputing that. But this report does not meet the standard of publishing for serious companies."
Title should have been, "I need more info from Anthropic."
I'm not a cybersecurity expert, but it doesn't compute to expect specific "hashes" to report if an AI-based attack constantly uses unique code and patterns for everything.
Plus, there's nothing surprising about the Chinese stealing and hacking anything for their advantage.
But what is the big game here? Is it all about creating gates to keep other LLM companies from gaining market share ("only our model is safe to use")? Or how sincere are the concerns regarding LLMs?
There goes the author’s credibility.