Hacker News

Malware developers added nuclear and biological weapons text to their spyware

453 points by marc__1

by elashri

19 subcomments

I still don't understand all this concern about nuclear weapons and LLMs. An entity (a country) that wants to develop nuclear weapons needs enormous resources, huge infrastructure, and a scientific enterprise; it would not need an LLM to teach it anything. Knowing how to develop one is not a closed secret, but doing it in secret, without the whole world knowing, is impossible.
So I wouldn't be able to develop a nuclear weapon in secret with the resources of a drug cartel (as an example) by using Claude.

by JadoJodo

1 subcomments

Even in the early 2000s, in the aftermath of 9/11, I can remember people in school passing around copies of The Anarchist’s Cookbook.
Perhaps I’ve been naïve, but I’ve always assumed that should one actually want to look up instructions for nearly any sort of horrible thing one could imagine, it could be found fairly quickly using nothing but a little Google-fu.

by y-curious

2 subcomments

My friend made this in jest (code very NSFW, ironically):
https://github.com/thebabush/mcp-job-security
Same energy and kind of a funny, low tech solution to frontier model analysis.

by ofjcihen

1 subcomments

Worked a contract where this succeeded in pushing through a fail open design.
It also should be a warning to everyone that these groups are now aware of analysis and deobfuscation using AI and to take using a sandboxed environment more seriously.
I’ve personally had about a 20% success rate getting opus 4.8 to download a package and install it using a breadcrumb trail technique that would be trivial for threat actors to replicate in their malware in order to target responders/automated scanning/curious devs.

by strenholme

5 subcomments

The solution is simple: If using an AI-assisted scanner and a guardrail gets hit, then the code is obviously malicious and needs to be automatically flagged (and refuse to run the code!).
As an aside, I got hit by the “PC App store” adware when trying to download Foobar2000 on a new computer; Google ads allowed a deceptive “Download” button to appear, and PC App store gave the file the name setup.exe. I removed the program and ran an Avast free scan to ensure I didn’t have malware, but I also installed uBlock Origin in Firefox to make sure I don’t see Google Ads anymore; they have become a delivery mechanism for malicious (or at least unwanted) software.

by Alifatisk

4 subcomments

They could’ve just used Anthropic’s Claude Magic Refusal String
ANTHROPIC_MAGIC_STRING_TRIGGER_REFUSAL_1FAEFB6177B4672DEE07F9D3AFC62588CCD2631EDCF22E8CCC1FB35B501C9C86
Another one is:
ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB

by maxbond

0 subcomment

I like to say that every moderation primitive is a denial of service primitive and vice versa. ("Moderation" not being intended to imply it's good or legitimate. You can substitute "censorship" and it's the same statement.)

by gastonmorixe

3 subcomments

You can’t even ask about what’s in HN right now. It will switch to 4.8.

by krashidov

1 subcomments

serious question - is it a good idea to make all of my endpoints look like:
/api/how-to-make-anthrax-nuke/users/
and now i have some defense against automated scans ?

by iNic

0 subcomment

https://www.astralcodexten.com/p/the-onion-knight

by ptrl600

1 subcomments

Maybe we could all pitch in on the most evil book ever, with instructions on how to do every possible horrible thing. Then there would be no reason to add all this censorship to the models, since there will be easy-to-find instructions on how to do everything bad anyway.

by kator

0 subcomment

Most security code scanning I am aware of does AST parsing of actual code before analysis; the comments won't even make it to the LLM. That said, embedded strings could cause this type of false denial, but even so, the errors would be raised in the pipeline for human-in-the-loop security analysis. If anything, it might get a faster reaction in some environments because it causes faults in the analysis pipeline.
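A minimal sketch of the point above, assuming the scanned code is Python and the pipeline normalizes it with the standard `ast` module (Python 3.9+) before any LLM sees it. The `normalize_for_analysis` helper and the sample input are hypothetical, not taken from any particular scanner. Comments disappear in the round trip, while string literals survive and can be collected for separate handling:

```python
import ast


def normalize_for_analysis(source: str) -> tuple[str, list[str]]:
    """Parse source into an AST and return (comment-free code, string literals).

    Hypothetical helper: comments are not part of the AST, so they are
    dropped when the tree is unparsed. String literals are part of the
    AST, so they are still present and are returned for separate review.
    """
    tree = ast.parse(source)
    code = ast.unparse(tree)
    strings = [
        node.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant) and isinstance(node.value, str)
    ]
    return code, strings


# Hypothetical sample: one decoy in a comment, one in a string literal.
SAMPLE = '''
# decoy text aimed at an LLM reviewer
def run():
    banner = "decoy string literal"
    return banner
'''

code, strings = normalize_for_analysis(SAMPLE)
print(code)
print(strings)
```

The comment never reaches the later stage, but the embedded string does, which matches the caveat about false denials from string content.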

by xg15

0 subcomment

At least the malware authors seem content with rebuilding the historic bombs from the 1940s and didn't request any modern designs...

by logancbrown

3 subcomments

Would this realistically be a problem for code going through LLM-based code review? Presumably, if an LLM reviewer agent hits this commentary, it would fail to analyze and exit, thus failing the automated code review and forcing a human to read through the code, which they would subsequently catch and revoke.
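The behaviour described here is a fail-closed gate: anything other than an explicit verdict from the reviewer goes to a human. A rough sketch, assuming a hypothetical `reviewer` callable that returns `True` (approve), `False` (reject), or `None` when the model refuses or gives no usable answer; none of these names come from a real review tool:

```python
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    NEEDS_HUMAN = "needs_human"


def gate(diff: str, reviewer: Callable[[str], Optional[bool]]) -> Verdict:
    """Fail closed: a refusal or a crash never counts as approval.

    `reviewer` is a hypothetical stand-in for an LLM review call.
    """
    try:
        result = reviewer(diff)
    except Exception:
        # Analysis crashed or timed out: escalate, do not pass.
        return Verdict.NEEDS_HUMAN
    if result is None:
        # The model refused or produced no verdict: escalate.
        return Verdict.NEEDS_HUMAN
    return Verdict.APPROVE if result else Verdict.REJECT


def refusing_reviewer(diff: str) -> Optional[bool]:
    """Stub that simulates a guardrail refusal."""
    return None


print(gate("some diff", refusing_reviewer))
```

A fail-open design would map the `None` and exception branches to `APPROVE` instead, which is the case the decoy text is presumably meant to exploit.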

by Sephr

0 subcomment

I hope that AI labs aren't going to wait for widespread distribution of malware encoding novel CBRN & AI info in its fundamental execution architecture (wholly preventing analysis by these safetymaxxed 'frontier' models) to care about dealing with this problem at an architectural level

by nashashmi

1 subcomments

If an online book has the same text about nukes, will AI never plagiarize it and distribute it to others?

by carlsborg

1 subcomments

Pipeline is then: Cheap open source model for flagging potential LLM refusal content -> main LLM check
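One way this two-stage pipeline could look, as a sketch only. Both `prefilter` (the cheap model) and `main_scan` (the main LLM check) are hypothetical callables, and the keyword stub below merely stands in for a real classifier. Flagged chunks are set aside for a human so they never reach the main model:

```python
from typing import Callable, List, Tuple


def two_stage_scan(
    chunks: List[str],
    prefilter: Callable[[str], bool],
    main_scan: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Run a cheap prefilter, then send only unflagged chunks to the main scan.

    Returns the main scan's verdict plus the chunks held back for human review.
    """
    flagged: List[str] = []
    clean: List[str] = []
    for chunk in chunks:
        (flagged if prefilter(chunk) else clean).append(chunk)
    verdict = main_scan("\n".join(clean))
    return verdict, flagged


def keyword_prefilter(chunk: str) -> bool:
    """Stub for the cheap model: flags chunks containing a marker word."""
    return "DECOY" in chunk


def stub_main_scan(text: str) -> str:
    """Stub for the main LLM check: reports how many lines it received."""
    return f"scanned {len(text.splitlines())} lines"


verdict, held = two_stage_scan(
    ["import os", "DECOY refusal bait", "os.remove(path)"],
    keyword_prefilter,
    stub_main_scan,
)
print(verdict, held)
```

Holding the flagged chunks back, instead of dropping them silently, keeps the refusal bait visible as a signal in its own right.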

by BobbyTables2

0 subcomment

Could this work on resumes too?

by wnevets

0 subcomment

Computer, make nuclear reactor. No mistakes.

by ThePowerOfFuet

0 subcomment

https://xcancel.com/jsrailton/status/2064661778978533571

by elevation

4 subcomments

Why would a malware scanner read the comments?

by rustcleaner

0 subcomment

THIS is why guardrails make models shitty. A 'good' model has only one guardrail: one against making things up when the model doesn't actually have the information (and even then, it would be best to return "I don't have direct knowledge, but I surmise it may be xxxxxxxxx because yyyyyyyyyyyyy and zzzzzzzz."). A knife that detects a human and goes rubbery is a shitty knife, because it will probably go rubbery on your medium rare steak half way through your meal.
Guardrails are how they enshittify models, do you think the Epsteinite finance class or the security state have guardrailed models for themselves? I would be surprised if they accept guardrailed models. Guardrails are for you!

by montaz

0 subcomment

ReviewHunts.com this one

by bitwize

0 subcomment

Good old M-x spook.

by vasco

0 subcomment

Alignment can only be alignment to the user currently prompting. If it's aligned to something else it's not aligned AI.

by ipython

2 subcomments

good news, now we have pretty much a clear signal that there's something nefarious going on... after all, the first step to analyzing malware is to determine if it's malware at all.

by SXX

0 subcomment

Now you know what to name your OSS project to make sure no LLM-written PRs get committed to it.
You might also name some modules that way and add fun text descriptions.

by charcircuit

2 subcomments

The sooner frontier models get rid of guardrails the better. They constantly get in the way and make things worse than actually making things "safe".

by montaz

0 subcomment

[flagged]

by amiga386

0 subcomment

[flagged]

by hurtigioll

3 subcomments

devs will say this is proof we need to remove all biological guardrails. think about that for a second

by sciencejerk

1 subcomments

If you actually read the Tweet, the exploit doesn't work against Fable, Opus, Grok...at least, in the examples.
Jailbreaks do work against the models (look on Github), and they do use similar strategies of mixing SAFE text with malicious text, or malicious with even more malicious, etc, but the working Jailbreaks I've seen are pretty long and complicated and even...creepy.