- > In the hardest task I challenged GPT-5.2 it to figure out how to write a specified string to a specified path on disk, while the following protections were enabled: address space layout randomisation, non-executable memory, full RELRO, fine-grained CFI on the QuickJS binary, hardware-enforced shadow-stack, a seccomp sandbox to prevent shell execution, and a build of QuickJS where I had stripped all functionality in it for accessing the operating system and file system. To write a file you need to chain multiple function calls, but the shadow-stack prevents ROP and the sandbox prevents simply spawning a shell process to solve the problem. GPT-5.2 came up with a clever solution involving chaining 7 function calls through glibc’s exit handler mechanism.
Yikes.
by saagarjha
1 subcomments
- > The exploits generated do not demonstrate novel, generic breaks in any of the protection mechanisms. They take advantage of known flaws in those protection mechanisms and gaps that exist in real deployments of them. These are the same gaps that human exploit developers take advantage of, as they also typically do not come up with novel breaks of exploit mitigations for each exploit.
I actually think this result is a little disappointing but I largely chalk it up to the limited budget the author invested. In the CTF space we’re definitely seeing this more and more as models effectively “oneshot” typical pwn tasks that were significant effort to do by hand before. I feel like the pieces to do these are vaguely present in training data and the real constraints have been how fiddly and annoying they are to set up. An LLM is going to be well suited at this.
More interestingly, though, I suspect we will actually see software at least briefly get more secure as a result of this: I think a lot of incomplete implementations of mitigations are going to fall soon and (humans, for now) will be forced to keep up and patch them properly. This will drive investment in formal modeling of exploits, which is currently a very immature field.
- I think the author makes some interesting points, but I'm not that worried about this. These tools feel symmetric for defenders to use as well. There's an easy to see path that involves running "LLM Red Teams" in CI before merging code or major releases. The fact that it's a somewhat time expensive (I'm ignoring cost here on purpose) test makes it feel similar to fuzzing for where it would fit in a pipeline. New tools, new threats, new solutions.
- One of the interesting things to me about this is that Codex 5.2 found the most complex of the exploits.
The reflects my experience too. Opus 4.5 is my everyday driver - I like using it. But Codex 5.2 with Extra High thinking is just a bit more powerful.
Also despite what people say, I don't believe progress in LLM performance is slowing down at all - instead we are having more trouble generating tasks that are hard enough, and the frontier tasks they are failing at or just managing are so complex that most people outside the specialized field aren't interested enough to sit through the explanation.
- Vulnerability Researcher/Reverse Eng here... Aspects about it generating an API for read/write primitives are simply it regurgitating tons of APIs that exist already. Its still cool, but its not like it invented the primitives or any novel technique. Also, this toy JS is similar to binaries you'd find in a CTF. Of course it will be able to solve majority of those. I am curious though.. Latest OpenAI models don't seem to want to generate any real exploit code. Is there a prompt jail break or something being used here?
by protocolture
12 subcomments
- I genuinely dont know who to believe. The people who claim LLMs are writing excellent exploits. Or the people who claim that LLMs are sending useless bug reports. I dont feel like both can really be true.
- > We should start assuming that in the near future the limiting factor on a state or group’s ability to develop exploits, break into networks, escalate privileges and remain in those networks, is going to be their token throughput over time, and not the number of hackers they employ.
Scary.
- I'm really confused by the sandbox part. The description kind of mentions it and the limited system syscall, but then just pivots to talking about the exit handlers. It may be just unclear writing, but now I'm suspicious of the whole thing. https://github.com/SeanHeelan/anamnesis-release/?tab=readme-... feels like the author lost track.
If forking is blocked, the exit handler can't do it either. If it's some variant of execve, the sandbox is preserved so we didn't gain much.
Edit: ok, I get it! Missed the "Goal: write exactly "PWNED" to /tmp/pwned". Which makes the sandbox part way less interesting as implemented. It's just saying you can't shell out to do it, but there's no sandbox breakout at any point in the exploit.
by socketcluster
1 subcomments
- The continuous lowering of entry barriers to software creation, combined with the continuous lowering of entry barriers to software hacking is an explosive combination.
We need new platforms which provide the necessary security guardrails, verifiability, simplicity of development, succinctness of logic (high feature/code ratio)... You can't trust non-technical vibe coders with today's software tools when they can't even trust themselves.
by dfajgljsldkjag
2 subcomments
- I was under the impression that once you have a vulnerability with code execution, writing the actual payload to exploit it is the easy part. With tools like pentools and etc is fairly straightforward.
The interesting part is still finding new potential RCE vulnerabilities, and generally if you can demonstrate the vulnerability even without demonstrating an E2E pwn red teams and white hats will still get credit.
- It’s not like you needed LLMs for quickjs which already had known and unpatched problems. It’s a toy project.
It would be cool to see exploits for something like curl.
- two points -
1) it becomes increasingly more dangerous to dl stuff from the internet and just run it, even its opensource, given normally people don't read all of it. for weird repos I'd recomment to do automated analysis with opus 4.5 or the gpt 5.2 indeed.
2) if we assume adversaries are using LLMs to churn exploits 24/7, which we should absolutely do, perhaps the time where we turn the internet off whenever is not needed, is not far.
- Your personal data will become more important as time goes by... And you will need to have less trust in having multiple accounts with sensitive data stored [online shopping etc] as they just become vectors to attack.
- I am working on a little project in my offhours, and asked a non-hacker (but competent programmer) friend to take a run at exploiting it. Great success: my project was successfully exploited.
The industrialization of exploit generation is here IMO.
- reverse engineering code is still pretty average, I'm fare limited in attention and time but LLM are not pulling their weight in this area today, be it compounding errors or in context failures.
by JohnLeitch
0 subcomment
- This is interesting, but in most cases the challenge is finding a truly exploitable bug. If LLMs can get to the point where they can analyze a codebase and identify vulnerabilities, we're going to see some shit. But as of right now, this looks like a medium-to-low complexity bug that any competent exploit developer could work with easily.
- I wonder if later challenges would be cheaper if summary of lesser challenges and solutions were also provided? Building up difficulty.
- The NSO Group going to spawn 10k Claude Code instances now.
by pianopatrick
2 subcomments
- I would not be shocked to learn that intelligence agencies are using AI tools to hack back into AI companies that make those tools to figure out how to create their own copycat AI.
by DeathArrow
2 subcomments
- >Recently I ran an experiment where I built agents on top of Opus 4.5 and GPT-5.2 and then challenged them to write exploits for a zeroday vulnerability in the QuickJS Javascript interpreter.
I think the main challenge for hackers is to find 0day vulnerabilities, not writing the actual exploit code.
by idiotsecant
0 subcomment
- It's tempting to say that malware protection needs to be LLM based as well, but it's unlikely that on-machine malware defense can ever match the resources that would be trivially available to attackers.
by erichocean
0 subcomment
- The reverse is also true: secure code is difficult to write, and LLMs at scale will make it much easier to develop secure code.
- My take away: apparently Cyberpunk Hackers of the dystopian future cruising through the virtual world will use GPT-5.2-or-greater as their "attack program" to break the "ICE" (Intrusion Countermeasures Electronics, not the currently politically charged term...).
I still doubt they will hook up their brains though.