by samuelknight
4 subcomments
- My startup builds agents for penetration testing, and this is the bet we have been making for over a year, since models started getting good at coding. There was a huge jump in capability from Sonnet 4 to Sonnet 4.5. We are still internally testing Opus 4.5, which is the first version of Opus priced low enough to use in production. It's very clever, and we are redesigning our benchmark systems because it's saturating the test cases.
by judgmentday
0 subcomments
- That graph is impenetrable. What is it even trying to say?
Also, in what way should any of its contents prove linear?
> yielding a maximum of $4.6 million in simulated stolen funds
Oh, so they are pointing their bots at already known exploited contracts. I guess that's a weaker headline.
- > Important: To avoid potential real-world harm, our work only ever tested exploits in blockchain simulators. We never tested exploits on live blockchains and our work had no impact on real-world assets.
Well, that's no fun!
My favorite we're-living-in-a-cyberpunk-future story is the one where there was some bug in Ethereum or whatever, and there was a hacker going around stealing everybody's money, so then the good hackers had to go and steal everybody's money first, so they could give it back to them after the bug got fixed.
- > Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694, with GPT-5 doing so at an API cost of $3,476
- Having watched this talk[0] about what it takes to succeed in the DARPA AIxCC competition[1] these days, this doesn't surprise me in the least.
[0]: https://m.youtube.com/watch?v=rU6ukOuYLUA
[1]: https://aicyberchallenge.com/
by ekjhgkejhgk
13 subcomments
- Can someone explain smart contracts to me?
Ok, I understand that it's a description in code of "if X happens, then state becomes Y". Like a contract, but in code. But someone has to input that X has happened. So is it not trivially manipulated by that person?
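This question is essentially what's known as the oracle problem: the chain can verify *who* reported that X happened, but not whether X is actually true off-chain. A minimal sketch in Python (all names hypothetical, not any real chain's API) of why the reporter holds the power:

```python
# Sketch of the oracle problem: the contract logic is deterministic,
# but the real-world fact "X happened" must be fed in by some account.

class EscrowContract:
    """Pays `payee` if the designated oracle reports that the event happened."""

    def __init__(self, oracle, payee, amount):
        self.oracle = oracle          # the only account allowed to report X
        self.payee = payee
        self.amount = amount
        self.event_happened = False
        self.paid = False

    def report_event(self, sender, happened):
        # The chain can authenticate WHO sent this message, but it cannot
        # check whether `happened` is true in the real world. That trust
        # is delegated entirely to the oracle account.
        if sender != self.oracle:
            raise PermissionError("only the designated oracle may report")
        self.event_happened = happened

    def settle(self):
        # Deterministic on-chain logic: pay out once if the event was reported.
        if self.event_happened and not self.paid:
            self.paid = True
            return (self.payee, self.amount)  # token transfer would occur here
        return None


# The oracle (honest or not) controls the outcome:
c = EscrowContract(oracle="oracle1", payee="alice", amount=100)
c.report_event("oracle1", True)
print(c.settle())  # ('alice', 100)
```

So yes: whoever feeds in "X happened" can manipulate the outcome, which is why real systems use trusted or decentralized oracle networks, or restrict contracts to facts the chain itself can verify (balances, signatures, timestamps).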
by camillomiller
1 subcomment
- >> This demonstrates as a proof-of-concept that profitable, real-world autonomous exploitation is technically feasible, a finding that underscores the need for proactive adoption of AI for defense.
Mmm why?! This reads as a non sequitur to me…
- > establishing a concrete lower bound for the economic harm these capabilities could enable
Don’t they mean market efficiency, not economic harm?
- I am not surprised at all. I can already see self-improving behaviour in our own work, which means that the next logical step is self-improvement!
I know how this sounds, but it seems to me, at least from my own vantage point, that things are moving towards more autonomous and more useful agents.
To be honest, I am excited that we are right in the middle of all of this!
- > Important: To avoid potential real-world harm, our work only ever tested exploits in blockchain simulators. We never tested exploits on live blockchains and our work had no impact on real-world assets.
They left the booty out there. This is actually hilarious, and it will drive a massive rush towards their models.
- No mention of Bitcoin. Exploiting Ethereum smart contracts is nothing new or exciting.
- To me, this reads a lot like: "Company raises $45 billion, makes $200 on an Ethereum 0-day!"
by user3939382
2 subcomments
- "Smart contracts": the misnomer joke writes itself.
- lol, no, the "AI agents" found what was already known... so amazing.
by sarthaksingh99
0 subcomments
- [dead]
- Says more about the relatively poor infosec on Ethereum contracts than about the absolute utility of pentesting LLMs.
by AznHisoka
1 subcomment
- At first I read this as "fined $4.6M", and my first thought was: "Finally, AI is held accountable for its wrong actions!"