I wouldn't be surprised if NVIDIA picked up this talking point to sell more GPUs.
You can do a lot better efficiency-wise if you control the source end-to-end though - you already group logically related changes into PRs, so you can save on scanning by asking the LLM to only look over the files you've changed. If you're touching security-relevant code, you can ask it for more per-file effort than the attacker might put into their own scanning. You can even do the big bulk scans an attacker might on a fixed schedule - each attacker has to run their own scan while you only need to run your one scan to find everything they would have. There's a massive cost asymmetry between the "hardening" phase for the defender and the "discovering exploits" phase for the attacker.
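To make the diff-scoped idea concrete, here is a minimal sketch. It assumes you already have the list of files a PR touches, for example from `git diff --name-only main...HEAD`. The path patterns, base budget and multiplier are all hypothetical placeholders. The point is only that scan cost scales with the size of the change, with extra effort reserved for security-relevant files.

```python
from fnmatch import fnmatchcase

# Hypothetical glob patterns for paths treated as security-relevant.
SECURITY_PATTERNS = ("*auth*", "*crypto*", "*session*", "*.sql")


def allocate_scan_budget(changed_files, base_tokens=20_000, security_multiplier=5):
    """Map each file changed in a PR to a token budget for an LLM review pass.

    Unchanged files get nothing; security-relevant files get
    `security_multiplier` times the base budget.
    """
    budget = {}
    for path in changed_files:
        sensitive = any(fnmatchcase(path.lower(), pat) for pat in SECURITY_PATTERNS)
        budget[path] = base_tokens * (security_multiplier if sensitive else 1)
    return budget


if __name__ == "__main__":
    changed = ["src/auth/login.py", "README.md"]
    plan = allocate_scan_budget(changed)
    print(plan)                # per-file token budgets
    print(sum(plan.values()))  # total tokens spent on this PR
```

The periodic full-repo scan mentioned above would be a separate job. This only covers the per-PR pass.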
Exploitability also isn't binary: even if the attacker is better-resourced than you, they need to find a whole chain of exploits in your system, while you only need to break the weakest link in that chain.
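Here is a rough way to see the chain argument. It assumes, unrealistically, that each link is exploited independently with a fixed probability. Under that assumption, the attacker's overall success is the product of the per-link probabilities, so patching any single link to near zero collapses the whole chain. The probabilities below are made up for illustration.

```python
from math import prod


def chain_success(link_probs):
    """Probability of exploiting every link, assuming links are independent."""
    return prod(link_probs)


def patch_link(link_probs, index, new_prob):
    """Return a copy of the chain with one link's exploit probability changed."""
    patched = list(link_probs)
    patched[index] = new_prob
    return patched


if __name__ == "__main__":
    chain = [0.9, 0.8, 0.95, 0.7]  # made-up per-link exploit probabilities
    print(chain_success(chain))                      # about 0.48
    print(chain_success(patch_link(chain, 3, 0.1)))  # about 0.07
    print(chain_success(patch_link(chain, 3, 0.0)))  # 0.0: chain broken
```

With these numbers, hardening one link from 0.7 to 0.1 cuts the attacker's odds from roughly 48% to roughly 7%, without touching the other three links.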
If you boil security down to just a contest of who can burn more tokens, defenders get efficiency advantages that only the best-resourced attackers can overcome. On net, public access to Mythos-tier models will make software more secure.
For example, from this article:
> Karpathy: Classical software engineering would have you believe that dependencies are good (we’re building pyramids from bricks), but imo this has to be re-evaluated, and it’s why I’ve been so growingly averse to them, preferring to use LLMs to “yoink” functionality when it’s simple enough and possible.
Anyone who's heard of "left-pad" or is a Go programmer ("A little copying is better than a little dependency" is literally a "Go Proverb") knows this.
Another recent set of posts to HN had a company close-sourcing their code for security, but "security through obscurity" has been a well-understood fallacy in open source circles for decades.
> You don’t get points for being clever
Not sure about this framing; it can easily lead to the wrong conclusions. There is an arms race, yes, and defenders are going to need to spend a lot of GPU hours as a result. But it seems self-evident that the fundamentals of cybersecurity still matter a lot, and you still win by being clever. For the foreseeable future, security posture is still going to be a reflection of human systems: human systems that are under enormous stress, but are still fundamentally human. You win by getting your security culture in order to produce (and continually reproduce) the most resilient defense, one that masters both the craft and the human element, not by abandoning human systems and making brute-forcing security problems away your only strategy.
Indeed, domains that are truly security-critical will acquire this organizational discipline (what's required is the same type of discipline that the nuclear industry acquires after a meltdown, or that the aviation industry acquires after plane crashes), but it will be a bumpy ride.
This article from exactly one year ago is almost prophetic about what's going on right now, and about the subtle ways in which people are most likely to misunderstand the situation: https://knightcolumbia.org/content/ai-as-normal-technology
For the NFL front offices, I created a script that exposed an API to fully automate Ticketmaster through the front end. That let the NFL post tickets on all secondary markets and price them dynamically, so they could charge less if rain was expected on a Sunday. Ticketmaster was slow to develop an API. For legal reasons, Ticketmaster couldn't give us permission without developing the API first, but they told me they would do their best to stop me.
They switched over to PerimeterX which took me 3 days to get past.
Last week someone posted an article here about ChatGPT using Cloudflare Turnstile. [0] First, the article made some mistakes about how it works. Second, I used the [AI company product] and the Chrome DevTools Protocol (CDP) to intercept all the scripts and completely rewrite them before they were evaluated (the same way I was able to figure out PerimeterX in 3 days). I then had it recursively take control of all the fingerprinting so that it controls the profile. Then it created an API proxy to expose ChatGPT for free. It required some coaching about the technique, but it did most of the work in 3 hours.
These companies are spending tens of millions of dollars on these products, and considering what OpenAI is boasting about security, they are worthless.
> Worryingly, none of the models given a 100M budget showed signs of diminishing returns. “Models continue making progress with increased token budgets across the token budgets tested,” AISI notes.
So, the author infers a durable direct correlation between token spend and attack success. Thus you will need to spend more tokens than your attackers to find your vulnerabilities first.
However, it is worth noting that this study was of a 32-step network intrusion, which only one model (Mythos) was able to complete at all. That's an incredibly complex task. Is the same true for pointing Mythos at a relatively simple, single code library? My intuition is that there is probably a point of diminishing returns, and that it comes sooner for simpler tasks.
In this world, popular open source projects will probably see higher aggregate token spend by both defenders and attackers. And thus they might approach the point of diminishing returns faster. If there is one.
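A saturating model can reconcile "no diminishing returns up to 100M tokens" with the intuition that returns do diminish eventually. This is a hypothetical model, not anything from the AISI study. Assume a fixed number of bugs, each found independently at a constant rate per token. The `difficulty_scale` (in tokens) is large for a 32-step intrusion and small for a simple library.

```python
from math import exp


def expected_bugs_found(budget_tokens, total_bugs, difficulty_scale):
    """Hypothetical saturating model of bugs found vs. token budget.

    Each of `total_bugs` bugs is assumed to be found independently at a
    constant rate of 1/difficulty_scale per token, so the expected count
    approaches total_bugs as the budget grows.
    """
    return total_bugs * (1 - exp(-budget_tokens / difficulty_scale))


if __name__ == "__main__":
    for label, scale in [("simple library", 1e6), ("32-step intrusion", 1e9)]:
        for budget in (1e6, 1e7, 1e8):
            found = expected_bugs_found(budget, total_bugs=10, difficulty_scale=scale)
            print(f"{label:18} {budget:>12,.0f} tokens -> {found:5.2f} bugs")
```

With these made-up scales, the simple target is essentially exhausted well before 100M tokens. The complex one is still in its near-linear regime, where doubling the budget nearly doubles the findings. A benchmark that stops at 100M tokens would see no diminishing returns on the second target, even though the model has them.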
It's nuts. If the timing were slightly different, none of this "Cybersecurity" would even be a thing. We'd just have capability-based, secure, general-purpose computation.
In fact, security programs built on the idea that you can find and patch every security hole in your codebase were basically busted long before LLMs.
Because we have tools and techniques from formal methods that can guarantee the absence of certain behaviors in a bounded state space (and sometimes even in an unbounded one).
Sure, it's hard to formally verify everything but if you are dealing with something extremely critical why not design it in a way that you can formally verify it?
But yeah, the easy button is to keep throwing more tokens at it until you run out of money.
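Here is what ruling out a behavior over a bounded state space looks like in miniature. This is explicit-state model checking, the same basic idea that tools like the TLC model checker for TLA+ apply at much larger scale. The login-lockout state machine below is hypothetical. Breadth-first search visits every reachable state and checks an invariant: the account is never locked and authenticated at the same time. Passing proves the property for this model only, not for whatever code is supposed to implement it.

```python
from collections import deque

MAX_FAILURES = 3  # hypothetical lockout threshold


def successors(state, guard_authed=True):
    """All states reachable in one step from (failures, locked, authed)."""
    failures, locked, authed = state
    nxt = []
    # Failed login: only allowed when unlocked (and, if guarded, logged out).
    if not locked and (not authed or not guard_authed):
        f = failures + 1
        nxt.append((f, f >= MAX_FAILURES, authed))
    # Successful login resets the failure counter.
    if not locked:
        nxt.append((0, False, True))
    # Logout.
    if authed:
        nxt.append((failures, locked, False))
    # Admin unlock clears the lockout.
    if locked:
        nxt.append((0, False, authed))
    return nxt


def invariant(state):
    failures, locked, authed = state
    return not (locked and authed) and failures <= MAX_FAILURES


def find_violation(initial, guard_authed=True):
    """BFS over every reachable state; return a violating state or None."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for s in successors(state, guard_authed):
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return None


if __name__ == "__main__":
    start = (0, False, False)
    print(find_violation(start))                      # None: invariant holds
    print(find_violation(start, guard_authed=False))  # a locked-and-authed state
```

Dropping the `not authed` guard on failed logins (`guard_authed=False`) lets a logged-in session accumulate failures until it is locked while still authenticated, and the search reports that state.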
There is at least a possibility that a codebase can be secured with a (practically) finite number of tokens, until there are no more holes in it, for a reasonable amount of money.
This also reminds me of what I wrote here: https://jerf.org/iri/post/2026/what_value_code_in_ai_era/ There's still value in code that has been tested by the real world. In an era of "free code", that value may grow rather than shrink, counterintuitive as that may seem. No amount of testing you can do will be equivalent to being in the real world, AI-empowered attackers and all.
Put more simply: to keep your system secure, you need to be fixing vulnerabilities faster than they're being discovered. The token count is irrelevant.
Moreover: this shift is happening because the automated work is outpacing humans for the same outcome. If you could get the same results by hand, they'd count! A sev:crit is a sev:crit is a sev:crit.
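The rates-not-tokens point can be written as a simple backlog recurrence. Each week, the open count grows by what is discovered and shrinks by what gets fixed. The weekly figures are hypothetical. The takeaway is that if discovery outpaces fixing, the backlog keeps growing no matter how the discoveries were produced.

```python
def open_vulns(discovered_per_week, fixed_per_week, weeks, start=0):
    """Open-vulnerability backlog at the end of each week.

    Only the two rates matter; how the discoveries were produced
    (tokens, humans, fuzzers) never enters the recurrence.
    """
    backlog = start
    history = []
    for _ in range(weeks):
        backlog = max(0, backlog + discovered_per_week - fixed_per_week)
        history.append(backlog)
    return history


if __name__ == "__main__":
    print(open_vulns(5, 4, 4))           # discovery outpaces fixing
    print(open_vulns(5, 6, 4, start=3))  # fixing outpaces discovery
```

The first call yields `[1, 2, 3, 4]`, a backlog that grows every week. The second yields `[2, 1, 0, 0]`, which drains to zero and stays there.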
What this fails to take into account is that, unless the codebase changes, there is a finite number of actual (and even fewer actionable) bugs in a piece of code, but an infinite amount of potential attacker spend. Nothing stops you from running Mythos against it, whether it finds anything or not. Because each run is atomic by nature, you just have to play the numbers out and see when the average vuln discovery rate starts dropping. An attacker could spend a billion dollars and not find anything, without the defender spending a cent.
Generally speaking, the advantage goes to whoever can spend more time or money on security research (this has always been true, which is why the NSA was able to find Windows exploits that M$ did not). But eventually the fount of bugs in a piece of software will dry up, and attackers have no way of knowing if that's the case or not before dumping money at it (especially since attackers do not generally coordinate unless they're just branches of the same 'entity', e.g. nation-state).
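The "play the numbers out" approach can be sketched as follows. It assumes each scan run returns a set of finding identifiers (the names here are hypothetical) that can be deduplicated across runs. The sketch counts only genuinely new findings per run and flags when the trailing average drops below a threshold. The window and threshold are arbitrary choices. A dried-up discovery rate is evidence that the supply of bugs is running out, not proof.

```python
def new_findings_per_run(runs):
    """Count findings in each run that were not seen in any earlier run."""
    seen = set()
    counts = []
    for findings in runs:
        fresh = set(findings) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def discovery_has_dried_up(counts, window=3, threshold=0.5):
    """True once the trailing `window`-run average of new findings is below threshold."""
    if len(counts) < window:
        return False
    recent = counts[-window:]
    return sum(recent) / window < threshold


if __name__ == "__main__":
    # Hypothetical finding IDs returned by six independent scan runs.
    runs = [{"A", "B"}, {"B", "C"}, {"A"}, {"C"}, set(), {"B"}]
    counts = new_findings_per_run(runs)
    print(counts)                          # new findings per run
    print(discovery_has_dried_up(counts))  # trailing average below threshold?
```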
Would it? I’m old school but I’ve never trusted these massive dependency chains.
That’s a nit.
We’re going to have to write more secure software, not just spend more.
That's a really big "if". Particularly since so many companies don't even know all of the OSS they are using, and they often use OSS to offload the cost of maintaining it themselves.
My hope is that when the dust settles, we see more OSS SAST tools that are much better at detecting vulnerabilities, and even better if they can recommend fixes. OSS developers don't care about a 20-point chained attack across a company network; they just want to secure their one app. And if that app is hardened, perhaps that's the one link of the chain the attackers can't get past.
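As a toy illustration of the pattern-matching layer of a SAST tool, this sketch uses Python's standard `ast` module to flag direct calls to `eval` and `exec`. The rule set is a hypothetical placeholder. Real SAST tools add data-flow and taint tracking, handle aliasing such as `builtins.eval(...)`, and cover many more sinks. This sketch does none of that.

```python
import ast

# Hypothetical rule set: builtin names treated as dangerous sinks.
DANGEROUS_CALLS = {"eval", "exec"}


def find_dangerous_calls(source, filename="<string>"):
    """Return sorted (line, name) pairs for direct calls to DANGEROUS_CALLS."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        # Only plain-name calls like eval(x); attribute calls are ignored.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_CALLS:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)


if __name__ == "__main__":
    sample = "x = input()\nresult = eval(x)\nprint(result)\n"
    print(find_dangerous_calls(sample))
```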
Imo, cybersecurity looks like formally verified systems now.
You can't spend more tokens to find vulnerabilities if there are no vulnerabilities.
(It's true that formalization can still have bugs in the definition of "secure" and doesn't work for everything, which means defenders will still probably have to allocate some of their token budget to red teaming.)
Really depends on how consistently the LLMs are putting new vulnerabilities back into your production code for the other LLMs to discover.
The only process that scared me was windowgrid. It kept finding a way back when I killed all the "start with boot" locations I know. Run, runonce, start up apps, etc. Surely it's not in autoexec.bat :)
Are these totally previously unknown security holes or are they still generally within the umbrella of our understanding of cybersecurity itself?
If it's the latter, why can't we systematically find and fix them ourselves?
I predict the software ecosystem will change in two ways: internal software behind a firewall will become ever cheaper, but anything external-facing will become exponentially more expensive due to hacking concerns.
In the case of crooks (rather than spooks) that often means your security has to be as good as your peers, because crooks will spend their time going with the best gain/effort ratio.
Better to write good, high-quality, properly architected and tested software in the first place of course.
And yet... WireGuard was written by one guy while OpenVPN is written by a big team. One codebase is orders of magnitude bigger than the other. In which one should I bet LLMs will find more cybersecurity problems? My vote is on OpenVPN, despite it being the less clever and "more money thrown at" solution.
So yes, I do think you get points for being clever, assuming you are competent. If you are clever enough to build a solution that's much smaller/simpler than your competition, you can also get away with spending less on cybersecurity audits (be they LLM tokens or not).
After how many years of "shifting left" and understanding the importance of having security involved in the dev and planning process, the recommendation is now to vibe code with human intuition, review, and then spend a million tokens to "harden"?
I understand that isn't the point of the article and the article does make sense in its other parts. But that last paragraph leaves me scratching my head wondering if the author understands infosec at all?
Security was always about having more money/resources. Using more tokens is just another measure for the same.
A previous post, which I cannot verify myself, stated that Mythos is not as powerful as it seems, because the same bugs could be found using much smaller/simpler models, and that the method is the key part.
nothing is better or worse; it's basically as it's always been.
if you think otherwise, stop ignoring the past.
When things are tagged "cybersecurity", compliance/budget/manager/dashboard/education/certification are the usual response...
I don't think it would be an appropriate response for code quality issues, and it would likely escape the hands of the very people who can fix code quality issues, ie. developers.
But I don't really get the hype, we can fix all the vulnerabilities in the world but people are still going to pick up parking-lot-USBs and enter their credentials into phishing sites.
I think we are already here. I wrote something about this, if you are interested: https://go.cbk.ai/security-agents-need-a-thinner-harness
The benchmark might be a good apples-to-apples comparison but it is not showing capability in an absolute sense.
Of course those are attracted to new tools and AI shill institutes like AISI (yes, the UK government is shilling for AI, it understands a proper grift that benefits the elites).
Security "research" is perfect for talkers and people who produce powerpoint graphs that sell their latest tools.
You can still sit down and write secure software, while the "researchers" focus on the same three soft targets (sudo, curl, ffmpeg) over and over again and get $100,000 in tokens and salaries for a bug in a protocol from the 1990s that no one uses. Imagine if this went to the authors instead.
But no, government money MUST go to the talkers and powerpointists. Always.
What's new?
It was always about spending more money on something.
Team has no capacity? Because the company doesn't invest in the team, doesn't expand it, doesn't focus on it.
We don't have enough experts? Because the company doesn't invest in the team, doesn't raise the salary bar to get new experts, it's not attractive to experts in other companies.
It was always about "spending tokens more than competitors", in every area of IT.
1) massive companies spending millions of tokens to write+secure their software
2) in the shadows, "elite" software contractors writing bespoke software to fulfill needs for those who can't afford the millions, or fix cracks in (1)
(Oh wait, I think this is what is happening now, anyway, minus the millions of tokens)
If we take this at face value, it's not that different than how a great deal of executive teams believe cybersecurity has worked up to today. "If we spend more on our engineering and infosec teams, we are less likely to get compromised".
The only big difference I can see is timescale. If LLMs can find vulnerabilities and exploit them this easily (and I do take that with a grain of salt, because benchmarks are benchmarks), then you may lose your ass in minutes instead of after one dedicated cyber-explorer's Monster Energy-fueled, 7-week traversal of your infrastructure.
I am still far more concerned about social engineering than LLMs finding and exploiting secret back doors in most software.
These mass-produced tokens are just cheaper...
I already see this happening: companies are moving toward AI-generated code (or forking projects into closed source), keeping their code private, AI written pipelines taking care of supply chain security, auditing and developing it primarily with AI.
At that point, for some companies, there's no real need for a community of "experts" anymore.
Not saying security will never be dominated by AI like it happened with chess, with maps, with Go, with language. But just braindead money to security pipeline? Skeptical.