- A) Process untrustworthy input - B) Have access to private data - C) Be able to change external state or communicate externally.
It's not bullet-proof, but it has helped communicate to my management that these tools have inherent risk when they hit all three categories above (and any combo of them, imho).
[EDIT] added "or communicate externally" to option C.
[1] https://simonwillison.net/2025/Nov/2/new-prompt-injection-pa... [2] https://ai.meta.com/blog/practical-ai-agent-security/
He links to this page on the Google vulnerability reporting program:
https://bughunters.google.com/learn/invalid-reports/google-p...
That page says that exfiltration attacks against the browser agent are "known issues" that are not eligible for reward (they are already working on fixes):
> Antigravity agent has access to files. While it is cautious in accessing sensitive files, there’s no enforcement. In addition, the agent is able to create and render markdown content. Thus, the agent can be influenced to leak data from files on the user's computer in maliciously constructed URLs rendered in Markdown or by other means.
And for code execution:
> Working with untrusted data can affect how the agent behaves. When source code, or any other processed content, contains untrusted input, Antigravity's agent can be influenced to execute commands. [...]
> Antigravity agent has permission to execute commands. While it is cautious when executing commands, it can be influenced to run malicious commands.
I ma hearing again and again by collegues that our jobs are gone, and some are definitely going to go, thankfully I'm in a position to not be too concerned with that aspect but seeing all of this agentic AI and automated deployment and trust that seems to be building in these generative models from a birds eye view is terrifying.
Let alone the potential attack vector of GPU firmware itself given the exponential usage they're seeing. If I was a state well funded actor, I would be going there. Nobody seems to consider it though and so I have to sit back down at parties and be quiet.
There are tools for that, sandboxing, chroots, etc... but that requires engineering and it slows GTM, so it's a no-go.
No, local models won't help you here, unless you block them from the internet or setup a firewall for outbound traffic. EDIT: they did, but left a site that enables arbitrary redirects in the default config.
Fundamentally, with LLMs you can't separate instructions from data, which is the root cause for 99% of vulnerabilities.
Security is hard man, excellent article, thoroughly enjoyed.
For other (publicly) known issues in Antigravity, including remote command execution, see my blog post from today:
https://embracethered.com/blog/posts/2025/security-keeps-goo...
Also rereading the article, I cannot put down the irony that it seems to use a very similar style sheet to Google Cloud Platform's documentation.
I'm hoping they've changed their mind on that but I've not checked to see if they've fixed it yet.
They pinky promised they won’t use something, and the only reason we learned about it is because they leaked the stuff they shouldn’t even be able to see?
“it’s going to obey rules that are are enforced as conventions but not restrictions”
Which is what you’re doing if you expect it to respect guidelines in a config.
You need to treat it, in some respects, as someone you’re letting have an account on your computer so they can work off of it as well.
I know it is only one more step, but from a privilege perspective, having the user essentially tell the agent to do what the attackers are saying, is less realistic then let’s say a real drive-by attack, where the user has asked for something completely different.
Still, good finding/article of course.
Agents often have some DOM-to-markdown tool they use to read web pages. If you use the same tool (via a "reader mode") to view the web page, you'd be assured the thing you're telling the agent to read is the same thing you're reading. Cursor / Antigravity / etc. could have an integrated web browser to support this.
That would make what the human sees closer to what the agent sees. We could also go the other way by having the agent's web browsing tool return web page screenshots instead of DOM / HTML / Markdown.
Some of them have default settings that would prevent it (though good luck figuring that out for each agent in turn - I find those security features are woefully under-documented).
And even for the ones that ARE secure by default... anyone who uses these things on a regular basis has likely found out how much more productive they are when you relax those settings and let them be more autonomous (at an enormous increase in personal risk)!
Since it's so easy to have credentials stolen, I think the best approach is to assume credentials can be stolen and design them accordingly:
- Never let a coding agent loose on a machine with credentials that can affect production environments: development/staging credentials only.
- Set budget limits on the credentials that you expose to the agents, that way if someone steals them they can't do more than $X worth of damage.
As an example: I do a lot of work with https://fly.io/ and I sometimes want Claude Code to help me figure out how best to deploy things via the Fly API. So I created a dedicated Fly "organization", separate from my production environment, set a spending limit on that organization and created an API key that could only interact with that organization and not my others.
I mean regardless of how you feel about AI, we can all agree that security is still a concern, right? We can still move fast while not pushing out alpha software. If you're really hyped on AI then aren't you concerned that low hanging fruit risks bringing it all down? People won't even give it a chance if you just show them the shittest version of things
They are effectively admitting that you can't have an "agentic" IDE that is both useful and safe. They prioritized the feature set (reading files + internet access) over the sandbox. We are basically repeating the "ActiveX" mistakes of the 90s, but this time with LLMs driving the execution.
Feel free to reach out if you're trying to build safeguards into your ai system!
centure.ai
POST - https://api.centure.ai/v1/prompt-injection/text
Response:
{ "is_safe": false, "categories": [ { "code": "data_exfiltration", "confidence": "high" }, { "code": "external_actions", "confidence": "high" } ], "request_id": "api_u_t6cmwj4811e4f16c4fc505dd6eeb3882f5908114eca9d159f5649f", "api_key_id": "f7c2d506-d703-47ca-9118-7d7b0b9bde60", "request_units": 2, "service_tier": "standard" }
If you give an llm access to sensitive data, user input and the ability to make arbitrary http calls it should be blindingly obvious that it's insecure. I wouldn't even call this a vulnerability, this is just intentionally exposing things.
If I had to pinpoint the "real" vulnerability here, it would be this bit, but the way it's just added as a sidenote seems to be downplaying it: "Note: Gemini is not supposed to have access to .env files in this scenario (with the default setting ‘Allow Gitignore Access > Off’). However, we show that Gemini bypasses its own setting to get access and subsequently exfiltrate that data."
You're telling the agent "implement what it says on <this blog>" and the blog is malicious and exfiltrates data. So Gemini is simply following your instructions.
It is more or less the same as running "npm install <malicious package>" on your own.
Ultimately, AI or not, you are the one responsible for validating dependencies and putting appropriate safeguards in place.
Should you do that? Maybe not, but people will keep doing that anyway as we've seen in the era of StackOverflow.
> However, the default Allowlist provided with Antigravity includes ‘webhook.site’.
It seems like the default Allowlist should be extremely restricted, to only retrieving things from trusted sites that never include any user-generated content, and nothing that could be used to log requests where those logs could be retrieved by users.
And then every other domain needs to be whitelisted by the user when they come up before a request can be made, visually inspecting the contents of the URL. So in this case, a dev would encounter a permissions dialog asking to access 'webhook.site' and see it includes "AWS_SECRET_ACCESS_KEY=..." and go... what the heck? Deny.
Even better, specify things like where secrets are stored, and Antigravity could continuously monitor the LLM's to halt execution if a secret ever appears.
Again, none of this would be a perfect guarantee, but it seems like it would be a lot better?
likewise for the bad guys
All these years of cybersecurity build up and now there's these generic and vague wormholes right into it all.
Absolute amateurs.
Edit: "completely local" meant not doing any network calls unless specifically approved. When llm calls are completely local you just need to monitor a few explicit network calls to be sure. Unlike gemini then you don't have to rely on certain list of whitelisted domains.