This reminds me of a very common thing posted here (and elsewhere, e.g. Twitter) to promote how good LLMs are and how they're going to take over programming: the number of lines of code they produce.
As if every competent programmer suddenly forgot the whole idea of LoC being a terrible metric for productivity or (even worse) software quality. Or the idea that software is meant to be written to be readable (to water down "Programs are meant to be read by humans and only incidentally for computers to execute" a bit). Or even Bill Gates' famous "Measuring programming progress by lines of code is like measuring aircraft building progress by weight".
Even if you believe that AI will somehow take over the whole task completely, so that no human ever needs to read code again, there is still the issue that the AIs themselves will need to read that code, and AIs are much worse at reading code (especially with their limited context sizes) than at generating it. So LoC remains a bad measure even if all you care about is the driest "does X do the thing I want?" aspect, ignoring all other quality concerns.
Does your life have so much friction that you need a digital agent to act on your behalf?
Some of the use cases I saw on the OpenClaw website, like "checking me into a flight", are non-issues for me.
I work in business automation, but paradoxically I don't think too much about annoyances in my private life. Everything feels rather frictionless.
In business, I see opportunities to solve friction and that's how I make money, but even then, often there are barriers that are very hard to surmount:
(a) problems are complex to solve and require complex solutions such as deterministic or ML systems that LLMs are not even close to being able to create ad-hoc
(b) entrenched processes and incumbent organizations create moats that are hard to cross (ex: LinkedIn makes automation very hard)
(c) some degree of friction, in some cases, may actually be useful!
I imagine there are similar dynamics in the consumer space, but more than anything, I may not be seeing issues with such a critical eye (I like to relax after work, after all).
So, do you have problems in your private life that you'd want to take on the risks - and friction - of maintaining these agents?
1. Don't let it send emails from your personal account; only let it draft emails and share the draft link with you.
2. Use incremental snapshots, and if the agent bricks itself (it often does with OpenClaw if you give it access to change its config), just /revert to the last snapshot. I use VolumeSnapshot for lobu.ai.
3. Don't let your agents see any real secrets. Give them placeholders and swap in the real values at your gateway, with a human in the loop for secrets you care about.
4. Don't give your agents direct outbound network access. They should only talk to your proxy, which has a strict domain whitelist. There will be cases where the agent needs to reach other domains; for those I use time-boxed grants (allow certain domains for the current session for 5 minutes, then at the end of the session review every URL it accessed). You can also use tool hooks to have an LLM audit the calls and check they weren't triggered by a prompt injection attack.
Last but not least, use proper VMs like Kata Containers or Firecracker in production, not just Docker containers.
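The proxy setup in (4) can be sketched as a simple policy check: a static allowlist plus time-boxed session grants, with every attempted URL recorded for end-of-session review. Everything here (class and function names, the grant structure) is a hypothetical sketch, not OpenClaw's or any gateway's actual API:

```python
import time
from urllib.parse import urlparse

# Hypothetical static allowlist of always-permitted domains.
STATIC_ALLOWLIST = {"api.anthropic.com", "github.com"}

class SessionGrants:
    """Time-boxed domain grants: extra domains are allowed only for a
    fixed window, and every attempted URL is logged for later audit."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.grants = {}     # domain -> expiry timestamp
        self.audit_log = []  # (timestamp, url) of every attempted access

    def grant(self, domain, now=None):
        now = time.time() if now is None else now
        self.grants[domain] = now + self.ttl

    def is_allowed(self, url, now=None):
        now = time.time() if now is None else now
        domain = urlparse(url).hostname
        self.audit_log.append((now, url))
        if domain in STATIC_ALLOWLIST:
            return True
        return self.grants.get(domain, 0) > now
```

At the end of a session you would review `audit_log` (or feed it to an LLM hook) for anything that looks like prompt-injection exfiltration.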
Also (this may just be my ignorance about Claws), but if we give an agent permission to rewrite its own code to implement skills, what stops it from removing whatever guardrails exist in that codebase?
Yeah, but the world rewarded this by making it the fastest-growing GitHub project. The author gets on the podcasts, gets the high-profile jobs from big tech. That encourages me to do things this way more than to be security-minded about all this.
And there's no accountability to this at all. If an agent leaks private data, the user is to blame and not the author. If Google bans your services for using API keys incorrectly, we cast the bad eye towards Google and not the maintainer that enabled and approved it.
There's just so much incentive for "not reading code" and not developing secure code that this is only going to get worse over time. This is the hype and the type of engineering that we all allow, either by agreeing or by staying silent.
I agree with the author, but the world works off a different set of principles than what we're used to. I just see the world blindly trusting agents more.
For right now my trick is to tell the AI I have a problem that is more recognizable and mundane (i.e. lie), and then when I finally get the human, just say "oh, that was a bunch of hooey, here's what I'm trying to do." For PayPal that involved asking for help with a business tax that did not exist. For my bank it involved asking to /open/ a new account. Obviously the AI wants to help me open an account, even if my intention is to close one.
That will only work for so long, but it's something.
Their niche is going to be back office support, but even that creates risk boundaries that can be insurmountable. A friend of mine had an agent do sudo rm -rf ... wtf.
My view is that I want to launch an agent based service, but I'm building a statically typed ecosystem to do so with bounds and extreme limits.
> If you want to add Telegram support, don't create a PR that adds Telegram alongside WhatsApp. Instead, contribute a skill file (.claude/skills/add-telegram/SKILL.md) that teaches Claude Code how to transform a NanoClaw installation to use Telegram.
Why would you want that? Do you want every user to ask the AI to implement the same feature?
What bugs me about the current discourse is everyone focuses on where agents run and what they can access, but almost nobody talks about reconstructing what they actually did after the fact. Aviation has black boxes. Finance has audit trails. Agent systems have... logs the agent writes about itself. That's like asking the pilot to self-report the flight recorder.
Until action logging happens outside the agent's own process, none of the sandboxing stuff matters much.
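One way to get that "black box" property is an append-only log written by a supervising process, with each entry hash-chained to the previous one so a tampered or deleted record breaks the chain. A minimal sketch; the entry format and function names are made up for illustration:

```python
import hashlib
import json

def append_entry(log, action):
    """Append an action record, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = json.dumps({"action": action, "prev": prev_hash}, sort_keys=True)
    entry = {"action": action, "prev": prev_hash,
             "hash": hashlib.sha256(body.encode()).hexdigest()}
    log.append(entry)
    return entry

def verify_chain(log):
    """Recompute every hash in order; any edited, reordered, or deleted
    entry makes verification fail."""
    prev_hash = "0" * 64
    for entry in log:
        body = json.dumps({"action": entry["action"], "prev": prev_hash},
                          sort_keys=True)
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True
```

The important part isn't the hashing, it's that `append_entry` runs in the supervisor, outside the agent's process, so the agent never holds a handle to its own flight recorder.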
I run Claude Code with 84 hooks, and the one I trust most is a macOS Seatbelt (sandbox-exec) wrapper on every Bash tool call. It's about 100 lines of Seatbelt profile that denies read/write to ~/.ssh, ~/.gnupg, ~/.aws, any .env file, and a credentials file I keep. The hook fires on PreToolUse:Bash, so every shell command the agent runs goes through sandbox-exec automatically.
The key design choice: Seatbelt operates at the kernel level. The agent can't bypass it by spawning subprocesses, piping through curl, or any other shell trick — the deny rules apply to the entire process tree. Containers give you this too, but the overhead is absurd for a CLI tool you invoke 50 times a day. Seatbelt adds ~2ms of latency.
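For anyone curious what a hook like that roughly looks like: the PreToolUse handler rewrites the Bash command to run under `sandbox-exec` with an inline SBPL profile that denies the whole subtree of each blocked path. This is a simplified sketch, not the commenter's actual 100-line profile; the exact deny rules and path list are assumptions:

```python
import os

# Abbreviated list of paths the agent must never read or write.
BLOCKED_PATHS = ["~/.ssh", "~/.gnupg", "~/.aws"]

def seatbelt_profile(blocked):
    """Build a minimal SBPL profile: allow everything by default, then
    deny file reads and writes for each blocked subtree."""
    rules = ["(version 1)", "(allow default)"]
    for p in blocked:
        expanded = os.path.expanduser(p)
        rules.append(f'(deny file-read* file-write* (subpath "{expanded}"))')
    return "\n".join(rules)

def wrap_command(cmd):
    """Rewrite a Bash tool call to run under sandbox-exec. Seatbelt deny
    rules apply to the whole process tree, so subprocesses inherit them."""
    profile = seatbelt_profile(BLOCKED_PATHS)
    return ["sandbox-exec", "-p", profile, "/bin/bash", "-c", cmd]
```

A real hook would read the tool call from Claude Code's hook input, emit the rewritten command, and fail closed if `sandbox-exec` is unavailable.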
I built it with a dry_run mode (logs violations but doesn't block) and ran it for a week before enforcing. 31 tests verify the sandbox catches attempts to read blocked paths, write to them, and that legitimate operations (git, python, file editing in the project directory) pass through cleanly.
The paths to block are in a config file, so it's auditable — you can diff it in code review. And it's composable with other layers: I also run a session drift detector that flags when the agent wanders off-task (cosine similarity against the original prompt embedding, checked every 25 tool calls).
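A drift detector like that needs little more than a cosine similarity between the original prompt's embedding and an embedding of recent tool calls. The every-25-calls cadence is from the comment above; the threshold value and how embeddings are obtained are left abstract (any `embed` function mapping text to a vector works):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class DriftDetector:
    """Flag sessions whose recent activity has drifted from the original
    task. `embed` is any text -> vector function (an embedding model)."""

    def __init__(self, embed, original_prompt, threshold=0.5, every=25):
        self.embed = embed
        self.anchor = embed(original_prompt)
        self.threshold = threshold
        self.every = every
        self.calls = []

    def record(self, tool_call_text):
        self.calls.append(tool_call_text)
        if len(self.calls) % self.every != 0:
            return None  # only evaluate every N tool calls
        recent = " ".join(self.calls[-self.every:])
        sim = cosine_similarity(self.anchor, self.embed(recent))
        return sim < self.threshold  # True == drifted off-task
```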
None of this solves prompt injection fundamentally, but "the agent physically cannot read my SSH keys regardless of what it's been tricked into doing" is a meaningful property.
I thought containers were never a proper hard security barrier? It's a barrier, so better than not having one, of course.
That said I am a fan of Nanoclaw, and especially the philosophy of "it should be small enough to understand, modify and extend itself." I think that's a very good idea, for many reasons.
The idea of giving different agents access to different subsets of information is interesting. That's the Principle of Least Privilege. That seems like a decent idea. Each individual agent can get prompt injected, but the blast radius is limited to what that specific agent has access to.
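A least-privilege setup like that can be as simple as a static capability map enforced by the supervisor, not the agent itself, so a prompt-injected agent can't grant itself new scopes. The agent names and scope strings here are illustrative, not from any real Claw:

```python
# Hypothetical capability map: each agent sees only the scopes it needs.
AGENT_SCOPES = {
    "email-triage": {"gmail.read"},
    "calendar": {"calendar.read", "calendar.write"},
    "research": {"web.fetch"},
}

def check_scope(agent, scope):
    """Checked in the supervisor/gateway before any tool call runs.
    A compromised agent's blast radius is capped at its own scope set."""
    return scope in AGENT_SCOPES.get(agent, set())
```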
Still, I find it amusing that people are running this with strict rulesets, in Docker, on a VM, and then they hook it up to their GMail account (and often with random discount LLMs to boot!). It's like, we need to be clear about what the actual threat model is there. It comes down to trust and privacy.
You can start by thinking, "if the LLM were perfectly reliable (not susceptible to random error or prompt injection) and perfectly private (running on my own hardware), what would I be comfortable letting it do?" Then you remove these hypothetical perfect qualities one by one to arrive at what we have now: slightly dodgy, moderately prompt-injectable cloud services. Each one changes the picture in a slightly different way.
I don't really see a solution to the Security/Privacy <-> Convenience tension, except "wait for them to get smarter" (mostly done) and "accept loss of privacy" (also mostly done, sadly!)
For example: I enjoy industrial music and asked it for tour dates for the band KMFDM, which returned that they will be in Las Vegas in April for a festival (Sick New World). This festival has something like 20 bands, most of which I've never heard of. I asked NanoClaw to search the whole band list and generate a listing grouped by the type of music they play: industrial, rap, etc. It did a good job based on the bands I do know.
I was pleased, as I certainly did not want to do 20 band web searches by hand. It's still at bar-trick level, but it gives me hope that an upgraded agent-based Siri-like OS component could actually be useful from time to time.
It’s the monkey with a gun meme.
Could skill contributions collapse into just markdown and MCP calls? New features would still be just skills; they'd bring in versioned, open-source MCP servers running inside the same container sandbox. I haven't tried this (yet), but I think it could keep the flexibility while keeping skill code from stepping on each other.
It's almost like bureaucracy. The systems we have in governments or large corporations to do anything might seem bloated and could be simplified. But they're there to keep a lot of people employed, pacified, and power distributed in a way that prevents hostile takeovers (crazy). I think there was a CGP Grey video about rulers which made the same point.
Similarly, highly verbose AI-written code will require another AI to review or continue to maintain it. I wonder if that's something the frontier models optimize for, to keep themselves from going out of business.
Oh, and I don't mind that they're bashing OpenClaw and selling why NanoClaw is better. I miss the times when products competed with each other in the open.
Another person's trust issues are your business model.
Isn't OpenClaw just ...
while (true) {
    in = read_input();
    if (in) {
        async relay_2_llm(in);
    }
    sleep(1.0);
}
... and then some?

OpenClaw
NanoClaw
IronClaw
PicoClaw
ZeroClaw
NullClaw
Any insights on how they differ and which one is leading the race?
"You can never really trust an LLM!" -> "You can never really trust an employee!" (Every IT department ever.)
"LLMs make shit up." -> "Humans make shit up." (Wow very profound insight.)
AI is similar to a person you don't know doing work for you. AI is probably a bit more trustworthy than a random person.
But a company needs to let employees take ownership of their work, trust them, and allow them to make mistakes.
Isn't AI the same?