1) I didn't like that Beads was married to git via git hooks, and
2) this exact problem: Claude would just close tasks without any validation steps.
So I made my own that uses SQLite, and introduced what I call gates. Every task must have a gate; gates can be reused, but each task <-> gate relationship is unique, so a gate that passed on a previous task isn't considered passed when you reuse it for a new one.
I haven't seen it bypass the gates yet; it usually tells me it can't close a ticket.
A gate in my design can be anything: as simple as having the agent build the project, run the unit tests, or even ask a human to test.
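For illustration only (the table and column names here are hypothetical, not the actual schema), the unique task <-> gate pairing can be enforced with a composite primary key in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE gates (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- Each (task, gate) pair is unique: reusing a gate on a new task
-- starts a fresh row with passed = 0, so an old pass never carries over.
CREATE TABLE task_gates (
    task_id INTEGER NOT NULL REFERENCES tasks(id),
    gate_id INTEGER NOT NULL REFERENCES gates(id),
    passed  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (task_id, gate_id)
);
""")

conn.execute("INSERT INTO tasks VALUES (1, 'old task'), (2, 'new task')")
conn.execute("INSERT INTO gates VALUES (10, 'unit tests pass')")
conn.execute("INSERT INTO task_gates VALUES (1, 10, 1)")  # old task passed this gate
conn.execute("INSERT INTO task_gates (task_id, gate_id) VALUES (2, 10)")  # gate reused

def can_close(task_id):
    """A task may only be closed when none of its gates are still unpassed."""
    row = conn.execute(
        "SELECT COUNT(*) FROM task_gates WHERE task_id = ? AND passed = 0",
        (task_id,),
    ).fetchone()
    return row[0] == 0

print(can_close(1))  # True: its gate passed
print(can_close(2))  # False: same gate, new task, not passed yet
```

Because `passed` defaults to 0 on every new (task, gate) row, reusing a gate never inherits an earlier pass.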
Seems to me like everyone's building tooling to make coding agents more effective and efficient.
I do wonder if we need a complete, generic spec for coding agents, maybe one that includes this too. To my knowledge, Anthropic is the only vendor that publicly publishes specs for coding agents.
I would also recommend creating standards for the new protocols you are developing. Protocols need standards so that others can write their own implementations. With a standard, someone could build in a completely different language (like Rust or Go), use none of the SDKs you provide, and still be interoperable with your AAP and AIP implementations for smoltbot (because both sides support the standards behind the AAP and AIP protocols).
I also want to note that you cannot trust the LLM to do what your instructions say. The moment it falls victim to a prompt injection or confused-deputy attack, all bets are off. Instructions are soft: more like advice or guidance than a control or gate. To provide true controls and gates, they must be external, authoritative, and enforced below the decision layer.
Cool stuff Alex - looking forward to seeing where you go with it!!! :)
Anecdotally, I often end up babysitting agents running against codebases with non-standard choices (e.g. yarn over npm, podman over docker) and generally feel that I need a better framework to manage these. This looks promising as a less complex solution - can you see any path to making it work with coding agents/subscription agents?
I've saved this to look at in more detail later for a current project. When exposing an embedded agent to internal teams, I'm very wary of handling the client conversations around alignment, so I find the presentation of the cards and the violations very interesting: I think they'll understand the risks a lot better, and it may also give them a method of 'tuning'.
I have been following AlignTrue https://aligntrue.ai/docs/about but I think I prefer your way of doing accountability: acting on the thinking process instead of being passive. It's also a down-to-earth, more practical approach.
Great live demo, but I would have liked a more in-depth showcase of AAP and AIP, even in this multi-agent setting, to understand the full picture better. Or perhaps prepare a separate showcase just for AAP and AIP. Just my two cents.
PS. I'm the creator of LynxPrompt, which honestly falls very short for the cases we're discussing here, but I mention it to say I stay engaged with the trust/accountability topic: how to organize agents and guide them properly without supervision.
That seems like a pretty critical flaw in this approach, does it not?
The only way we will actually secure agents is by giving them only the permissions they need for their tasks. A system that uses your contract proposal to create an AuthZ policy, tied to a short-lived bearer token the agent presents on its tool calls, would ensure the agent actually behaves how it ought to.
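A minimal sketch of that idea, using an HMAC-signed token for brevity (the contract shape, tool names, and function names are all hypothetical, not an existing API):

```python
import hmac, hashlib, json, time, base64

SECRET = b"demo-secret"  # in practice, a per-deployment signing key

def issue_token(contract, ttl=300):
    """Mint a short-lived bearer token scoped to the contract's permissions."""
    claims = {"tools": contract["allowed_tools"], "exp": time.time() + ttl}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def authorize(token, tool_name):
    """Enforce the policy below the agent's decision layer: signature, expiry, scope."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered token
    claims = json.loads(base64.urlsafe_b64decode(body))
    return time.time() < claims["exp"] and tool_name in claims["tools"]

contract = {"allowed_tools": ["read_file", "run_tests"]}
token = issue_token(contract)
print(authorize(token, "run_tests"))    # True: granted by the contract
print(authorize(token, "delete_repo"))  # False: never granted
```

The key property is that the check lives in the tool gateway, not in the prompt: even a fully injected agent can only call what the token encodes.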
Checkpoints produce signed certificates: SHA-256 input commitments + Ed25519 signatures + a tamper-evident hash chain and a Merkle inclusion proof. Mess with any of it and the math breaks.
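A sketch of just the hash-chain portion, using stdlib SHA-256 only (Ed25519 signing and Merkle proofs omitted; the record shapes and function names are illustrative, not the actual implementation):

```python
import hashlib, json

def commit(record, prev_hash):
    """SHA-256 commitment over the record plus the previous link's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(records):
    chain, prev = [], "0" * 64  # genesis hash
    for rec in records:
        h = commit(rec, prev)
        chain.append({"record": rec, "prev": prev, "hash": h})
        prev = h
    return chain

def verify(chain):
    prev = "0" * 64
    for link in chain:
        if link["prev"] != prev or commit(link["record"], prev) != link["hash"]:
            return False  # the math breaks here
        prev = link["hash"]
    return True

chain = build_chain([{"checkpoint": 1, "result": "tests passed"},
                     {"checkpoint": 2, "result": "human approved"}])
print(verify(chain))  # True

chain[0]["record"]["result"] = "tests skipped"  # tamper with history
print(verify(chain))  # False: the commitment no longer matches
```

Each link commits to the previous hash, so editing any earlier record invalidates everything after it.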
Massive update to the interactive showcase, demoing all of this running against live services: https://www.mnemom.ai/showcase <-- all features interactive - no BS.
This is the answer to "who watches the watchmen". More to come.
Q: how is your AAP different from the industry work happening on intent/instructions?