Hacker News

Google Titans architecture, helping AI have long-term memory

581 points by Alifatisk

by okdood64

12 subcomments

From the blog:
https://arxiv.org/abs/2501.00663
https://arxiv.org/pdf/2504.13173
Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.

by doctor_blood

1 subcomments

"At long last, we have created the Torment Nexus from the classic novel Don't Create the Torment Nexus"
(In Eclipse Phase, TITAN - the Total Information Tactical Awareness Network - mulched humanity when it went rogue.)

by voodooEntity

3 subcomments

When I first read the Titans papers, my reaction was "this will be a big step forward".
While I have no "AI" title and don't work in the AI industry, I've spent many years thinking about AI concepts, even long before the whole NN/LLM hype started.
Maybe because of that, I was always really annoyed that LLMs are called AI, because in my years of thinking about how an actual "human-like" thinking AI might work, what an LLM does was far below my minimum definition.
But when I stumbled across the Titans paper, while it still is not an "AI" as I would call it, from my POV it's a massive step in the right direction.
Sometimes I consider writing all my ideas/thoughts about AI down in my blog, but then I think nobody would care anyway since I'm not a known figure *shrug* - so other than being able to say "look, I wrote it years ago!" there's no actual point in doing so, I guess.
However - I'm looking forward to seeing Titans in action, and I guess it will impress us all.

by kgeist

6 subcomments

>The model uses this internal error signal (the gradient) as a mathematical equivalent of saying, "This is unexpected and important!" This allows the Titans architecture to selectively update its long-term memory only with the most novel and context-breaking information
So one can break a model by consistently feeding it with random, highly improbable junk? Everything would be registered as a surprise and get stored, impacting future interactions
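A toy way to see the concern, as a sketch only: it assumes a linear memory matrix and a squared-error loss, which is a simplification and not the paper's deep memory module. The names `surprise`, `matvec` and `identity` are made up for this illustration. "Surprise" here is the norm of the gradient of the recall error with respect to the memory.

```python
import random

def matvec(M, k):
    """Multiply matrix M (list of rows) by vector k."""
    return [sum(m_ij * k_j for m_ij, k_j in zip(row, k)) for row in M]

def surprise(M, k, v):
    """Frobenius norm of the gradient of ||M k - v||^2 with respect to M."""
    err = [p - t for p, t in zip(matvec(M, k), v)]
    # The gradient is 2 * outer(err, k), so its norm factors into 2 * |err| * |k|.
    err_norm = sum(e * e for e in err) ** 0.5
    k_norm = sum(x * x for x in k) ** 0.5
    return 2.0 * err_norm * k_norm

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

rng = random.Random(0)
d = 8
M = identity(d)  # toy memory that already maps every key to itself
key = [rng.gauss(0, 1) for _ in range(d)]

expected_surprise = surprise(M, key, key)      # a pair the memory already predicts
junk_value = [rng.gauss(0, 1) for _ in range(d)]
junk_surprise = surprise(M, key, junk_value)   # an unrelated random value
```

In this toy setup every random pair scores as highly surprising, which is exactly the worry. Whether that actually degrades a real model would depend on the forgetting (decay) term and learned gating that the paper describes, which this sketch leaves out.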

by cubefox

0 subcomment

It's interesting that they publish a blog post about the Titans and MIRAS papers only now, while the blog post about the new follow-up paper (Nested Learning), all by the same main author(!), came out a month ago: https://research.google/blog/introducing-nested-learning-a-n...

by nasvay_factory

1 subcomments

I wrote about that a while ago: https://paxamans.github.io/blog/titans/

by photochemsyn

0 subcomment

Long-term memory on top of the base model, but is this idea for local users or for the data-center hosted model used by many different people?
P.S. This quote from the paper sounds just like LLM output:
> "This memory module provides significantly higher expressive power, allowing the model to summarize large volumes of information without losing important context. The model isn't simply taking notes; it's understanding and synthesizing the entire story. Crucially, Titans doesn’t just passively store data. It actively learns how to recognize and retain important relationships and conceptual themes that connect tokens across the entire input."

by jonplackett

2 subcomments

I’m curious if this makes them more or less susceptible to prompt injection?
On the one hand, learning on the job could allow better training of what not to be influenced by; on the other hand, an injected prompt could have an even deeper long-term effect on them.

by atomicthumbs

0 subcomment

> Virtually all successful existing sequence models rely on mean squared error (MSE) or dot-product similarity for both their bias and retention. This reliance can make models sensitive to outliers and limit their expressive power.
[...]
> MEMORA: This model focuses on achieving the best possible memory stability by forcing its memory to act like a strict probability map. By using this constraint, it ensures that every time the memory state is updated, the changes are controlled and balanced. This guarantees a clean, stable process for integrating new information.Virtually all successful existing sequence models rely on mean squared error (MSE) or dot-product similarity for both their bias and retention. This reliance can make models sensitive to outliers and limit their expressive power.
so did a Titans write this
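On the substance of the first quoted claim, a small sketch of why squared error is sensitive to outliers. The Huber loss is used here only as a familiar example of a more robust objective; it is an assumption for illustration, not a statement of what the quoted post's models use.

```python
def mse_grad(err):
    """Gradient of err**2 with respect to err: grows without bound."""
    return 2.0 * err

def huber_grad(err, delta=1.0):
    """Gradient of the Huber loss: linear near zero, clipped to +/- delta beyond it."""
    if abs(err) <= delta:
        return err
    return delta if err > 0 else -delta

typical_error = 0.5
outlier_error = 100.0

# One outlier produces an update 200x larger than a typical sample under MSE...
mse_ratio = mse_grad(outlier_error) / mse_grad(typical_error)
# ...but only 2x larger under Huber with delta = 1.0.
huber_ratio = huber_grad(outlier_error) / huber_grad(typical_error)
```

With a squared-error objective a single extreme sample can dominate the memory update; a clipped gradient bounds how much any one sample can move it.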

by bentt

0 subcomment

This just feels like a tremendous missing piece to LLMs. Looking forward to seeing it in action.

by Alifatisk

0 subcomment

Titans: Learning to Memorize at Test Time https://arxiv.org/abs/2501.00663

by riku_iki

1 subcomments

The post starts with a wrong statement right away:
"The Transformer architecture revolutionized sequence modeling with its introduction of attention"
Attention was developed before transformers.

by dmix

1 subcomments

> The Transformer architecture revolutionized sequence modeling with its introduction of attention, a mechanism by which models look back at earlier inputs to prioritize relevant input data
I've always wanted to read how something like Cursor manages memory. It seems to have developed a long history of all of my prompts and understands both the codebase and what I'm building slightly better over time, causing fewer errors.

by willangelo

0 subcomment

Very, very interesting; definitely a missing piece in the current AI space.
Small typo where the text “Virtually all successful existing sequence models rely on mean squared error…” is repeated twice within the same paragraph. Happens to the best of us.

by nubg

2 subcomments

Very interesting. Is it correct for me to imagine it as some kind of "LoRA" that's continuously adapted as the model goes through its day?
If so, could there perhaps be a step where the LoRA is merged back into the main model?
That would be like sleeping :-)
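For reference, this is what "merging" means for an ordinary LoRA adapter, as a minimal sketch. It shows the standard LoRA merge (the low-rank product is added into the frozen weight), not anything the Titans paper is known to do; the matrices and the helper names `matmul`, `matvec` and `merge_lora` are invented for the example.

```python
def matmul(A, B):
    """Plain-Python matrix product of A (n x m) and B (m x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA adapter into the base weight: W' = W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)  # (out x r) @ (r x in) -> (out x in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy sizes: 3 inputs, 2 outputs, rank 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # frozen base weight
A = [[0.5, -0.5, 1.0]]                    # r x in
B = [[2.0], [1.0]]                        # out x r
alpha, r = 2.0, 1
x = [1.0, 2.0, 3.0]

# Adapter kept separate: base output plus the scaled low-rank path.
low_rank = matvec(B, matvec(A, x))
unmerged = [b + (alpha / r) * l for b, l in zip(matvec(W, x), low_rank)]

# Adapter merged: a single matrix gives the same output.
merged = matvec(merge_lora(W, A, B, alpha, r), x)
```

After the merge the adapter costs nothing extra at inference time, which is what makes the "sleeping" analogy tempting.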

by 6r17

0 subcomment

Would this also allow it to be aligned more closely with the user's prompt, notably because of the surprise factor and how it may interpret it?

by bilsbie

1 subcomments

I submitted this exact URL yesterday. What are the criteria for when HN creates a new post vs. going to the existing one?

by themgt

1 subcomments

See also Hope:
In the previous sections, we first discussed Continuum Memory System (CMS) that allows for more persistent storage of memories and defines memory as a spectrum of blocks with different frequencies of update. Due to the larger capacity and constraints for scaling the parameters, often CMS requires simple learning rule but higher capacity to store more persistent knowledge. On the other hand, in the previous section, we discussed the design of a self-modifying Titans, where it can generate its own keys and so learning update to better adapt to the context. Contrary to CMS, the self-modifying Titans has a small capacity but is using a complex and expressive learning rule. Accordingly, these two systems seem to be complementary and their combination can enhance the model expressiveness from different aspects.
To this end, we present Hope architecture: A neural learning module that incorporates self-modifying Titans followed by Continuum Memory System.
https://research.google/blog/introducing-nested-learning-a-n...

by ivape

0 subcomment

So what happens if I write a book and on the last page write "Everything in this book was a lie and should not be cared about"? Will this be surprising enough for Titans? A regular LLM may ignore it completely if it's a massive book (massive book + 1 line contradiction).

by user3939382

1 subcomments

I developed a superior model for this months ago. People think Google is the be-all and end-all of advanced comp sci; they're not.

by AceJohnny2

1 subcomments

"Titans", huh?
... anyone here familiar with the RPG Eclipse Phase?

by jtrn

0 subcomment

Here is my amateur understanding of the architecture: Fine-tune on the fly by using degrees of surprise to update a separate/new memory network that matches the base model, and just call that network for each token iteration.
So if we are viewing this through the needle-in-a-haystack lens: The needle was very surprising for the base model, so going forward, when it sees anything of the same nature, the memory module will not just give you hay, but the needle, because it made a special note of it when it went through the haystack 1 million tokens ago, because the needle was surprising.
The Transformer's normal attention mechanism is already secretly trying to be a long-term memory system. Every time it writes a new KV pair into the cache, it’s desperately trying to “remember” that token forever.
But it’s doing it in the dumbest possible way: by hoarding an ever-growing pile of raw vectors, then frantically dot-product searching through the pile every single step. It’s like a hoarder who never throws anything away and has to rummage through mountains of junk to find the one receipt they need. Of course it chokes at long contexts.
Titans/MIRAS looks at that mess and says: “Why store memory in a growing garbage pile of vectors? Store it in the weights of a deep neural network instead — and let that network keep training itself in real time, but only on the stuff that actually surprises it.” That’s literally it.
Using the Tim Cook Martian example: The model is cruising through boring financial numbers → attention is doing its normal thing, KV cache is growing, but nothing is really sticking.
Suddenly: “Tim Cook is a Martian.”
Normal attention would just add one more KV pair to the pile and pray it doesn’t get drowned out later.
Titans instead goes: “Holy shit, reconstruction error off the charts → this does NOT fit my current memory at all → massive gradient → actually rewrite huge chunks of the memory MLP’s weights right now so this fact is burned in forever.”
From that moment on, the memory MLP has physically changed its internal wiring. Any future query that even vaguely smells like “Tim Cook” or “Martian” will make the activations explode through the newly rewired paths and spit out a vector screaming “MARTIAN” at the frozen attention layers.
The frozen attention (which is still doing its normal job on the short window) suddenly sees this one extra “virtual token” in its context that is confidently yelling the surprising fact → it attends hard to it → the model answers as if the Martian revelation happened one token ago, even if it was 2 million tokens back.
It looks exactly like a super-attention mechanism that only “primes” or “locks in” the surprising needles and deliberately forgets or ignores the hay. And it is also a way to fine-tune on the fly, permanently, for the current context.
I think…
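A rough sketch of that idea in code, under loud assumptions: the memory is a single linear map instead of a deep MLP, the keys and values are hand-made one-hot vectors, and the learning rate, momentum and forgetting values are arbitrary constants rather than learned, data-dependent gates. The update (a gradient step on the recall error, a momentum term for past surprise, and a decay term for forgetting) follows my reading of the paper's description; `LinearMemory` and everything else named here is hypothetical.

```python
class LinearMemory:
    """Fixed-size associative memory that is trained while it is being used."""

    def __init__(self, dim, lr=0.1, momentum=0.5, forget=0.01):
        self.dim = dim
        self.lr = lr              # step size on the current surprise
        self.momentum = momentum  # how much past surprise carries over
        self.forget = forget      # decay applied to old memory on every write
        self.M = [[0.0] * dim for _ in range(dim)]  # the memory "weights"
        self.S = [[0.0] * dim for _ in range(dim)]  # running surprise

    def read(self, key):
        return [sum(m * k for m, k in zip(row, key)) for row in self.M]

    def write(self, key, value):
        """One test-time update; returns the squared recall error before it."""
        err = [p - t for p, t in zip(self.read(key), value)]
        for i in range(self.dim):
            for j in range(self.dim):
                grad = 2.0 * err[i] * key[j]  # d/dM of ||M key - value||^2
                self.S[i][j] = self.momentum * self.S[i][j] - self.lr * grad
                self.M[i][j] = (1.0 - self.forget) * self.M[i][j] + self.S[i][j]
        return sum(e * e for e in err)

tim_cook = [1.0, 0.0, 0.0, 0.0]  # toy key
martian = [0.0, 1.0, 0.0, 0.0]   # toy value

memory = LinearMemory(dim=4)
losses = [memory.write(tim_cook, martian) for _ in range(50)]
recalled = memory.read(tim_cook)
```

Two things to notice: the memory stays a 4x4 matrix no matter how many writes happen, unlike a KV cache that grows with every token, and the recall error shrinks as the surprising pair is written in. In this toy the pair is written repeatedly; a single write only moves the memory part of the way, and the forgetting term keeps recall slightly below a perfect match.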

by YouAreWRONGtoo

0 subcomment

[dead]

by olegjose

0 subcomment

[flagged]

by shevy-java

0 subcomment

Skynet kind of sucks ...

by Mistletoe

1 subcomments

This is the one thing missing from my interactions with AI. If successful, this will change everything. If you thought people were getting AI boyfriends and girlfriends before, wait until you see this.

by albert_e

1 subcomments

Amazon has a foundation model named Titan - mostly recommended for creating embeddings. Possible confusion in this space.