Hacker News

X For You Feed Algorithm

121 points by grainier

by swyx

2 subcomments

ooh, LLM Recsys alert! (we had an LLM Recsys track at ai.engineer last year). official announcement here: https://x.com/XEng/status/2013471689087086804
looks like this is the "for you" feed, once again shared without weights so we only have so much visibility into the actual influence of each trait.
"We have eliminated every single hand-engineered feature and most heuristics from the system. The Grok-based transformer does all the heavy lifting by understanding your engagement history (what you liked, replied to, shared, etc.) and using that to determine what content is relevant to you." aka it's a black box now.
the README is actually pretty nice, would recommend reading this. it doesn't look too different from Elon's original code review tweet/picture https://x.com/elonmusk/status/1593899029531803649?lang=en
sharing additional notes while diving through the source: https://deepwiki.com/xai-org/x-algorithm
and a codemap of the signal generation pipeline: https://deepwiki.com/search/make-a-map-of-all-the-signals_3d...
- Phoenix (out of network) ranker seems to have all the interesting predictive ML work. it estimates P(favorite), P(reply), P(repost), P(quote), P(click), P(video_view), P(share), P(follow_author), P(not_interested), P(block_author), P(mute_author), P(report) independently and then the `WeightedScorer` combines them using configurable weights. there's an extra DiversityScore and OONScore to add some adjustments, but again we don't know the weights https://deepwiki.com/xai-org/x-algorithm/4.1-phoenix-candida...
- other scores of interest: photo_expand_score, dwell_score, and dwell_time. share via copy, share, and share via dm are all obviously "super like" buttons.
- Two-Tower retrieval uses dot product similarity between user features/engagement (User Tower) and normalized embeddings for all items (Candidate Tower). but when you look into the code, and considering that this is probably the most important model for recommendation quality... it's maybe a little disappointing that it's a 2-layer MLP? https://deepwiki.com/search/what-models-are-used-for-user_98...
- Grok-1 JAX transformer (https://github.com/xai-org/x-algorithm/blob/main/phoenix/REA...) uses special attention masking that prevents candidates from attending to each other during inference. Each candidate only attends to the user context (engagement history). This ensures a candidate's score is independent of which other candidates are in the batch, enabling score consistency and caching. nice image here https://github.com/xai-org/x-algorithm/blob/main/phoenix/REA...
- kind of nice usage of Rust traits to create a type safe data pipeline. look at this beautiful flow chart https://deepwiki.com/xai-org/x-algorithm/3-candidate-pipelin... and the "Field Ownership pattern" https://deepwiki.com/xai-org/x-algorithm/3.6-scorer-trait#fi...
- the ten pre-scoring filters are minorly interesting, nothing super surprising here apart from AgeFilter (https://deepwiki.com/xai-org/x-algorithm/4.6.1-agefilter) which I guess means beyond a certain max_age (1 day?) nothing ever shows up on For You. surprising to have a simple flat cutoff vs i guess the alternative of an exponential aging algorithm.
- videoduration hydrator explicitly prioritizes video duration (https://deepwiki.com/xai-org/x-algorithm/4.5.6-videoduration...) but we don't know in what direction... do you recommend shorter or longer videos? and why a hydrator for what is presumably a pretty static property?
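to make the candidate-isolation masking from the Grok-1 bullet concrete, here is a minimal plain-Python sketch (not the repo's JAX code). the function name and the boolean-matrix layout are hypothetical, and two details are assumptions on my part: that context tokens attend to the whole context, and that each candidate also attends to itself. the only part taken from the source is that candidates see the user context and never each other.

```python
def candidate_isolation_mask(num_context: int, num_candidates: int) -> list[list[bool]]:
    """Build an attention mask where mask[q][k] is True if query q may attend to key k.

    Positions 0..num_context-1 are user-context tokens (engagement history);
    the remaining positions are candidates. Hypothetical helper for illustration.
    """
    total = num_context + num_candidates
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < num_context:
                # Context keys are visible to every query (assumption: the
                # context attends to itself without a causal restriction).
                mask[q][k] = True
            elif q == k:
                # A candidate may attend to itself (assumption), but never
                # to any other candidate in the batch.
                mask[q][k] = True
    return mask


def visible_keys(mask: list[list[bool]], query: int) -> list[int]:
    """Return the key positions that a given query position is allowed to see."""
    return [k for k, allowed in enumerate(mask[query]) if allowed]
```

under this layout a candidate's visible keys are always the context plus itself, no matter how many other candidates are in the batch, and context tokens never see candidates. that is the property that would let a score be cached and reused across requests.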
open questions from me
1. how large is the production reranker? default param count is here https://deepwiki.com/search/how-many-params-is-the-transfo_c... but that gives no indication. the latency felt ultra high initially last year and seems to have come down some, what budget are we working with?
2. can we make the retrieval better? i don't have a tooon of confidence in the User Tower / Candidate Tower system - is this SOTA? (it's probably not - see how youtube does codebook semantic IDs https://www.youtube.com/watch?v=LxQsQ3vZDqo&list=PLcfpQ4tk2k... )
3. no a/b testing / rollout infrastructure?
4. so many hydration subsystems - is this brittle?
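for context on question 2, this is roughly what a two-tower retrieval step looks like, as a plain-Python sketch under stated assumptions. all names here (`mlp_2layer`, `retrieve_top_k`), the ReLU activation, and normalizing the user vector as well as the candidates are my assumptions, not the repo's code; the source only tells us it is a dot product between a User Tower output and normalized candidate embeddings, with a 2-layer MLP involved.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mlp_2layer(x, w1, b1, w2, b2):
    """Hypothetical 2-layer tower: ReLU(W1 x + b1), then a linear layer W2 h + b2."""
    hidden = [max(0.0, dot(row, x) + b) for row, b in zip(w1, b1)]
    return [dot(row, hidden) + b for row, b in zip(w2, b2)]


def retrieve_top_k(user_embedding, candidate_embeddings, k):
    """Score every candidate by dot product with the user vector and keep the top k.

    candidate_embeddings maps item id -> raw embedding. Both sides are
    normalized here (an assumption), so the score is cosine similarity.
    """
    u = l2_normalize(user_embedding)
    scored = [
        (dot(u, l2_normalize(emb)), item_id)
        for item_id, emb in candidate_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]
```

a production system would swap the exhaustive loop for an approximate nearest-neighbor index, but the scoring rule is the same, which is why the quality of the two towers caps what retrieval can surface.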

by rapsey

1 subcomments

I did not expect to see Rust. They seem to have forgotten to commit Cargo.toml though.
Oh I see it is not meant to be built really. Some code is omitted.

by roryirvine

1 subcomments

I wonder if this'll turn out like the last time they published their algorithm to great fanfare, and then didn't bother to ever update it: https://github.com/twitter/the-algorithm

by internetter

2 subcomments

what is the difference between this and https://github.com/twitter/the-algorithm

by binsquare

1 subcomments

Hasn't this become more of a black box now that it's Grok-based? And haven't we seen that Grok's responses can be actively tweaked whenever Elon doesn't like them?
I'm sure there are many examples, but here's the first Google search result: https://www.theguardian.com/us-news/2025/nov/12/elon-musk-gr...

by kklisura

5 subcomments

Err... for me: that's a shockingly small amount of code. I don't think there's over 5k LOC there.

by espeed

0 subcomment

Have we entered the age of AI programming people?

by stickynotememo

2 subcomments

> Grok based transformer
Is Grok not an LLM? Or do they have other models under that brand?

by moneywoes

0 subcomment

anything interesting? anything that is a surprise?

by ulrischa

1 subcomments

Can someone port this to a bluesky custom feed?

by dotandgtfo

1 subcomments

This clearly has the goal of muddying the water of the DSA transparency requirements. It's an opaque way of trying to mislead users into believing that X is being transparent while not being so at all.
They pretend to be transparent about their algorithms while denying researchers access to their API through exorbitant pricing and severely limited quotas.

by chistev

8 subcomments

By releasing these things are they giving their competitors an advantage??
Someone explain.

by NedF

0 subcomment

[dead]

by wraptile

6 subcomments

I feel like we need more awareness of what open source is and how it works. This is NOT open source. This is, at best, source available, but since there is no way to confirm that this code even runs anywhere, ever, it's entirely a bad-faith performance to trick people, deceive regulators and stain the entire open source movement.
I sincerely hope that the mainstream media does not fall for this and calls it out. It's not rocket science. It's really really simple - this is not good for anyone.

by searine

1 subcomments

[flagged]