About the Superposition paper: this is close to what I've been thinking about over the past week. My hunch is that concepts or choices held in a "superposition" are harder for a fully differentiable neural net to reason about. For example, if there's a "green" vs. "purple" choice to be made, it can't fully commit to either (especially if they're 50-50) and has to reason about both simultaneously, which is hard when the representations lie on a nonlinear manifold. Discretizing to tokens (a non-differentiable argmax) forces a choice, which lets it reason about a single concept on its own, and more easily.
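To make the green/purple intuition concrete, here is a toy sketch. Everything in it is an assumption for illustration: the two-dimensional "embeddings", the 50-50 probabilities, and the helper names `soft_mixture` and `hard_choice` are made up and do not come from the paper or any real model.

```python
# Toy illustration only: embeddings and probabilities are invented for this example.
EMBEDDINGS = {
    "green": [1.0, 0.0],
    "purple": [0.0, 1.0],
}


def soft_mixture(probs):
    """Probability-weighted average of the concept embeddings (the differentiable path)."""
    dim = len(next(iter(EMBEDDINGS.values())))
    return [
        sum(p * EMBEDDINGS[name][i] for name, p in probs.items())
        for i in range(dim)
    ]


def hard_choice(probs):
    """Argmax over concepts: commits to exactly one embedding (the non-differentiable path)."""
    # On an exact tie, max() returns the first key it encounters.
    winner = max(probs, key=probs.get)
    return winner, EMBEDDINGS[winner]


probs = {"green": 0.5, "purple": 0.5}

mixed = soft_mixture(probs)             # a blend that matches neither concept
winner, committed = hard_choice(probs)  # exactly one concept's embedding

print("soft mixture:", mixed)
print("hard choice:", winner, committed)
```

The soft mixture is a vector that corresponds to neither concept, so anything downstream has to handle the blend. The argmax hands downstream computation one clean concept vector, at the cost of discarding the other option and blocking gradients through the choice.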
> Responses to the query “Write a metaphor about time” clustered by applying PCA to reduce sentence embeddings to two dimensions. […] The responses form just two primary clusters: a dominant cluster on the left centered on the metaphor “time is a river,” and a smaller cluster on the right revolving around variations of “time is a weaver.”
I just gave Gemini 3 the same prompt and got something quite different:
> Time is a patient wind against the cliff face of memory. It does not strike with a hammer to break us; it simply breathes, grain by grain, until the sharp edges of grief are smoothed into rolling hills, and the names we thought were carved in stone are weathered into soft whispers.
These days, abstracts are so marketing- and advertising-forward that it's hard to even understand the claim.