FRESH

Hacker News

How big are our embeddings now and why?

113 points by alexmolas

by Xenoamorphous

2 subcomments

Question for the experts: a few years back (even before covid times?) I was tasked with building a news aggregator. Universal Sentence Encoder was new, and we didn’t even have BERT back then. It felt magical (at least as a regular software dev) seeing how the cosine similarity score was heavily correlated with how similar (from a semantic standpoint) two given pieces of text were. That plus some clustering algorithm got the job done.
A few months ago I happened to play with OpenAI’s embeddings model (can’t remember which ones) and I was shocked to see that the cosine similarity of most texts was super close, even if the texts had nothing in common. It’s like the wide 0-1 range that USE (and later BERT) were giving me was compressed to perhaps a 0.2 one. Why is that? Does it mean those embeddings are not great for semantic similarity?

by minimaxir

1 subcomments

It's the same Jevons paradox reason as why LLMs are so big despite massive diminishing returns. If we can output 4096Ds, why not use all the Ds?
Like LLMs, the bottleneck is still training data and the training regimen, but there's still a demand for smaller embedding models due to both storage and compute concerns. EmbeddingGemma (https://huggingface.co/google/embeddinggemma-300m), released just yesterday, beats the 4096D Qwen-3 benchmarks at 768D, and using the 128D equivalent via MRL beats many 768D embedding models.

by numlocked

3 subcomments

I don’t quite understand. The article says things like:
“With the constant upward pressure on embedding sizes not limited by having to train models in-house, it’s not clear where we’ll slow down: Qwen-3, along with many others is already at 4096”
But aren’t embedding models separate from the LLMs? The size of attention heads in LLMs etc isn’t inherently connected to how a lab might train and release an embedding model. I don’t really understand why growth in LLM size fundamentally puts upward pressure on embedding size as they are not intrinsically connected.

by lmeyerov

0 subcomment

We have been on a fork for louie.ai:
* Small data - talk to your PDF on-the-fly etc: Getting bigger & faster via cloud APIs
* Big data - for RAG: Getting smaller, bc we don't want to pay crazy fees for vector DB hosting, and doable bc easier to get higher-quality small embeddings that do that

by bitwize

0 subcomment