> collection of non governors really exhibiting self service
Assume an English-like active vocabulary of V = 50,000 word types (a rough stand-in for the “distinct words” that commonly occur). A realistic guess is a ~30% reduction for an aggressive, embedding-style collapse on typical English text, i.e. merging words whose vectors point in similar directions in embedding space: happy, glad, pleased, delighted → happy.
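
As a rough illustration of the collapse described above, here is a minimal sketch. It assumes toy 2-D vectors invented for this example (real embeddings would come from a trained model), a hypothetical cosine-similarity threshold of 0.9, and a greedy "first close representative wins" rule. It also reads the ~30% figure as a reduction in the number of word types, which is an interpretation rather than something stated outright.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def collapse(vectors, threshold=0.9):
    """Greedily map each word to the first earlier representative
    whose vector is within the cosine threshold; otherwise the word
    becomes a new representative. Threshold is an assumed value."""
    representatives = []
    mapping = {}
    for word, vec in vectors.items():
        for rep in representatives:
            if cosine(vec, vectors[rep]) >= threshold:
                mapping[word] = rep
                break
        else:
            representatives.append(word)
            mapping[word] = word
    return mapping

# Toy vectors, made up for illustration only.
toy_vectors = {
    "happy": (1.0, 0.1),
    "glad": (0.95, 0.15),
    "pleased": (0.9, 0.2),
    "delighted": (0.98, 0.05),
    "table": (0.0, 1.0),
}
mapping = collapse(toy_vectors)
print(mapping)

# Back-of-the-envelope size, assuming the ~30% applies to word types.
V = 50_000
reduction = 0.30
collapsed_size = round(V * (1 - reduction))
print(collapsed_size)
```

With these toy vectors the four near-synonyms all map to "happy" while "table" stays separate. The greedy rule depends on word order, so a real pipeline would likely use proper clustering instead.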