Some of the words chosen are rather absurd/inappropriate: breviary (which I got wrong but felt like a vaguely religious word) was characterized as intermediate but I think it's much more obscure and less obvious than that; Hippopotomonstrosesquippedaliophobia was used as a word (I got that wrong as well) - any type of 'phobia' word is really the sort of thing a fourth grader opens up a page in the dictionary and points out, not a word that is used... ever; metamorphosis and kinetic were labeled expert, which I don't agree with (what elementary schooler doesn't learn about the metamorphosis of a caterpillar into a butterfly? what high schooler doesn't learn about kinetic energy?).
Most words were reasonably well defined in a way that most people would understand or recognize. A few words had poor definitions: lethargy ("the state of being lethargic" - obvious); complacent ("smug satisfaction with oneself" - I disagree that complacency is intrinsically smug); magnanimous ("generous toward a rival" - I disagree that a rival must be involved); gauche ("socially awkward" - this is sort of close but the given definition completely misses the idea of being tactless).
They call it scientific and give a hand-wavey formula, but they don't explain how words are stratified in the first place. If stratified sampling is a formally recognized method of doing this, it would be nice to have a link to a real reference. I think I know a lot of words, but I am skeptical of the estimate this app provided (north of 75k).
I've seen other systems like this calibrate far more quickly by assigning a sort of score and confidence behind the scenes. Confidence starts out low and increases over time - correct/incorrect answers rapidly adjust score at the beginning, then things settle down.
In practice this means you get a sequence of increasingly uncommon words initially, until you get one wrong, then you drop back to something easier until you start getting things right again, and eventually circle around words at your level.
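A rough sketch of that kind of calibration, for illustration only. The names `update` and `run`, the starting step of 4 difficulty units, and the 0.7 shrink factor are all made up here and are not taken from any particular app: the difficulty level jumps a lot at first, and the step shrinks after every answer so the estimate settles near the test taker's level.

```python
def update(level, step, correct, shrink=0.7, min_step=0.25):
    """One staircase update (hypothetical parameters).

    Move toward harder words after a correct answer and toward easier
    words after a wrong one, then shrink the step so later answers
    adjust the level less than early ones.
    """
    new_level = level + step if correct else level - step
    new_step = max(min_step, step * shrink)
    return new_level, new_step


def run(answers, level=0.0, step=4.0):
    """Apply a sequence of True/False answers and return (level, step)."""
    for correct in answers:
        level, step = update(level, step, correct)
    return level, step


if __name__ == "__main__":
    # Two correct answers push the level up quickly, one miss pulls it back.
    print(run([True, True, False]))
```

The shrinking step plays the role of the growing confidence: early answers move the level by several units, later ones by a fraction of a unit.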
Also - too many clicks per word. It's low stakes, just let me click the definition once and I'll live if I misclick (or add an undo button).
I got credit for a few that I would have happily just missed.
(context: native English speaker, big reader, huge nerd, perfect SAT score)
I got all 100 correct on the first try without looking anything up! Confusingly, that only resulted in a "SCIENTIFIC ESTIMATE" that I know 85,000/~170,000 words?
Their "How is this calculated" page that appears at the end explains their error:
> According to the Oxford English Dictionary (Second Edition), there are approximately 171,476 words in current use.
> We use Stratified Sampling. Instead of testing random words, we divide the language into 5 distinct difficulty bands based on frequency of use:
> 1. Core Basics ~3,000 words
> 2. Intermediate ~7,000 words
> 3. Advanced ~10,000 words
> 4. Expert ~25,000 words
> 5. The Obscure ~40,000+ words
> If you answer 2 out of 3 'Intermediate' questions correctly, we estimate you know roughly 66% of the 7,000 words in that band.
> Total Score = Σ (Accuracy in Band × Band Size)
Their strata add up to 85,000, not ~170k, so even a perfect score comes out at only about 50% of the dictionary.
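To make the arithmetic concrete, here is a minimal sketch of the quoted formula. The band sizes and the 171,476 figure come from the quoted page; the function name `estimate` and the assumption of 20 questions per band are mine.

```python
# Band sizes as quoted on the "How is this calculated" page.
BANDS = {
    "Core Basics": 3_000,
    "Intermediate": 7_000,
    "Advanced": 10_000,
    "Expert": 25_000,
    "The Obscure": 40_000,
}
OED_TOTAL = 171_476  # "words in current use" figure quoted by the page


def estimate(correct, asked=20):
    """Total Score = sum over bands of (accuracy in band * band size)."""
    return sum(size * correct[name] / asked for name, size in BANDS.items())


perfect = estimate({name: 20 for name in BANDS})
print(perfect)              # 85000.0
print(perfect / OED_TOTAL)  # just under 0.5

# Band results of 20/19/17/18/12, for comparison.
print(estimate(dict(zip(BANDS, [20, 19, 17, 18, 12]))))  # 64650.0
```

Under this formula the ceiling is the sum of the band sizes, so no score can exceed 85,000 regardless of the 171,476 headline number.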
They're also using a pretty limited and perhaps non-difficulty-representative subset of the language.
Cute, but wrong on many counts.
It looks as if the author used an LLM to generate wrong definitions for each word instead of actually mixing in the definitions of other words.
For me, most of the more complex words were adjectives, with a few nouns, and in many cases you can see at a glance that 2 or 3 of the 4 definitions are not for an adjective.
I suggest skipping the submit button and just showing it's correct when pressing and moving on after a sec or so. Having to click on submit twice really breaks the flow.
Also in all the words I tried I noticed out of the 4 options one is the correct one, another is the opposite of the correct one, and the other 2 are random stuff. You can basically skip any option whose antonym isn't present as well.
But many of the hard words were quite similar to more common words we have here.
I do wonder how much of these were “what AI thinks are hard words to know” vs. actually hard to know.
Core Basics 19/20
Intermediate 17/20
Advanced 19/20
Expert 14/20
Grandmaster 12/20
I guess, it's not too bad for a non-native speaker.
Minor feedback:
1. The correct answer for "Lethargic" is "Affected by lethargy". I think, definitions should not use words that share common root with the defined word, because:
a. it makes guessing too easy
b. it basically becomes a circular definition which is meaningless
2. Options almost always include 1 correct answer, 1 direct opposite and 2 completely random. Once you learn to recognise it, you can easily rule out 2 random options and have a 50/50 guess.
As a non-native English speaker, I found that result pretty good! Though being a native Portuguese speaker certainly helped me as many difficult words in English borrow from Latin, and in Portuguese the Latin influence is more pronounced.
My shorter OED contains 163,000 words (compared to the 600,000 words of the longer).
According to this site I know 71,000 words... Let's test that against the OED. I should have about a 43% chance of knowing a word picked at random.
In my totally scientific test (ha) I chose 50 words at random from the OED and discovered I knew 29 of them, for a score of 58%, which is more than two sigma from 43%, thus disproving the hypothesis.
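For anyone who wants to check the two-sigma claim, a quick sketch using the normal approximation to the binomial (my choice of method, not necessarily what was done above). The numbers are the ones from the comment.

```python
import math

p0 = 71_000 / 163_000   # hypothesised chance of knowing a random word (~0.436)
n, known = 50, 29        # sample size and words known in the sample

p_hat = known / n                         # observed proportion, 0.58
se = math.sqrt(p0 * (1 - p0) / n)         # standard error under the hypothesis
z = (p_hat - p0) / se                     # distance from the hypothesis in sigmas

print(f"p0={p0:.3f} p_hat={p_hat:.2f} se={se:.3f} z={z:.2f}")
```

With only 50 words the standard error is about 7 percentage points, so the result sits just over two sigma from the hypothesised rate.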
I forgot what that was now, but it was a fun experiment.
I used to do this in school tests too.
If you force me to guess, then I'm going to guess. Not only does that give me a 25% chance of getting it right at random, but as others have pointed out, it is very hard to make a multiple choice question that isn't guessable by an astute enough test taker. I think I knew 80 - 85 of those words, but I scored 97, because those questions were very guessable.
Also, reiterating everyone else's comments with respect to the UX needing fewer clicks, and also the definitions not being exact or precise in many cases.
Scientific Estimate: 69,100 words
It began very simply, so for a moment I did not take it very seriously, but I had never heard many of the later words. Thanks to knowing some Latin and other languages, though, I could understand many of them.
A fun idea!
Test could be completed in 1/5 of the time if the user could use numeral keys [1, 2, 3, 4] plus "enter" to input selections instead of the cursor.
But then below it said "you are a man of few words".
I take it the latter is just because I've only done the test once? But it's mixed messaging on first attempt I think.
Same strategies apply for guessing the unknown, especially with a modicum (it was on the test!) of Latin knowledge.
Strange that pretty much everyone here is getting 70k estimates (93/100 for me).
Feels a bit high at least for me as a non-native speaker.
I got 2 words I knew wrong, and guessed about 5 unknown words correctly. Those were bizarre repetitive words I've never seen before.
I remember doing a similar test from a reputable university about 10-15 years ago also in an app format and only got about 30k estimate.
I'm curious how the difficulty is chosen, because "obfuscate" was included in the hardest difficulty but I would not consider that to be a difficult word.
Also I found that some of the definitions were not completely correct.
It told me to read the dictionary.
I'm not sure exactly how you did this, but I think you asked an LLM to come up with the wrong options. Two things to consider:
1. While the LLM can give good options, they won't always be hard to guess. I wonder if instead you can have the LLM generate very close words (or skip using an LLM entirely) and put those as the options.
2. If you do generate options with an LLM, be mindful of its inability to shuffle things around. The correct answer was overwhelmingly the first or second option in the list. You should ask the model to give the options in a fixed order (say the true meaning first, then decreasing amount of replayability), then shuffle them yourself so that the correct answer is equally likely (25%) to land on each of A, B, C and D.
From what I can tell they actually have a bit more robust science behind their algorithm (and far fewer questions to answer)
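The shuffling step is only a few lines. A minimal sketch, assuming the options arrive as one correct definition plus a list of distractors (the function name `shuffled_question` is hypothetical, and I have no idea how the site actually stores its questions):

```python
import random


def shuffled_question(correct, distractors, rng=random):
    """Return (options, index_of_correct) with the options in random order.

    Assumes the correct definition is distinct from every distractor.
    """
    options = [correct] + list(distractors)
    rng.shuffle(options)  # in-place shuffle with uniformly random order
    return options, options.index(correct)


if __name__ == "__main__":
    opts, answer = shuffled_question(
        "Using more words than are needed",
        ["Using very few words", "Easily broken", "Strong and sturdy"],
    )
    for label, text in zip("ABCD", opts):
        print(label, text)
    print("correct:", "ABCD"[answer])
```

Doing the shuffle in code rather than asking the model to randomise means each position is correct about a quarter of the time.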
1. Bind each option to one key (1, 2, 3, 4). The user presses 2 to select the second option
2. Let the user change options if they want until they press Enter. Enter submits the answer.
3. Once submitted, another Enter brings the next one
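Those three steps amount to a tiny state machine. Here is a framework-free sketch of it; the state layout `(selected, submitted, question)` and the key names are my own assumptions, and a real page would wire this to its keydown events.

```python
def handle_key(state, key):
    """Advance the quiz state for one key press.

    state is (selected, submitted, question):
      selected  - chosen option 1..4, or None if nothing is chosen yet
      submitted - True once Enter has locked in the answer
      question  - index of the current question
    """
    selected, submitted, question = state
    if not submitted:
        if key in ("1", "2", "3", "4"):
            return (int(key), False, question)   # select or change option
        if key == "Enter" and selected is not None:
            return (selected, True, question)    # submit the current choice
        return state                             # ignore everything else
    if key == "Enter":
        return (None, False, question + 1)       # move on to the next question
    return state                                 # answer is locked after submit


if __name__ == "__main__":
    state = (None, False, 0)
    for key in ["2", "3", "Enter", "Enter"]:
        state = handle_key(state, key)
        print(key, "->", state)
```

Enter with nothing selected is ignored, so an accidental key press cannot submit an empty answer.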
I do concur that a refined collection of incorrect proposed responses which includes selections among terms with semantic proximity, conflated synonyms and plausible morphology could refine the accuracy of evaluations; and if the test was intended to bestow authentic assessments of lexicographical capability this would in all probability become an efficacious approach, but as a simply presentable quiz for folks with sesquipedalian proclivities I was not unduly discomfited by anything moreso than the extraneous clicks leading to and following the display of dichotomous determinations.
I wonder if the test is calibrated to the fact that some answers are just well guessed? I am not a native English speaker, but I speak 3 languages overall and have basic notions in Latin, and I have to admit it helped a lot in "deciphering" a few words that I didn't know at all. And in at least 2 cases I just guessed correctly.
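For what it's worth, the textbook way to account for guessing looks like the sketch below. It assumes every unknown word is guessed with a fixed success rate, which is a simplification, and the function name `corrected_known` is made up. Nothing on the site suggests it does this.

```python
def corrected_known(score, total, guess_rate=0.25):
    """Estimate how many answers were actually known.

    If k items are known and the rest are guessed with success rate g,
    the expected score is k + (total - k) * g. Solving for k gives
    k = (score - total * g) / (1 - g).
    """
    return max(0.0, (score - total * guess_rate) / (1 - guess_rate))


if __name__ == "__main__":
    # Blind guessing among 4 options: a score of 85 suggests about 80 known.
    print(corrected_known(85, 100, guess_rate=0.25))  # 80.0
    # If two options can always be ruled out, guesses succeed half the time.
    print(corrected_known(90, 100, guess_rate=0.5))   # 80.0
```

The better the decoys can be eliminated, the larger the gap between the raw score and the number of words actually known.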
It would have paired well with an exposition of vanilla Monte Carlo and the benefits of stratified sampling.
Although stratified sampling is good, one can do better in this case by using adaptive sampling, where one uses a runtime (Bayesian) estimate of vocabulary to maximize information gain per question -- preferentially sample from those strata where the current stratum-specific estimate has higher variance.
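A minimal sketch of what I mean, using a Beta posterior per band and a greedy variance rule as a cheap stand-in for a full information-gain calculation. The band sizes are the ones quoted from the site; the uniform Beta(1, 1) prior, the weighting by squared band size, and the function names are my assumptions.

```python
BAND_SIZES = [3_000, 7_000, 10_000, 25_000, 40_000]


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def next_band(hits, misses):
    """Pick the band whose uncertainty adds most variance to the total.

    Each band's accuracy gets a Beta(1 + hits, 1 + misses) posterior.
    Its contribution to the variance of the total estimate is
    size**2 * Var(accuracy), so large, poorly known bands are asked first.
    """
    def contribution(i):
        return BAND_SIZES[i] ** 2 * beta_variance(1 + hits[i], 1 + misses[i])
    return max(range(len(BAND_SIZES)), key=contribution)


def estimate(hits, misses):
    """Posterior-mean vocabulary estimate summed over bands."""
    return sum(
        size * (1 + h) / (2 + h + m)
        for size, h, m in zip(BAND_SIZES, hits, misses)
    )


if __name__ == "__main__":
    hits, misses = [0] * 5, [0] * 5
    print(next_band(hits, misses))   # with no data, the largest band
    hits[4] = 20                     # 20 correct answers in the largest band
    print(next_band(hits, misses))   # the next most uncertain band
```

With this rule the easy bands get very few questions, since even total ignorance about 3,000 words barely moves the final number.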
It's annoying that you need to click 3 times per question, and the buttons are in 2 different places.
Maybe would be better to just let me click the answer I want and then instantly show me the next question?
Also who is Sandi?
But to be honest many that might catch out a native speaker are just the Spanish/French/Latin word, so it was too easy in a way.
I scored 71,000.
Got 64,650: 20/19/17/18/12 (the intermediate one was a dumb mistake)
Some definitions were not great and alternatives a little silly at times but on the whole seemed pretty accurate.
Also probably needs calibrated as 96/100 was projected to 77k words, what would the estimate be for 100/100?
“You are a person of few words, or perhaps just a mysterious one. Quite intriguing.”
This sounds more like a cute assessment of only getting two words right. And what do you mean “new words”? It wasn’t until eighty-odd words in that I actually got a word I didn’t know and had to guess by ruling out multiple-choice options.
But Candid can certainly mean secretive, as in “Candid camera”.
The green button (which should not exist) was also hidden under Firefox for Android's address bar until I tried to "scroll" to hide it.
Also add a keyboard focus state on the continue button.
Eh?
One suggestion would be more convincing decoy choices, some were pretty silly. But I have no idea how they come up with them.
Might I suggest adaptive difficulty? After getting 10, 15, 20 correct in a row it should scale up the difficulty immediately, rather than waiting for 100 in the basic level 1...
I suppose the words must be weighted, because other people in the thread with more correct words got a not much higher estimate.
I’m not sure how you’d gauge what knowing each word would indicate.
Also adequate options, that sound plausible.
Are accoutrement and ziggurat really English words? Accoutrement is even pronounced as French!
And it didn't even tell me at the end how many words I know!
There is a similar variant of such a test where you just go down a list of words of increasing obscurity, ticking the ones you are familiar with. If you do this once or twice, you can get a fairly good estimate of the actual number of words you know.
Probably not too bad for a person whose native language is not English.
Fun!
My score: 78,000 words, 20/20/19/18/18.
Level 0: Core Basics Abundant, Baffle, Candid, Dwell, Emerge, Frugal, Generic, Hinder, Impartial, Jovial, Knack, Lucid, Meager, Naive, Obsolete, Peculiar, Quench, Refute, Seldom, Tedious, Unique, Valid, Wary, Yearn, Zeal, Adequate, Barren, Coarse, Diligent, Esteem, Fickle, Gloom, Hoax, Ignite, Jolt, Keen, Linger, Mend, Numb, Omit, Pledge, Quota, Rural, Soothe, Toxic, Urge, Vow, Witty, Yield.
Level 1: Intermediate Acumen, Benevolent, Complacent, Dilapidated, Eloquent, Fabricate, Gregarious, Hypothetical, Imminent, Juxtapose, Lethargic, Meticulous, Nostalgia, Oblivious, Pragmatic, Reiterate, Scrutinize, Tentative, Ubiquitous, Verbose, Wane, Aesthetic, Bolster, Candor, Defer, Elicit, Furtive, Glut, Heed, Impeccable, Lament, Modicum, Notorious, Opulent, Plausible, Resilient, Stagnant, Trivial, Viable, Zenith.
Level 2: Advanced Alleviate, Breviary, Cacophony, Deferential, Ephemeral, Fastidious, Garrulous, Harangue, Iconoclast, Juggernaut, Laconic, Magnanimous, Nefarious, Obsequious, Paradigm, Recalcitrant, Sanguine, Taciturn, Ubiquity, Vacillate, Winsome, Zephyr, Abase, Banal, Capricious, Debilitate, Ebullient, Facetious, Gaikwar, Hackneyed, Idiosyncrasy, Jargon, Kindle, Labyrinth, Maverick, Narcissism, Ostracize, Palliate, Quagmire, Rancorous, Sagacity, Tantamount.
Level 3: Expert Abstemious, Bellicose, Chicanery, Deleterious, Enervate, Fatuous, Gauche, Hegemony, Inculcate, Jejune, Kowtow, Lugubrious, Mawkish, Nonsectarian, Obdurate, Pernicious, Quotidian, Recapitulate, Supercilious, Tempestuous, Unctuous, Vehement, Winnow, Xenophobe, Ziggurat, Acquiesce, Bombastic, Circumlocution, Desultory, Equinox, Fiduciary, Gerrymandering, Hubris, Incognito, Kinetic, Loquacious, Metamorphosis, Nihilism, Orthography, Precipitous, Quasar, Reparation, Soliloquy.
Level 4: Grandmaster (The Obscure) Accoutrement, Brobdingnagian, Crepuscular, Defenestrate, Equanimity, Flibbertigibbet, Grandiloquent, Hippopotomonstrosesquippedaliophobia, Ineffable, Jingoism, Kerfuffle, Logorrhea, Mellifluous, Obfuscate, Panacea, Quixotic, Rococo, Sesquipedalian, Tergiversate, Ultracrepidarian, Vicissitude, Weltschmerz, Xeric, Yclept, Zeitgeist, Absquatulate, Bumbershoot, Callipygian, Dord, Ergophobia, Fartlek, Gobbledygook, Houghmagandy, Interrobang, Kakistocracy, Lollygag, Mumpsimus, Nudiustertian, Omphaloskepsis, Pogonotrophy, Quire, Ratoon, Snollygoster, Tittynope, Ucalegon, Vagitus, Widdershins, Xylopolist, Yarborough, Zenzizenzizenzic.
"Verbose," for instance, is defined as "Using more words than are needed."
That's not exactly wrong, but it's kind of misleading. "Verbose" explicitly means using a large pile of words, drowning the reader in far more words than are strictly necessary.
"More words than are needed" could be as limited as "used a three-word construction in a sentence where it could have been one."
There are many more like this.
Please, I beg all of you - don't use LLMs to generate linguistic slop that claims to be linguistic education.
I weep for the world that is to come.
* Correct word
* Opposite definition
* Another word's definition
* Opposite of that word's definition
Which massively reduces the difficulty
I mean, select the word, then press check, then press continue.
It could be one single click and move to the next, show me my last result at the same time you ask me for the next one.
Then I was doing poorly in grandmaster, until I realized you can ace grandmaster by just picking the longest explanation every time.
Vibe coders need to be forced to spend one day learning basic CSS before they're allowed to use an LLM to make a website, and the internet would be a lot more pleasant as we move forward with slopification. It doesn't have to be sloppy, and it doesn't take all that much studying to at least be able to steer an LLM in the right direction to make something look nice. At this point everything is just the same 3 colors and a centered flex column with weird spacing.
3 clicks per is what gives it away. and the little compliments. and that it's 100 questions
English is not my native language. I get my vocabulary from browsing the Internet. There is no way I know that many words.
I use the language to understand not get an effect