by CGMthrowaway
8 subcomments
- Honest feedback - I was really excited when I read the opening. However, I did not come away from this with a greater understanding than I already had.
For reference, my initial understanding was somewhat low: basically I know a) what an embedding is, b) that transformers work by matrix multiplication, and c) that it's something like a multi-threaded Markov chain generator with the benefit of pre-trained embeddings.
- I'd be surprised if anyone understood transformers from this.
- I love how you represent each token as five stacked boxes, with height, weight, etc. depicting different values. Where did you get this amazing idea? I will "steal" it for plotting high-dimensional data; see the sketch below.
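That stacked-box idea generalizes to any high-dimensional data: draw one column per point and one box per dimension, with box height encoding the value. A minimal sketch, assuming matplotlib (the data, sizes, and labels below are invented for illustration, not taken from the article):
```python
# A minimal sketch of the stacked-box glyph for high-dimensional data,
# assuming matplotlib; data and labels are made up for illustration.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.uniform(0.2, 1.0, size=(6, 5))  # 6 points, 5 dimensions each

fig, ax = plt.subplots(figsize=(6, 3))
for x, point in enumerate(data):
    bottom = 0.0
    for dim, value in enumerate(point):
        # One box per dimension; the box height encodes that dimension's value.
        ax.bar(x, value, bottom=bottom, width=0.6,
               color=plt.cm.viridis(dim / (len(point) - 1)),
               edgecolor="black")
        bottom += value
ax.set_xlabel("data point (e.g. token) index")
ax.set_ylabel("stacked dimension values")
plt.tight_layout()
plt.show()
```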
- I'd also recommend another article on this topic (LLMs) that was discussed here a few days ago. I read it all the way to the end and understood everything fully:
> How can AI ID a cat?
https://news.ycombinator.com/item?id=44964800
by neuroelectron
1 subcomment
- For me, this could use a bit more explanation. It's brief, and the grammar and cadence are very clunky.
- Here is another take on visualizing transformers from Georgia Tech researchers: https://poloclub.github.io/transformer-explainer/
Also, the Illustrated Transformer: https://jalammar.github.io/illustrated-transformer/
Also, this HN comment has numerous resources: https://news.ycombinator.com/item?id=35712334
by aabdel0181
0 subcomments
- very cool!