- I think this sentence:
> But matrix multiplication, to which our civilization is now devoting so many of its marginal resources, has all the elegance of a man hammering a nail into a board.
is the most interesting one.
A man hammering a nail into a board can be both beautiful and elegant! If you've ever watched someone effortlessly drive nail after nail into wood without seeming to think about it at all, you've seen a master craftsman at work. Speaking as a numerical analyst, I'd say a well-multiplied matrix is much the same. A great deal goes into how deftly a matrix can be multiplied, and just as a nail can be hammered poorly, so too can a matrix be multiplied poorly. I would say the matrices being multiplied in service of training LLMs are not a particularly beautiful example of what matrix multiplication has to offer. The fast Fourier transform, viewed as a sparse matrix factorization of the DFT, with its concomitant numerical stability, might be a better candidate.
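To make the factorization point concrete, here is a toy Python sketch (a minimal radix-2 FFT, not a production routine; function names are my own): the FFT computes exactly what the dense DFT matrix computes, but via O(log n) sparse stages instead of one O(n^2) dense product.

```python
import cmath

def dft_matrix_apply(x):
    """Dense DFT: apply the n-by-n Fourier matrix directly, O(n^2) work."""
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(w ** (j * k) * x[k] for k in range(n)) for j in range(n)]

def fft(x):
    """Radix-2 Cooley-Tukey FFT: the same matrix, factored into sparse
    stages, O(n log n) work overall. Assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return x[:]
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = [cmath.exp(-2j * cmath.pi * j / n) * odd[j] for j in range(n // 2)]
    return [even[j] + twiddled[j] for j in range(n // 2)] + \
           [even[j] - twiddled[j] for j in range(n // 2)]

x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 0.5, 2.5]
dense = dft_matrix_apply(x)
fast = fft(x)
# Same answer, very different amount of arithmetic.
assert all(abs(a - b) < 1e-9 for a, b in zip(dense, fast))
```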
by janalsncm
5 subcomments
- > But matrix multiplication, to which our civilization is now devoting so many of its marginal resources, has all the elegance of a man hammering a nail into a board.
Elegance is a silly critique. Imagine instead we were spending trillions on floral bouquets, calligraphy, and porcelain tea sets. I would argue that would be a bad allocation of resources.
What matters to me is whether it solves the problems we have, not how elegant we are in doing so. To the extent AI fails to do that, those are valid critiques; how elegant it is, is not.
- The computations in transformers are actually generalized tensor-tensor contractions implemented as matrix multiplications. Their efficient implementation in GPU hardware involves many algebraic gems and is a work of art. You can get a taste of the complexity involved in their design in this YouTube video: https://www.youtube.com/live/ufa4pmBOBT8
- The commutation problem has nothing to do with matrices. Rotations in space do not commute, and that will be the case whether you represent them as matrices or in some other way.
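A minimal Python sketch of the point (quarter-turn rotation matrices about the x- and z-axes, helper names mine): rotating about x then z lands somewhere different from z then x, however you choose to represent the rotations.

```python
import math

def rot_x(t):
    """3x3 matrix for a rotation by angle t about the x-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(t):
    """3x3 matrix for a rotation by angle t about the z-axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

q = math.pi / 2
xz = matmul(rot_x(q), rot_z(q))  # rotate about z, then about x
zx = matmul(rot_z(q), rot_x(q))  # rotate about x, then about z
# The two composite rotations differ by a full unit in some entries.
assert any(abs(xz[i][j] - zx[i][j]) > 0.5 for i in range(3) for j in range(3))
```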
- More like watching a weaving machine than watching a person hammer nails imo. Maybe like an old-time mill, with several machines if you think in terms of actual processing on an accelerator?
There's a wooden weaving machine at a heritage museum near me that gives me the same 'taste' in my brain as thinking about 'matrix' processing in a TPU or whatever.
by laichzeit0
1 subcomment
- Well, function composition f(g(x)) is not the same as g(f(x)), and when you represent f and g as matrices relative to some suitable set of basis functions, then obviously AB and BA should be different. If the multiplication were defined any differently, that wouldn't work.
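A quick Python sketch of the correspondence, using a shear f and an axis stretch g as toy examples of my own choosing: composing the functions in either order agrees exactly with the matrix product taken in that order, and the two orders genuinely differ.

```python
def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

F = [[1, 1], [0, 1]]   # shear: f(x, y) = (x + y, y)
G = [[2, 0], [0, 1]]   # stretch: g(x, y) = (2x, y)

def f(v): return matvec(F, v)
def g(v): return matvec(G, v)

v = [1, 1]
# Composition f(g(x)) is exactly the matrix product FG, and vice versa...
assert f(g(v)) == matvec(matmul(F, G), v)
assert g(f(v)) == matvec(matmul(G, F), v)
# ...and the two orders disagree: (3, 1) vs (4, 1).
assert f(g(v)) != g(f(v))
```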
by snickerbockers
0 subcomments
- Maybe I'm just being ai-phobic or whatever but I strongly suspect the original article is written by grok based on how it goes off on bizarre tangents describing extremely complicated metaphors that are not only inaccurate but also wouldn't in any way be insightful even if they were accurate.
- I think a much more insightful discussion would ask why matrix multiplication is so much more useful than the Hadamard product: https://en.wikipedia.org/wiki/Hadamard_product_(matrices)
The Hadamard (elementwise) product is also commutative.
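For comparison, a small Python sketch (helper names mine): the elementwise product commutes, the matrix product does not.

```python
def hadamard(a, b):
    """Elementwise (Hadamard) product of two same-shaped matrices."""
    return [[a[i][j] * b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

assert hadamard(A, B) == hadamard(B, A)   # commutative
assert matmul(A, B) != matmul(B, A)       # not commutative
```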
- So, Hardy focused on good explanations, and that was what he meant by beauty. Fair enough. The best objective definition of beauty I know of is "communication across a gap". This covers flowers, mathematics, and all kinds of art, including art I think is ugly, such as Lucian Freud and H. R. Giger. So now I'm describing some things as beautiful and ugly at the same time, which betrays that there's a relative component to it (relative, objectively). That means I wish some things, including mathematics, which is usually tedious, communicated better, or explained things that seem to me to matter more: I feel in my gut that there's potential for this. So I don't rate mathematics as beautiful, any of it, personally.
But I'll admit it's barely beautiful. Within that context, I guess the article's lawyering for the relative beauty of a matrix was a success, but I always liked matrices better than calculus or group theory anyway.
- Quaternions are beautiful too until you sit down to multiply them.
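For the record, a small Python sketch of the Hamilton product (function name mine) showing just how order-sensitive it is: ij = k, but ji = -k.

```python
def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)

assert qmul(i, j) == k                  # ij = k
assert qmul(j, i) == (0, 0, 0, -1)      # ji = -k: order matters
```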
- Beauty, symmetry, etc. are largely irrelevant. The key point is that this does not scale: burning gigawatts to compute these matrices (even with all those tricks) will not scale or compete with more efficient, more direct methods in the long term. Perhaps transformers are a very elaborate sunk-cost fallacy, where pivoting to a scalable, simpler architecture is treated as "too risky" even when the cost of a new GPU cluster dwarfs whatever it takes to bring an architecture from 0 to ChatGPT level.
by algernonramone
1 subcomment
- I am willing to admit that I find matrix multiplication ugly, as well as non-intuitive. But, I am also willing to admit that my finding it ugly is likely a result of my relative mathematical immaturity (despite my BS in math).
by ComplexSystems
0 subcomments
- Matrices represent linear transformations. Linear transformations are very natural and "beautiful" things. They are also very clearly not commutative: f(g(x)) is not the same as g(f(x)). The matrix algebra perfectly represents all of this, and as a result, FGx is not the same as GFx. It's only not "beautiful" if you believe that matrix multiplication is a random operation that exists for no reason.
by peterfirefly
1 subcomment
- I just finished reading lots of Stephen Witt quotes on Goodreads. He comes across as a white Malcolm Gladwell, except that he actually does know what "Igon values" are, so I don't know what his excuse is.
by Scene_Cast2
0 subcomments
- Matmuls (and GEMM) are a hardware-friendly way to stuff a lot of FLOPS into an operation. They also happen to be really useful as a constant-step discrete version of applying a mapping to a 1D scalar field.
I've mentioned it before, but I'd love for sparse operations to be more widespread in HPC hardware and software.
- I guess Stephen Witt must not like subtraction either, since a - b ≠ b - a. Nor division.
- Matrix multiplication is not ugly, but matrices themselves are ugly, mainly because they encode the arbitrary operation of choosing a basis. There's nothing especially nice about the pixel basis for images, or about the token basis for language. But of all the things that make up modern deep learning, matrix multiplication is surely the _least_ ugly. ReLU/GELU is not pretty! Batch normalization is vomit-inducing!! ImageNet normalization? JFC!!!
- Don't like matrices? Introducing: Penrose abstract index notation. Or "I can't believe it's not matrices".
by jamespropp
8 subcomments
- Do you disagree with my take or think I’m missing Witt’s point? I’d be happy to hear from people who disagree with me.
- I think it is just a matter of perspective. You can both be right. I don't think there is an objective answer to this question.
- Honestly, in a purely technical sense, I do find it beautiful how you can take matrix multiplication and a shit-ton of data, and get a program that can talk to you, solve problems, and generate believable speech and imagery.
There are many complications arising from such a thing existing, and from what was needed to bring it into existence (and at whose cost); I'll never deny that. I just can't comprehend how someone can find the technical aspects repulsive in isolation.
It feels a lot like trying to convince someone that nuclear weapons are bad by arguing that splitting an atom is akin to banging a rock against a coconut to split it in two.
- No. It's not ugly.
by stackghost
0 subcomments
- >Matrix algebra is the language of symmetry and transformation, and the fact that a followed by b differs from b followed by a is no surprise; to expect the two transformations to coincide is to seek symmetry in the wrong place — like judging a dog’s beauty by whether its tail resembles its head.
The way I've always explained this to non-algebra people is to imagine driving in a city downtown. If you're at an intersection and you turn right, then left at the next intersection, you'll end up at a completely different spot than if you were to instead turn left and then right.
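A toy Python sketch of that driving analogy (the grid-move encoding is my own): tracking a position and heading through "forward / left / right" moves, the two turn orders end up at different intersections.

```python
import math

def drive(pose, moves):
    """Apply a sequence of moves to a (x, y, heading) pose on a city grid.
    'F' drives forward one block; 'L'/'R' turn 90 degrees in place."""
    x, y, h = pose
    for m in moves:
        if m == "F":
            x += round(math.cos(h))
            y += round(math.sin(h))
        elif m == "L":
            h += math.pi / 2
        elif m == "R":
            h -= math.pi / 2
    return (x, y)

start = (0, 0, 0.0)  # at the origin, facing east
right_then_left = drive(start, ["F", "R", "F", "L", "F"])
left_then_right = drive(start, ["F", "L", "F", "R", "F"])
# Same moves, different order, different destination: (2, -1) vs (2, 1).
assert right_then_left != left_then_right
```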
- Maybe the issue boils down to overloading the term "multiplication". If mathematicians had instead invented a new word here, people would get tripped up less (similarly for the "dot" and "cross" products).
I think a lot of issues arise from using analogies. Another one is complex numbers as 2D vectors. It's an OK analogy, except that complex numbers can be multiplied, whereas 2D coordinates cannot. Your weird new non-vectors are now spinning and people are left confused.
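A small Python sketch of the disanalogy: complex numbers come with a multiplication (multiplying by i is a quarter-turn rotation), and that operation matches a 2x2 rotation matrix acting on the coordinate pair; bare 2D coordinates have no such product of their own.

```python
# "Multiply by i" rotates a complex number a quarter turn counterclockwise.
z = 3 + 4j
rotated = z * 1j
assert (rotated.real, rotated.imag) == (-4.0, 3.0)

# The same operation on the coordinate pair (3, 4), written as the
# 2x2 rotation matrix that represents multiplication by i.
R = [[0, -1],
     [1, 0]]
v = (3, 4)
rv = (R[0][0] * v[0] + R[0][1] * v[1],
      R[1][0] * v[0] + R[1][1] * v[1])
assert rv == (-4, 3)
```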
- IIRC, working with matrices was much easier in Fortran; I would expect modern Fortran has kept that ease.
- Matrix multiplication libraries are ugly. They either give up on performance or have atrocious interfaces... sometimes both.
Using matrix multiplication is also ugly when it's literally millions of times less efficient than a proper solution.
- Anyone who thinks matrix multiplication is ugly has understood nothing about it.
- I doubt anyone, past or present, could fully describe what a matrix is and what its multiplication is. People have looked at it in many ways so far: as a spatial transformation, as dot products, and so on. I don't think the description is complete in any significant way.
That's because we don't fully understand what a number is and what multiplication is. We defined -x and 1/x as inverses (additive and multiplicative), but what is -1/x? Consider them as operations: apply any one of them to any other, and you get the third. Thus they occupy peer status. But we hardly ever talk about -1/x.
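A quick Python check of that claim, using exact rational arithmetic (helper names mine): composing any two distinct operations from {-x, 1/x, -1/x} yields the third, and each is its own inverse.

```python
from fractions import Fraction

neg = lambda x: -x          # additive inverse
inv = lambda x: 1 / x       # multiplicative inverse
neginv = lambda x: -1 / x   # the third peer

def compose(f, g):
    """Return the composition f after g."""
    return lambda x: f(g(x))

x = Fraction(3, 7)
# Any two distinct operations compose to the third:
assert compose(neg, inv)(x) == neginv(x)
assert compose(neg, neginv)(x) == inv(x)
assert compose(inv, neginv)(x) == neg(x)
# And each one undoes itself:
assert compose(neg, neg)(x) == x
assert compose(inv, inv)(x) == x
assert compose(neginv, neginv)(x) == x
```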
The mathematical inquisition is in its infancy.