- While the search feature is nice, the reference itself still lacks details about what an instruction actually does. Take [1], for example, and compare it with, say, [2] (with diagram), [3] (ditto), or [4] (only pseudocode, but helpful nonetheless). Of course, all the alternatives mentioned cater only to x86, but it'd still be great if this site followed the approach taken by the other three.
[1]: https://simd.info/c_intrinsic/_mm256_permute_pd
[2]: https://www.felixcloutier.com/x86/vpermilpd
[3]: https://officedaytime.com/simd512e/simdimg/si.php?f=vpermilp...
[4]: https://www.intel.com/content/www/us/en/docs/intrinsics-guid...
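To illustrate the kind of detail being asked for, here is a minimal scalar model of what `_mm256_permute_pd` (VPERMILPD, immediate form) computes, written as portable C. The function name `permute_pd_ref` and the plain-array interface are made up for this sketch and are not part of any intrinsics header; only the low four bits of the immediate are modelled.

```c
#include <stddef.h>

/* Hypothetical scalar reference model of _mm256_permute_pd.
 * The four doubles are treated as two 128-bit lanes of two elements each.
 * Bit i of imm8 picks, for destination element i, the low (0) or high (1)
 * element of the SAME 128-bit lane. Elements never cross lanes. */
void permute_pd_ref(double dst[4], const double a[4], unsigned imm8)
{
    double tmp[4];  /* temporary so that dst may alias a */

    for (size_t lane = 0; lane < 2; ++lane) {
        for (size_t j = 0; j < 2; ++j) {
            size_t i = 2 * lane + j;          /* destination element index */
            size_t sel = (imm8 >> i) & 1u;    /* 0 = low, 1 = high element */
            tmp[i] = a[2 * lane + sel];
        }
    }
    for (size_t i = 0; i < 4; ++i)
        dst[i] = tmp[i];
}
```

With `a = {10, 11, 12, 13}`, an immediate of `0x5` swaps the two elements inside each lane, while `0xA` leaves the vector unchanged.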
- The ISA extension tags are mostly incorrect. According to that web site, all SSE2, SSE3, SSSE3, and SSE4.1 intrinsics are part of SSE 4.2, and all FMA3 intrinsics are part of AVX2. BTW there’s one processor which supports AVX2 but lacks FMA3: https://en.wikipedia.org/wiki/List_of_VIA_Eden_microprocesso...
The search is less than ideal. Search for FMA and it finds multiple pages of NEON intrinsics, but no AMD64 ones like _mm256_fmadd_pd.
- I clicked the “go” button just to see the typical format, and it gave… zero results. Because the example is “e.g. integer vector addition” and it doesn't strip away the “e.g.” part!
Apart from that, I find the search results too sparse (doesn't contain the prototype) and the result page too verbose (way too much fluff in the description, and way too much setup in the example; honestly, who cares about <stdio.h>[1]), so I'll probably stick to the existing x86/Arm references.
[1] Also, the contrast is set so low that I literally cannot read all of the example.
- Neat idea, though the 'search' feature is a bit odd if you don't know which instruction you are looking for. E.g. searching for 'SHA' shows the autocomplete for platforms not selected and then 0 results due to the filters (they haven't been added for SSE/AVX yet), but searching for 'hash' gets you 100 results like '_mm256_castsi256_ph', which has nothing to do with the search.
- Neat tool.
It is interesting how often SIMD stuff is discussed on here. Are people really directly dealing with SIMD calls a lot?
I get the draw -- this sort of to-the-metal hyper-optimization is legitimately fun and intellectually rewarding -- but I suspect that in the overwhelming majority of cases simply using the appropriate library, ideally one that is cross-platform and utilizes what SIMD a given target hosts, is a far better choice than bothering with the esoterica of every platform and generation of SIMD offerings.
by fancyfredbot
- The link to SIMD.AI is interesting. I didn't have a perfect experience trying to get Claude to convert scalar code to AVX-512.
Claude seems to enjoy storing 16-bit masks in 512-bit vectors, but the compiler will find that easily.
The biggest issue I encountered was that when converting nested if statements into mask operations, it would frequently forget to AND the inner and outer masks together.
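A minimal sketch of the nested-if pitfall described above, written as portable C that emulates 16 lanes with a `uint16_t` mask instead of using real AVX-512 intrinsics. All function names here are hypothetical; on actual hardware the compare would map to something like `_mm512_cmpgt_epi32_mask` and the masks would have type `__mmask16`. The example branch logic is invented for illustration and is not the commenter's code.

```c
#include <stdint.h>

enum { LANES = 16 };  /* 16 x 32-bit lanes, as in one 512-bit register */

/* Scalar original: a nested if with an else on the inner branch. */
void nested_scalar(int32_t out[LANES], const int32_t x[LANES],
                   const int32_t y[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        out[i] = 0;
        if (x[i] > 0) {
            if (y[i] > 0)
                out[i] = x[i] + y[i];
            else
                out[i] = x[i];
        }
    }
}

/* Emulated per-lane compare: bit i is set when v[i] > 0. */
uint16_t cmpgt_zero_mask(const int32_t v[LANES])
{
    uint16_t k = 0;
    for (int i = 0; i < LANES; ++i)
        if (v[i] > 0)
            k |= (uint16_t)(1u << i);
    return k;
}

/* Mask version: every inner mask must be ANDed with the outer mask. */
void nested_masked(int32_t out[LANES], const int32_t x[LANES],
                   const int32_t y[LANES])
{
    uint16_t outer = cmpgt_zero_mask(x);
    uint16_t inner = cmpgt_zero_mask(y);

    /* Using `inner` alone would also update lanes where x <= 0. */
    uint16_t k_then = (uint16_t)(outer & inner);
    uint16_t k_else = (uint16_t)(outer & ~inner);

    for (int i = 0; i < LANES; ++i) {
        int32_t r = 0;                        /* lanes in neither mask stay 0 */
        if ((k_then >> i) & 1) r = x[i] + y[i];
        if ((k_else >> i) & 1) r = x[i];
        out[i] = r;
    }
}
```

The else branch needs the same treatment: its mask is `outer & ~inner`, not just `~inner`, otherwise lanes that failed the outer test are overwritten too.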
by mshockwave
- This is pretty useful! Any plan for adding ARM SVE and RISC-V V extension?
by varispeed
- SIMD from MCUs would also be awesome!