FRESH

Hacker News

Home

15 years of FP64 segmentation, and why the Blackwell Ultra breaks the pattern

201 points by fp64enjoyer

by wtallis

6 subcomments

It's amazing to step back and look at how much of NVIDIA's success has come from unforeseen directions. For their original purpose of making graphics chips, the consumer vs pro divide was all about CAD support and optional OpenGL features that games didn't use. Programmable shaders were added for the sake of graphics rendering needs, but ended up spawning the whole GPGPU concept, which NVIDIA reacted to very well with the creation and promotion of CUDA. GPUs have FP64 capabilities in the first place because back when GPGPU first started happening, it was all about traditional HPC workloads like numerical solutions to PDEs.
Fast forward several years, and the cryptocurrency craze drove up GPU prices for many years without even touching the floating-point capabilities. Now, FP64 is out because of ML, a field that's almost unrecognizable compared to where it was during the first few years of CUDA's existence.
NVIDIA has been very lucky over the course of their history, but have also done a great job of reacting to new workloads and use cases. But those shifts have definitely created some awkward moments where their existing strategies and roadmaps have been upturned.

by adrian_b

2 subcomments

While implementing double-precision by double-single may be a solution in some cases, the article fails to mention the overflow/underflow problem, which is critical in scientific/technical computing (a.k.a. HPC).
With the method from the article, the exponent range remains the same as in single precision, instead of being increased to that of double precision.
There are a lot of applications for which such an exponent range would cause far too frequent overflows and underflows. This could be avoided by introducing a lot of carefully-chosen scaling factors in all formulae, but this tedious work would remove the main advantage of floating-point arithmetic, i.e. the reason why computations are not done in fixed-point.
The general solution of this problem is to emulate double-precision with 3 numbers, 2 FP32 for the significand and a third number for the exponent, either a FP number or an integer number, depending on which format is more convenient for a given GPU.
This is possible, but it lowers considerably the achievable ratio between emulated FP64 throughput and hardware FP32 throughput, but the ratio is still better than the vendor-enforced 1:64 ratio.
Nevertheless, for now any small business or individual user can achieve a much better performance per dollar for FP64 throughput by buying Intel Battlemage GPUs, which have a 1:8 FP64/FP32 throughput ratio. This is much better than you can achieve by emulating FP64 on NVIDIA or AMD GPUs.
Intel B580 is a small GPU, so it has only a FP64 throughput about equal to a Ryzen 9 9900X and smaller than a Ryzen 9 9950X. However it provides that throughput at a much lower price. Thus if you start with a PC with a 9900X/9950X, you can double or almost double the FP64 throughput for a low additional price with an Intel GPU. Multiple GPUs will proportionally multiply the throughput.
The sad part is that with the current Intel CEO and with NVIDIA being a shareholder of Intel, it is unclear whether Intel will continue to compete in the GPU market, or they will abandon it, leaving us at the mercy of NVIDIA and AMD, which both refuse to provide products with good FP64 support to small businesses and individual users.

by throwaway81523

2 subcomments

No mention of the Radeon VII from 2019 where for some unfathomable reason AMD forgot about the segmentation scam and put real FP64 into a gaming GPU. From this 2023 list, it's still faster at FP64 than any other consumer GPU by a wide margin (enterprise GPU's aren't in the list). Scroll all the way to the end.
https://www.eatyourbytes.com/list-of-gpus-by-processing-powe...

by gdiamos

6 subcomments

I'm not sure why the article dismisses cost.
Let's say X=10% of the GPU area (~75mm^2) is dedicated to FP32 SIMD units. Assume FP64 units are ~2-4x bigger. That would be 150-300mm^2, a huge amount of area that would increase the price per GPU. You may not agree with these assumptions. Feel free to change them. It is an overhead that is replicated per core. Why would gamers want to pay for any features they don't use?
Not to say there isn't market segmentation going on, but FP64 cost is higher for massively parallel processors than it was in the days of high frequency single core CPUs.

by jjmarr

4 subcomments

FP64 performance is limited on consumer because the US government deems it important to nuclear weapons research.
Past a certain threshold of FP64 throughput, your chip goes in a separate category and is subject to more regulation about who you can sell to and know-your-customer. FP32 does not matter for this threshold.
https://en.wikipedia.org/wiki/Adjusted_Peak_Performance
It is not a market segmentation tactic and has been around since 2006. It's part of the mind-numbing annual export control training I get to take.

by SubiculumCode

1 subcomments

To me it is crazy that NVIDIA somehow got away with telling owners of consumer grade hardware.that they cannot be used in datacenters.

by Sweepi

0 subcomment

Table to compare Blackwell U300 to U200 (-97% FP64 performance): https://www.forum-3dcenter.org/vbulletin/showpost.php?p=1380...

by juleiie

0 subcomment

I hope for their fall. I invest in their success

by numbers_guy

3 subcomments

A question that has been bugging me for a while is what will NVIDIA do with its HPC business? By HPC I mean clusters intended for non-AI related workloads. Are they going to cater to them separetely, or are they going to tell them to just emulate FP64?

by kittbuilds

0 subcomment

[dead]

by nicggckgcbnn

0 subcomment

[flagged]

by nicggckgcbnn

0 subcomment

[flagged]

by benreesman

1 subcomments

[flagged]

by botusaurus

1 subcomments

this article is so dumb. NVIDIA delivered what the market wanted - gamers dont need FP64, they dont waste silicon on it. now enterprise doesnt want FP64 anymore and they are reducing silicon for it too
weird way to frame delivering exactly what the consumer wants as a big market segmentation fuck the user conspiracy