I don't think it's possible to apply this trick to 64-bit floats on a 64-bit architecture, which OP mentions in the last sentence. You need a 52 x 52 -> 104 product. Modular 64 x 64 -> 64 multiplication gives you the bottom 64 bits exactly, and widening 32 x 32 -> 64 multiplication on the truncated operands approximately gives you the top 32 bits (only the top half of that widening product is reliable, because truncating the operands drops cross terms that pollute its low half). That leaves 104 - 64 - 32 = 8 bits that are not accounted for at all. Compare with the 32-bit case, where the same arithmetic gives 46 - 32 - 16 = -2, i.e. a 2-bit overlap that the method relies on.
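A quick sketch of the 32-bit accounting (my own illustration, not code from the post): with 23-bit mantissas, the exact bottom 32 bits come from a modular multiply, and a 16 x 16 widening multiply of the truncated operands approximates the top of the product, with only its top ~16 bits trustworthy, which is where the 2-bit overlap comes from.

```python
import random

random.seed(0)
for _ in range(10_000):
    a = random.getrandbits(23)   # 23-bit mantissa, hidden 1 excluded
    b = random.getrandbits(23)
    full = a * b                 # true product, up to 46 bits

    # modular 32 x 32 -> 32: bottom 32 bits, exact
    lo = (a * b) & 0xFFFFFFFF
    assert lo == full & 0xFFFFFFFF

    # widening 16 x 16 -> 32 on truncated operands: approximates full >> 14,
    # but the dropped cross terms (< 2^18 here) pollute its low bits
    hi = (a >> 7) * (b >> 7)
    err = (full >> 14) - hi      # always >= 0: only nonnegative terms dropped
    assert 0 <= err < (1 << 18)

    # accurate top 16 bits of `hi` cover bits 30..45 of the product;
    # the exact `lo` covers bits 0..31: a 2-bit overlap (bits 30 and 31)
    assert 0 <= (full >> 30) - (hi >> 16) <= 2
```

For 64-bit floats the same layout covers bits 0..63 exactly and bits 72..103 approximately, leaving the 8-bit gap described above.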
by awjlogan
0 subcomments
Link to Mark Owen’s excellent QFP library for soft float on Cortex-M0+ (ARMv6-M) and Cortex-M3/M4 (ARMv7-M).
Nice write-up here, too; I like the idea of a "firm float".
by mysterydip
1 subcomment
I appreciate this warning near the top: "This post contains floating point. Floating point is known to the State of California to cause confusion and a fear response in mammalian bipeds."
I wisely hit the back button :)
by NooneAtAll3
1 subcomments
Since the trick works on the mantissa only, without the hidden 1 included, I wonder whether the number of bits in the mantissa was chosen because of it.