FRESH

Hacker News

Home

Building Your Own Efficient uint128 in C++

111 points by PaulHoule

by MarekKnapek

1 subcomments

On division: There is a paper about a division algorithm. If you split your 128bit integer into four 32bit integers, you can use a 64bit CPU built-in instruction to do the 128bit division. If you are on 32bit CPU that doesn't have the 32bit division operation, then split your 128bit number into 8 16bit integers and leverage the 32bit CPU built-in operation. https://surface.syr.edu/eecs_techreports/166/

by jfyi

2 subcomments

In highschool, age 14, I was taking a turbo pascal class.
The teacher had assigned us to pick a project to work on for 2 weeks.
I had picked replicating some basic dos application. I legitimately just wanted to know how it worked and get down to a lower level and reimplement it.
The teacher kept asserting "it already exists, pick something else". I was young and didn't really know how to explain it to her at the time and kept insisting on that project.
It turned into some big hoopla, dragging my parents into the principal where we sat and discussed options. They had no idea of what was going on. The teacher kept pushing implementing a card game, I think it was poker. I being completely unreasonable immediately latched onto, "why does it have to be gambling? Why not something useful?"
Neither I, nor my parents had anything against a poker game morally, but that really changed the tone of the conversation and we ultimately landed on a big int implementation.
She had legitimately thought it was some gotcha and was noticeably angry when I demoed it 2 weeks later. It was of course the most naive implementation imaginable but worked well beyond the expectations assigned.
I never told her I finished it in a couple hours.
Whenever I think about it, it just kills me how disfunctional that school was on so many levels. I suppose I was lucky though to have computer science classes at that age in the 90's.

by b1temy

2 subcomments

I understand why a non-standard compiler-specific implementation of int128 was not used (Besides being compiler specific, the point of the article is to walk through an implementation of it), but why use
> using u64 = unsigned long long;
? Although in practice, this is _usually_ an unsigned 64 bit integer, the C++ Standard does not technically guarantee this, all it says is that the type need to be _at least_ 64 bits. [0]
I would use std::uint64_t which guarantees a type of that size, provided it is supported. [1]
Re: Multiplication: regrouping our u64 digits
I am aware more advanced and faster algorithms exist, but I wonder if something simple like Karatsuba's Algorithm [2] which uses 3 multiplications instead of 4, could be a quick win for performance over the naive method used in the article. Though since it was mentioned that the compiler-specific unsigned 128 integers more closely resembles the ones created in the article, I suppose there must be a reason for that method to be used instead, or something I missed that makes this method unsuitable here.
Speaking of which, I would be interested to see how all these operations fair against compiler-specific implementations (as well as the comparisons between different compilers). [3]. The article only briefly mentioned their multiplication method is similar for the builtin `__uint128_t` [4], but did not go into detail or mention similarities/differences with their implementation of the other arithmetic operations.
[0] https://en.cppreference.com/w/cpp/language/types.html The official standard needs to be purchased, which is why I did not reference that. But it should be under the section basic.fundamental
[1] https://en.cppreference.com/w/cpp/types/integer.html
[2] https://en.wikipedia.org/wiki/Karatsuba_algorithm
[3] I suppose I could see for myself using godbolt, but I would like to see some commentary/discussion on this.
[4] And did not state for which compiler, though by context, I suppose it would be MSVC?

by nemo1618

0 subcomment

For those curious about division, I wrote a popular uint128 package in Go that uses one of the standard approaches: https://github.com/lukechampine/uint128/blob/3d2701e8e909405...

by throwaway81523

2 subcomments

GCC alread has this for x64, I thought. https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html
RISC-V has no carry bit and this whole thing becomes awkward.
I am under the impression that boost::multiprecision has specialized templates for 128 and 256 bit math, but maybe I'm wrong. In practice when I've wanted extended precision, I've just used GMP or a language with bignums.
I would expect the best x86 machine code for many 128 bit operations would use XMM instructions, no? But I haven't investigated.

by asfaewafasd

1 subcomments

``` // this is the type used in the intrinsics signature // (and uint64_t is unsigned long, not unsigned long long...) using u64 = unsigned long long; ```
Can somebody explain to me why uint64_t (unsigned long according to the post) cannot be used here?

by ThatGuyRaion

12 subcomments

Question for those smarter than me: What is an application for an int128 type anyways? I've never personally needed it, and I laughed at RISC-V for emphasizing that early on rather than... standardizing packed SIMD.

by PaulHoule

1 subcomments

Makes me think of the bad old days where the platform gave you 8-bit ints and you built everything else yourself... or AVR-8.

by beached_whale

0 subcomment

I am so happy that MSVC added 128 bit integers to their standard library in order to do ranges distance of uint64_t iota views. One type alias away from int128's on most machines running gcc/clang/msvc

by reactordev

1 subcomments

Tangential. A long time ago at a company far far away, this is how we did UUIDs that made up a TenantId and a UserId, using this exact same logic, minus the arithmetic. Great stuff.
(We wanted something UUID like but deterministic that we could easily decompose and do RBAC with, this was prior to the invention of JWT’s, OAuth, and scopes, worked at the time).

by azhenley

3 subcomments

> we use 256-bit integers in our hot paths and go up to 564 bits for certain edge cases.
Why 564 bits? That’s 70.5 bytes.

by Joker_vD

1 subcomments

> On division: There is no neat codegen for division.
Wait, what? I'm fairly certain that you can do a 128-bit by 128-bit division using a x64's 128-bit by 64-bit division instruction that gives you only 64-bit quotient and remainder. The trick is to pre-multiply both dividend and divisor by a large enough power of 2 so that the "partial" quotient and remainders that the hardware instruction would need to produce will fit into 64 bits. On the whole, IIRC you need either 1 or 2 division instructions, depending on how large the divisor is (if it's too small, you need two divisions).