I am very sympathetic to wanting nice static binaries that can be shipped around as a single artifact[0], but... surely at some point we have to ask if it's worth it? If nothing else, that feels like a little bit of a code smell; surely if your actual executable code doesn't even fit in 2GB it's time to ask if that's really one binary's worth of code or if you're actually staring at like... a dozen applications that deserve to be separate? Or get over it the other way and accept that sometimes the single artifact you ship is a tarball / OCI image / EROFS image for systemd[1] to mount+run / self-extracting archive[2] / ...
[0] Seriously, one of my background projects right now is trying to figure out if it's really that hard to make fat ELF binaries.
[1] https://systemd.io/PORTABLE_SERVICES/
[2] https://justine.lol/ape.html > "PKZIP Executables Make Pretty Good Containers"
FAANG or not, nothing you're running should require an executable with a .text section approaching 2 GB. If you're bumping into that limit, your build process likely lacks dead code elimination in the linking step. You should be using LTO for release builds. Even the traditional approach (compile your object files with -ffunction-sections and link with --gc-sections) does a good job of culling dead code at function-level granularity.
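To make that concrete, here's a minimal sketch of the function-sections route (file and function names are made up; the flags are the standard GCC/Clang ones):

```c++
// dce_demo.cc: hypothetical translation unit with an unreferenced function.
//
// Compile each function into its own section, then let the linker drop the
// ones nothing references:
//   g++ -O2 -ffunction-sections -fdata-sections -c dce_demo.cc
//   g++ -Wl,--gc-sections -Wl,--print-gc-sections dce_demo.o -o dce_demo
//
// --print-gc-sections should report .text._Z10never_usedv being removed.

#include <cstdio>

// Never called from anywhere; with -ffunction-sections it gets its own
// .text.* section, which --gc-sections can garbage-collect at link time.
int never_used() {
    return 42;
}

int main() {
    std::puts("only main() survives in .text");
    return 0;
}
```

LTO goes further because the compiler sees the whole program at link time, but even this flag pair is cheap to adopt and usually shrinks .text noticeably.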
Move all the hot BBs near each other, right?
Facebook's solution: https://github.com/llvm/llvm-project/blob/main/bolt%2FREADME...
Google's: https://lists.llvm.org/pipermail/llvm-dev/2019-September/135...
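For anyone wondering what "hot BBs near each other" buys you: BOLT and Propeller do this per basic block, driven by real profiles. You can see a crude, source-level approximation of the same idea with the hot/cold attributes (toy example, names invented):

```c++
// hotcold_demo.cc: hand-annotated hot/cold separation. BOLT/Propeller do the
// equivalent automatically, per basic block, from real profiles; this is only
// a source-level approximation of the idea.
//
// Build (GCC): g++ -O2 -c hotcold_demo.cc
// With real profiles, -fprofile-use drives similar splitting without attributes.

#include <cstdio>
#include <cstdlib>

// Rarely executed error path: marking it cold asks the compiler to place it
// with other unlikely code (e.g. .text.unlikely), off the hot i-cache lines.
[[gnu::cold]] [[noreturn]] void die(const char* msg) {
    std::fprintf(stderr, "fatal: %s\n", msg);
    std::abort();
}

// Hot path: the kind of code the layout tools try to pack densely together.
[[gnu::hot]] long sum(const int* v, long n) {
    if (v == nullptr) die("null input");  // unlikely branch, jumps to cold code
    long s = 0;
    for (long i = 0; i < n; ++i) s += v[i];
    return s;
}
```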
I see this often, even in communities of software engineers: people who are unaware of certain limitations at scale will announce that the research is unnecessary.
Makes sense, but in the assembly output just after, there is not a single JMP instruction. Instead, CALL <immediate> is replaced with putting the address in a 64-bit register, then CALL <register>, which makes even more sense. But why mention the JMP thing then? Is it a mistake or am I missing something? (I know some calls are replaced by JMP, but that's done regardless of -mcmodel=large)
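To make the contrast concrete, here's roughly what I'd expect either way (function names invented; the exact asm varies with compiler version and PIC settings, so treat the comments as a sketch, not gospel):

```c++
// cmodel_demo.cc: the call sequences I'd expect (names invented).
//
//   g++ -O2 -fno-pic -S cmodel_demo.cc                  # small code model (default)
//   g++ -O2 -fno-pic -S -mcmodel=large cmodel_demo.cc   # large code model

extern long helper(long x);   // defined in some other, possibly far-away, TU

long plain_call(long x) {
    // small:  call helper           (rel32, +/- 2 GiB reach)
    // large:  movabs $helper, %rax
    //         call *%rax            (64-bit absolute, any distance)
    return helper(x) + 1;         // the "+ 1" keeps this a real call
}

long tail_call(long x) {
    // Here the call is in tail position, so a jump is emitted instead:
    // small:  jmp helper            (still rel32)
    // large:  movabs $helper, %rax
    //         jmp *%rax
    return helper(x);
}
```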
(I wonder, but have no particular insight into, whether LTO builds can do smarter things here -- most calls are local, but the handful of far calls could use the more expensive spelling.)
at some point surely some dynamic linking is warranted
However, Google, Meta, and ByteDance have run into the x86-64 relocation distance issue with their huge C++ server binaries. To my knowledge, industry users in other domains haven't hit this problem.
To address this, Google adopted the medium code model approximately two years ago for its sanitizer and PGO instrumentation builds. CUDA fat binaries also caused problems. I'd argue that linker script `INSERT BEFORE/AFTER` for orphan sections (https://reviews.llvm.org/D74375) served as a key mitigation.
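For anyone unfamiliar with the medium code model: roughly, code and small data keep the cheap RIP-relative addressing, while objects above a size threshold move into separate .ldata/.lbss/.lrodata sections reached with 64-bit absolute addresses. A small sketch of my understanding (not Google's actual setup; sizes and names invented):

```c++
// mcmodel_medium_demo.cc: how the medium code model treats big objects.
//
//   g++ -O2 -mcmodel=medium -mlarge-data-threshold=65536 -S mcmodel_medium_demo.cc
//
// Objects above the threshold land in .ldata/.lbss/.lrodata, which the linker
// may place far from .text and which are reached via 64-bit absolute
// relocations; code and small data keep small-code-model addressing.

static char huge_buffer[1 << 20];  // above the threshold: goes to .lbss
static long counter;               // small: stays in ordinary .bss

long touch(long i) {
    huge_buffer[i & ((1 << 20) - 1)] = 1;  // 64-bit absolute addressing
    ++counter;                             // 32-bit RIP-relative addressing
    return huge_buffer[0] + counter;       // read back so nothing is optimized out
}
```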
I hope that a range extension thunk ABI, similar to what AArch64/Power have, gets defined for the x86-64 psABI. It would be better than the long branch pessimization we currently get with -mcmodel=large.
---
It seems that nobody has run into this .eh_frame_hdr implementation limitation yet:
* `.eh_frame_hdr -> .text`: GNU ld and ld.lld only support 32-bit offsets (`table_enc = DW_EH_PE_datarel | DW_EH_PE_sdata4;`) as of Dec 2025.
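For context, the limit falls out of the binary search table in .eh_frame_hdr: with `DW_EH_PE_datarel | DW_EH_PE_sdata4`, every entry is a signed 32-bit offset from the start of the section, so functions more than ~2 GiB away can't be encoded. Roughly (struct and field names are mine, not the spec's):

```c++
// eh_frame_hdr_sketch.cc: rough shape of .eh_frame_hdr, just to show where
// the 32-bit limit comes from. See the LSB spec and the unwinder sources for
// the real layout.

#include <cstdint>

struct EhFrameHdrPrefix {
    uint8_t version;           // currently 1
    uint8_t eh_frame_ptr_enc;  // encoding of the pointer to .eh_frame
    uint8_t fde_count_enc;     // encoding of the FDE count
    uint8_t table_enc;         // encoding of the search table entries;
                               // in practice DW_EH_PE_datarel | DW_EH_PE_sdata4
    // (the encoded eh_frame_ptr and fde_count follow, then the table)
};

// With datarel | sdata4, each table entry is a pair of *signed 32-bit*
// offsets relative to the start of .eh_frame_hdr, so a function that lives
// more than ~2 GiB away from .eh_frame_hdr simply cannot be represented.
struct SearchTableEntry {
    int32_t initial_location;  // function start, as a datarel offset
    int32_t fde_address;       // its FDE in .eh_frame, as a datarel offset
};
```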
You can use thunks/trampolines. lld can generate them for some architectures, presumably also for x86-64, though I don't know why it didn't in your case.
But, as with the large code model, adding trampolines can be expensive, both in icache behavior and in raw execution time if a trampoline sits on a particularly hot path.
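For anyone who hasn't seen one: a range-extension thunk is just a linker-inserted stub near the caller that does the long jump on its behalf. A hand-written stand-in to show where the cost comes from (names invented; a real thunk is a few instructions of asm emitted by the linker, not C++):

```c++
// thunk_sketch.cc: a hand-written stand-in for a linker-generated
// range-extension thunk. Real thunks are emitted by the linker for targets
// like AArch64 when a branch target is out of range.

extern void far_callee();    // imagine this ends up >2 GiB away in .text

// The "thunk": a stub placed near the caller that forwards to the real
// callee. A linker thunk would do the forwarding with an absolute or
// GOT-loaded 64-bit jump: extra instructions, an extra i-cache line, and an
// extra branch on what might be a hot path, which is the cost noted above.
void far_callee_thunk() {
    far_callee();            // tail call: compiles to a jump, not a call
}

void hot_function() {
    // The original call site keeps its cheap rel32 call, because the thunk
    // itself is nearby; only the thunk pays for the long jump.
    far_callee_thunk();
}
```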
Also, we, as an industry of software engineers, need to re-examine the hard limits we assumed would never be reached, such as the .text size limit.
Anyway, very good read.