Amdahl’s Law would like to have a word.
It’s not surprising they didn’t see a linear speedup from splitting into so many crates. The compiler now produces a large number of intermediate object files that must be read back and linked into the final binary. On top of that, rustc caches a significant amount of semantic information — lifetimes, trait resolutions, type inference — much of which now has to be recomputed for each crate, including dependencies. That introduces a lot of redundant work.
I'd also expect this to hurt runtime performance, since it likely reduces inlining opportunities (unless LTO is really good now?).
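For what it's worth, cross-crate inlining can mostly be recovered at release time. A minimal sketch, assuming a standard Cargo setup (the profile knobs are real; the chosen values are just illustrative, not Feldera's actual config):

```toml
# Cargo.toml (workspace root) -- sketch only.
[profile.release]
# "thin" LTO restores most cross-crate inlining at a fraction of the
# cost of "fat" LTO.
lto = "thin"
# Fewer codegen units give LLVM more scope for inlining, at the cost
# of slower (less parallel) builds.
codegen-units = 1
```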
- in Rust, one semantic compilation unit is one crate
- in C, one semantic compilation unit is one file
There are quite a few benefits to the Rust approach, but also drawbacks: huge projects have to be split into many crates (usually organized as a workspace) to maximize parallel building, as in the sketch below.
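To make that concrete, a hypothetical workspace split might look like this (crate names invented for illustration):

```toml
# Workspace root Cargo.toml -- layout is hypothetical.
# Each member is its own semantic compilation unit, so cargo can build
# them in parallel as soon as their dependency edges allow it.
[workspace]
members = [
    "core",    # shared types; builds first
    "parser",  # depends on core
    "backend", # depends on core; builds in parallel with parser
    "cli",     # depends on all of the above; builds last
]
resolver = "2"
```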
Oversimplified: the codegen-units setting tells the compiler how many parts it is allowed to split a single semantic compilation unit into for code generation.
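In Cargo.toml that knob looks like this (the setting and its defaults are real: 16 is the release default, 256 the debug default):

```toml
# More codegen units = more intra-crate parallelism during LLVM codegen,
# but less cross-function optimization within the crate.
[profile.release]
codegen-units = 16
```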
Now it still seems strange (as in, it looks like a performance bug) that most of the time rustc was stuck on just one thread (instead of e.g. 8).
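If anyone wants to see where their own build serializes, `cargo build --timings` is on stable and charts per-crate concurrency:

```
$ cargo build --release --timings
# Writes an HTML report (under target/cargo-timings/) with a
# Gantt-style view; long stretches with a single active crate are
# exactly the one-thread behavior described above.
```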
> That’s right — 1,106 crates! Sounds excessive? Maybe. But in the end this is what makes rustc much more effective.
> What used to take 30–45 minutes now compiles in under 3 minutes.
I wonder if this kind of trick can be implemented in rustc itself in a more automated fashion to benefit more projects.
It will give you a workspace with a bunch of crates that seem to exercise some of the same bottlenecks the blog post described.
But I wonder if generating Rust is the best approach. On the plus side, you can take advantage of the compiler's rich type system and type checking. On the other hand, you're stuck with that compiler.
I wonder if the dynamic constraints could be expressed and checked through some more directly implemented mechanism (see the toy sketch below). It should be both simpler to express exactly the constraints you want (no need to translate them into a Rust construct that rustc will check as desired) and, of course, a lot more efficient. Feldera may have no feasible way to get away from generated Rust, but a potential competitor might avoid the issue. (That's not to say the runtime shouldn't or couldn't be implemented in Rust; I'm just talking about the large amounts of generated Rust.)
That said, I could see how it would make writing the transpiler easier, so that's a win.
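To make the contrast concrete, here's a toy Rust sketch of the two approaches being compared; it's entirely hypothetical, not Feldera's actual design:

```rust
// Approach 1: generate Rust and let rustc enforce the constraint.
// The (hypothetical) transpiler emits typed streams, so connecting
// mismatched streams fails at compile time -- but every generated
// program pays for a full rustc run.
struct Stream<T>(Vec<T>);
fn connect<T>(upstream: Stream<T>) -> Stream<T> {
    upstream
}

// Approach 2: check the same constraint directly in the engine.
// One schema comparison when the dataflow graph is built; no codegen
// and no rustc invocation.
#[derive(Clone, Debug, PartialEq)]
enum SqlType {
    Int,
    Text,
}

fn connect_dynamic(from: &[SqlType], to: &[SqlType]) -> Result<(), String> {
    if from == to {
        Ok(())
    } else {
        Err(format!("schema mismatch: {from:?} vs {to:?}"))
    }
}

fn main() {
    let _typed = connect(Stream(vec![1, 2, 3]));
    assert!(connect_dynamic(&[SqlType::Int], &[SqlType::Int]).is_ok());
    assert!(connect_dynamic(&[SqlType::Int], &[SqlType::Text]).is_err());
}
```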
I'd aim for linear speedup in compilation (minus the overhead of compiling many small crates), but the linking part won't be faster, and may even be slower. A slightly bigger back-of-the-envelope calculation could tell you how much performance there is left to extract, and what the cost of "too many" crates is (and I'm not even sure it is too many; maybe the original single crate was simply too big for incremental compilation to be effective?).
Mostly throwaway code with heavy input from Claude, so the docs are in the code itself :-)
But in case anyone finds it useful:
I wonder how true this is.
I haven't used Feldera, but with other Rust stuff I have used, running debug builds causes serious performance problems. However, for testing I compile the few crates that do the heavy lifting, like `image`, as optimized (and the vast majority as debug), and that's enough to make the performance issues unnoticeable. So if the multi-crate split hadn't worked, another option would have been to compile only some of the code as optimized (sketch below).
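For reference, per-package profile overrides are a stock Cargo feature; a sketch of that setup, using `image` from the comment above:

```toml
# Keep your own code as a fast-compiling debug build, but optimize the
# hot dependency. These override tables are stable Cargo syntax.
[profile.dev.package.image]
opt-level = 3

# Or optimize every dependency while leaving workspace code at -O0:
[profile.dev.package."*"]
opt-level = 2
```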
Edit: grammar
Rust is fast in theory, but if in practice they can't even get their compiler to squeeze any juice from the CPU, then what's the value of that language from a software engineering viewpoint?