Amdahl’s Law would like to have a word.
It’s not surprising they didn’t see a linear speedup from splitting into so many crates. The compiler now produces a large number of intermediate object files that must be read back and linked into the final binary. On top of that, rustc caches a significant amount of semantic information — lifetimes, trait resolutions, type inference — much of which now has to be recomputed for each crate, including dependencies. That introduces a lot of redundant work.
I'd also expect this to hurt runtime performance, since it likely reduces inlining opportunities (unless LTO is really good now?).
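For what it's worth, cross-crate inlining can mostly be recovered at release time. A minimal sketch, assuming a standard Cargo setup (the profile knobs are real; the chosen values are just illustrative, not Feldera's actual config):

```toml
# Cargo.toml (workspace root) -- sketch only.
[profile.release]
# "thin" LTO restores most cross-crate inlining at a fraction of the
# cost of "fat" LTO.
lto = "thin"
# Fewer codegen units give LLVM more scope for inlining, at the cost
# of slower (less parallel) builds.
codegen-units = 1
```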
- in Rust, one semantic compilation unit is one crate
- in C, one semantic compilation unit is one file
There are quite a few benefits to the Rust approach, but also drawbacks: huge projects have to be split into many crates (usually organized as a workspace) to maximize parallel building, as in the sketch below.
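To make that concrete, a hypothetical workspace split might look like this (crate names invented for illustration):

```toml
# Workspace root Cargo.toml -- layout is hypothetical.
# Each member is its own semantic compilation unit, so cargo can build
# them in parallel as soon as their dependency edges allow it.
[workspace]
members = [
    "core",    # shared types; builds first
    "parser",  # depends on core
    "backend", # depends on core; builds in parallel with parser
    "cli",     # depends on all of the above; builds last
]
resolver = "2"
```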
Oversimplified: the codegen-units setting tells the compiler how many parts it is allowed to split a single semantic compilation unit into for code generation.
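In Cargo.toml that knob looks like this (the setting and its defaults are real: 16 is the release default, 256 the debug default):

```toml
# More codegen units = more intra-crate parallelism during LLVM codegen,
# but less cross-function optimization within the crate.
[profile.release]
codegen-units = 16
```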
Now it still seems strange (as in, it looks like a performance bug) that most of the time rustc was stuck on just one thread (instead of e.g. 8).
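If anyone wants to see where their own build serializes, `cargo build --timings` is on stable and charts per-crate concurrency:

```
$ cargo build --release --timings
# Writes an HTML report (under target/cargo-timings/) with a
# Gantt-style view; long stretches with a single active crate are
# exactly the one-thread behavior described above.
```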
> That’s right — 1,106 crates! Sounds excessive? Maybe. But in the end this is what makes rustc much more effective.
> What used to take 30–45 minutes now compiles in under 3 minutes.
I wonder if this kind of trick can be implemented in rustc itself in a more automated fashion to benefit more projects.
It will give you a workspace with a bunch of crates that seem to exercise some of the same bottlenecks the blog post described.
But I wonder if generating Rust is the best approach. On the plus side, you can take advantage of the compiler's rich type system and type checking. On the other hand, you're stuck with that compiler.
I wonder if the dynamic constraints could be expressed and checked through some more directly implemented mechanism (see the toy sketch below). It should be both simpler to express exactly the constraints you want (no need to translate them into a Rust construct that rustc will check as desired) and, of course, a lot more efficient. Feldera may have no feasible way to get away from generated Rust, but a potential competitor might avoid the issue. (That's not to say the runtime shouldn't or couldn't be implemented in Rust; I'm just talking about the large amounts of generated Rust.)
That said, I could see how it would make writing the transpiler easier, so that's a win.
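To make the contrast concrete, here's a toy Rust sketch of the two approaches being compared; it's entirely hypothetical, not Feldera's actual design:

```rust
// Approach 1: generate Rust and let rustc enforce the constraint.
// The (hypothetical) transpiler emits typed streams, so connecting
// mismatched streams fails at compile time -- but every generated
// program pays for a full rustc run.
struct Stream<T>(Vec<T>);
fn connect<T>(upstream: Stream<T>) -> Stream<T> {
    upstream
}

// Approach 2: check the same constraint directly in the engine.
// One schema comparison when the dataflow graph is built; no codegen
// and no rustc invocation.
#[derive(Clone, Debug, PartialEq)]
enum SqlType {
    Int,
    Text,
}

fn connect_dynamic(from: &[SqlType], to: &[SqlType]) -> Result<(), String> {
    if from == to {
        Ok(())
    } else {
        Err(format!("schema mismatch: {from:?} vs {to:?}"))
    }
}

fn main() {
    let _typed = connect(Stream(vec![1, 2, 3]));
    assert!(connect_dynamic(&[SqlType::Int], &[SqlType::Int]).is_ok());
    assert!(connect_dynamic(&[SqlType::Int], &[SqlType::Text]).is_err());
}
```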
I'd aim for linear speedup in compilation (minus the overhead of compiling many small crates), but the linking part won't be faster, and may even be slower. A slightly bigger back-of-the-envelope calculation could tell you how much performance there is left to extract, and what the cost of "too many" crates is (and I'm not even sure it is too many; maybe the original single crate was simply too big for incremental compilation to be effective?).
Mostly throwaway code with heavy input from Claude, so the docs are in the code itself :-)
But in case anyone finds it useful:
I wonder how true this is.
I haven't used Feldera, but with other Rust stuff I have used, running debug builds causes serious performance problems. However, for testing I compile the few crates that do the heavy lifting, like `image`, as optimized (and the vast majority as debug), and that's enough to make the performance issues unnoticeable. So if the multi-crate split hadn't worked, another option would have been to compile only some of the code as optimized (sketch below).
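For reference, per-package profile overrides are a stock Cargo feature; a sketch of that setup, using `image` from the comment above:

```toml
# Keep your own code as a fast-compiling debug build, but optimize the
# hot dependency. These override tables are stable Cargo syntax.
[profile.dev.package.image]
opt-level = 3

# Or optimize every dependency while leaving workspace code at -O0:
[profile.dev.package."*"]
opt-level = 2
```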
Edit: grammar
Rust is fast in theory, but if in practice they can't even get their compiler to squeeze any juice from the CPU, then what's the value of that language from a software engineering viewpoint?