I think this post has some good information in it, but this is essentially overstated: I look at crate discrepancies pretty often as part of reviewing dependency updates, and >90% of the time it's a single-line difference (a timestamp, a hash, or some other drift between the state of the tree at tag-time and the state at release-time). These are non-ideal from a consistency perspective, but they aren't cause for this degree of alarm -- we do know what the code does, because the discrepancies are usually trivial.
> Go got it right from the beginning and didn't use a centralized package registry to manage dependencies, but instead you have to directly point to the source code of the packages.
Directly coupling the identity of a package to its location means that you can't change one without the other. Need to switch to a fork of a dependency? You'll have to touch every single file that imports it. Need to use an organisation-local cache for deps? It had better be a transparent proxy, or you can't do it. The only support for this is replace directives in go.mod files, and those tie you even further in knots when you need to pull in a dependency that itself has a replace directive, because replace only takes effect in the main module.
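For illustration, this is roughly what the fork scenario looks like in a go.mod (module paths here are made up). The directive works for your own builds but is silently ignored when your module is consumed as a dependency, which is exactly the knot described above:

```go
module example.com/myapp

go 1.22

require github.com/upstream/lib v1.4.0

// Redirect the upstream import path to our fork. This only applies when
// building example.com/myapp directly -- if someone else imports myapp
// as a dependency, this replace directive is ignored for their build.
replace github.com/upstream/lib => github.com/ourorg/lib-fork v1.4.1
```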
It's even worse on the maintainer side. If you want to rename a git repository, rename a GitHub organisation, migrate a repo to a different owner, or even move git hosting providers, you risk breaking every single downstream. The only solution is to host a proxy for your packages on a custom URL that redirects to the backend hosting provider, and set your Go module's name to be on that custom URL -- and that only works if you set it up ahead of time, before the package is widely used.
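The mechanism for that custom-URL indirection is Go's `go-import` meta tag: you serve a tiny HTML page at your vanity import path, and you can later repoint it at a new backend without breaking importers (hostnames here are hypothetical):

```html
<!-- Served at https://go.example.com/mylib?go-get=1 -->
<meta name="go-import" content="go.example.com/mylib git https://github.com/ourorg/mylib.git">
```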
Cargo and crates.io could be better, but Go is the worst place to draw inspiration from: it's full of ideas that seem clean and work at first, then break in hard-to-fix ways once you do anything complex.
Go programs and Python programs (Python also has a pretty comprehensive standard library) have a lot of dependencies too. A big standard library helps a little, but I'm doubtful it is the "single best defense".
And there are several practical problems with a big standard library, which this article didn't address at all. For Rust at least, I think a much better approach would be a collection of "blessed" libraries under the umbrella of the Rust Foundation. But that only reduces the risk for a subset of dependencies; it doesn't solve the fundamental risks.
https://vincents.dev/blog/rust-dependencies-scare-me/
It sparked some interesting discussion among a lot of the Rust maintainers:
https://news.ycombinator.com/item?id=43935067
A fat std lib will definitely not solve the problem. I am a proponent of the Rust Foundation taking packages under its wing and having them audited and funded while keeping the original maintainers intact.
Huh, how is this possible? Is the code not pulled from the repository? Why not?
It's basically what we're already doing in our OSes (mobile ones at least), but now it should happen at the level of submodules.
I dug into the linked article, and I would really say this means something closer to 17% of the most popular Rust package versions are either unbuildable or have some weird quirks that make building them not work the way you expect, and not in a remotely reproducible fashion.
https://lawngno.me/blog/2024/06/10/divine-provenance.html
Pulling things into the standard lib is fine if you think everyone should stop using packages entirely, but that doesn't seem like it really does anything to solve the actual problem. There are a number of things it seems like we might be forced to adopt across the board very soon, and for Rust it seems tractable, but I shudder to think about doing it for messier languages like Ruby, Python, Perl, etc.
* Reproducible builds seem like the first thing.
* This means you can't pull in git submodules or anything from the Internet during your build.
* Specifically for the issues in this post, we're going to need proactive security scanners. One thing I could imagine is if a company funnels all their packages through a proxy, you could have a service that goes and attempts to rebuild the package from source, and flags differences. This requires the builds to be remotely reproducible.
* Maybe the latest LLMs like Claude Mythos are smart enough that you don't need reproducible builds, and you can ask some LLM agent workflow to review the discrepancies between the repo and the actual package version.
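The rebuild-and-flag scanner from the third bullet can be sketched in a few lines: assuming the published package and your own rebuild have each been unpacked into a directory, hash every file on both sides and report anything that differs (the function names here are made up for illustration):

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def flag_discrepancies(rebuilt: Path, published: Path) -> list[str]:
    """Return paths that differ between the trees, or exist on only one side."""
    a, b = tree_digests(rebuilt), tree_digests(published)
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))
```

A real scanner would sit behind the proxy, rebuild from the tagged commit, and only alert when the flagged list is non-empty -- the hard part is making the rebuild deterministic enough that an empty list is the normal case.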
Crates are somewhat better designed than NPM/PyPI (the dist artifacts are source-based), but still much worse than Go: with crates there's an intermediate packaging step disconnected from the source of truth, whereas Go builds straight from it.
Vendoring doesn't entirely solve the problem with hidden malicious code as described in the article, but it gives your static analyzers (and agents) full context out of the box. Also better audit trail when diagnosing the issue.
Vendor your dependencies. Download the source and serve it via your own repository (ex. [1]). For dependencies that you feel should be part of the "standard library" (i.e. crates developed by the Rust team but not included in std), don't bother auditing them. For everything else, read the code and decide if it's safe.
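For Cargo specifically, vendoring is built in: `cargo vendor` copies every dependency's source into ./vendor and prints a config stanza along these lines, which you commit as .cargo/config.toml so builds resolve from the vendored tree instead of crates.io:

```toml
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
```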
I'm honestly starting to regret not starting a company like 7 years ago where all I do is read OSS code and host libraries I've audited (for a fee to the end user, of course). This was more relevant for USG-type work, where using code sourced from an American is materially different from code sourced from a non-American.
Most people get Rust from rustup, an unsigned "curl | sh"-style magic script.
Whoever controls the DNS for rustup.rs, or the webserver, or the BGP routes between you and the webserver, can change that script at any time -- or change it only for requests from specific IP addresses and backdoor people all day long.
Next you end up getting Rust binaries that are not reproducible and carry no remote attestation of provenance for the build system. Without at least one of those, there is a CI system or workstation somewhere, controlled by one or more people, that builds those binaries and could silently tamper with them at build time. Since no one can reproduce them, that would go undetected until someone finds the time for some deep-dive binary diffing.
And then we get to the fact that official Rust releases are not full-source bootstrapped. To build Rust 1.94 you need Rust 1.93, and so on. That means if you have ever backdoored -any- release of Rust in the past using the above weaknesses, you have backdoored all subsequent ones via a trusting-trust attack, where the backdoor detects when the next version of the Rust compiler is being built and copies itself into the new build.
The way you prove this is not happening within the Rust build chain is to bootstrap from another compiler. The Rust team does not do this, but thankfully Mutabah made mrustc, a minimal Rust compiler implemented in C++ that can build the actual Rust compiler, so we can anchor our supply chain to a C/C++ compiler instead.
But now how do you trust the C compiler? Some random compiler binary from Debian is at least signed, but only signed by one person. Another major risk.
So now you need to build your C compiler from source code all the way up, a technique called full-source bootstrapping. A tiny bit of human-reviewable machine code builds a slightly more complex version of itself, and so on all the way up through tinycc, gcc, llvm, and eventually rust. Then you make the whole chain deterministic, have many people independently rebuild whichever portions of it changed in each release, confirm they all get the same result, and sign that result. THEN we know you are getting a faithful build of the Rust compiler from source that no one had the opportunity to tamper with.
That is how we build and release rust in stagex: https://stagex.tools/packages/core/rust/
Credit where due that Guix did this first, though they still have a much more relaxed supply chain security policy so threat model accordingly.
But how do you know the actual source of the stagex build process was not tampered with by an impersonated maintainer who merged their own malicious PR under a pseudonym, bypassing code review? Well, we sign every commit, and we sign every PR. Every change must carry at least two cryptographic signatures from well-known, WoT-established private keys held on smartcards by maintainers. We simply do not merge PRs from randos until one maintainer has signed the PR and a -different- maintainer has reviewed it and done a signed merge. This also means no one can rewrite git history, so we can survive a compromise even of the git server itself.
This is something only stagex does as far as we can tell, as our threat model assumes at least one maintainer is compromised at all times.
But, aside from a few large high-risk entities, most people are not using stagex- or guix-built Rust; they're just yolo-running a shell script to grab a random binary and compiling code with it.
I would strongly urge people to stop doing that if you are working on software meant to run on anything more security sensitive than a game console.
With the giant wave of AI bots doing account takeovers and impersonation all the time, GitHub login as the last line of defense is going to keep ending badly.
Use provably correct binaries no single person can tamper with, or build them yourself.