- > There are a shocking number of ways to accidentally create nondeterministic output when doing C/C++ development. One of the easiest is to use the builtin __DATE__ and __TIME__ macros to stamp a build with the time the compiler was executed at:
Am I missing something here? Yes, if you use a feature that intentionally inserts the build time and date into the code, then every build is going to be different. That's the whole point of these macros. It's a feature. If you don't want that behavior, don't use that feature.
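A minimal sketch of the "don't use that feature unless you ask for it" approach, for anyone who wants a stamp in developer builds but identical bytes in release builds. `BUILD_STAMP` and `build_stamp` are hypothetical names made up for this example, not anything from the article:

```cpp
#include <string>

// BUILD_STAMP is a hypothetical macro name chosen for this sketch.
// Define it on the command line (for example from a version tag) to get
// the same string in every build; leave it undefined to fall back to the
// compiler's wall-clock macros, which change on every compile.
#ifndef BUILD_STAMP
#define BUILD_STAMP __DATE__ " " __TIME__
#endif

// Returns the stamp baked into this translation unit.
std::string build_stamp() {
    return BUILD_STAMP;
}
```

Without the override, the stamp has the form "Mmm dd yyyy hh:mm:ss". GCC additionally documents a `SOURCE_DATE_EPOCH` environment variable that pins what `__DATE__` and `__TIME__` expand to, which may be the less invasive route if the macros are already spread through a codebase.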
by pertymcpert
1 subcomments
- If Clang generates non-deterministic output due to pointer addresses, then that's a bug (it happens regularly) that should be fixed. The most common cause is some code path iterating over a DenseMap, whose iteration order is non-deterministic. Sometimes that's fine and sometimes it's not, depending on how the map is used. The common fix is to switch to a MapVector, which pays some additional runtime/memory cost to guarantee deterministic iteration order.
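A rough sketch of the idea behind that fix, written against the standard library rather than LLVM's headers. `InsertionOrderedMap`, `Symbol`, and `emit_names` are hypothetical names for illustration; this is not LLVM's actual `MapVector` API, only the same trade-off (a hash index for lookup plus a vector that fixes the iteration order):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the MapVector idea: lookups go through a hash
// map, iteration goes through a vector, so the order never depends on
// where the keys happen to live in memory.
template <typename K, typename V>
class InsertionOrderedMap {
public:
    // Inserts (key, value) if the key is new; returns true when inserted.
    bool insert(const K& key, V value) {
        if (index_.count(key) != 0) return false;
        index_.emplace(key, entries_.size());
        entries_.emplace_back(key, std::move(value));
        return true;
    }

    // Entries in the order they were first inserted.
    const std::vector<std::pair<K, V>>& entries() const { return entries_; }

private:
    std::unordered_map<K, std::size_t> index_;  // key -> position in entries_
    std::vector<std::pair<K, V>> entries_;      // extra memory buys stable order
};

struct Symbol {
    std::string name;
};

// Assigns ids to pointer-keyed symbols and emits them in insertion order.
// Iterating a pointer-keyed unordered_map directly could emit them in an
// order that changes from run to run.
std::string emit_names(const std::vector<const Symbol*>& symbols) {
    InsertionOrderedMap<const Symbol*, int> ids;
    int next = 0;
    for (const Symbol* s : symbols) {
        if (ids.insert(s, next)) ++next;
    }
    std::string out;
    for (const auto& entry : ids.entries()) {
        out += entry.first->name + "=" + std::to_string(entry.second) + ";";
    }
    return out;
}
```

The cost is the one described above: every key is stored twice, and insertion touches two containers.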
- Reading this, I think low level engineering is actually more dependent on specific environments. Hardware also has its own points of change. Usually, when you think at a high level, environmental changes are less significant than you might expect. But low level thinking tends to be tied to specific environments, which is what makes it difficult. The reason low level is hard is that even if the code itself is short, the hidden assumptions inside it are difficult and place a heavy cognitive load on the programmer. For example, even a short snippet in C like
`int value = *(int *)buffer;`
requires a lot of implicit knowledge: whether the buffer is 4-byte aligned, or whether int is exactly 32 bits. LLMs do not seem to be very good at knowing these things. They are strong at high-level wrapping, but at the low level they struggle surprisingly and are of limited use. Hardware changes with each CPU generation, and in the case of PLCs, where I mainly work, the protocol differences between vendors are far too severe. There does not seem to be any technology with a very long lifecycle.
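As a sketch of how those hidden assumptions can be written down instead of implied: the helpers below use a fixed-width type and never dereference a cast pointer. The function names are hypothetical, and the little-endian variant covers byte order, an assumption the comment above does not mention but which matters as soon as the bytes come from a wire protocol:

```cpp
#include <cstdint>
#include <cstring>

// Reads a 32-bit value in host byte order from any address.
// memcpy makes no alignment demand on p, and int32_t pins the width.
std::int32_t read_i32_host(const unsigned char* p) {
    std::int32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Reads a 32-bit little-endian value regardless of host byte order.
// The byte order of a given protocol is an assumption to check per vendor.
std::uint32_t read_u32_le(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Both work on an odd offset such as `buf + 1`, where the cast-and-dereference version is undefined behavior.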
- Better title: Reproducible builds are hard
- So to keep those energy-hungry LLM companies from scraping your website, you force each browser to compute a lot of hashes in a necessarily energy-hungry loop, creating, at the same time, all kinds of accessibility problems?
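For readers unfamiliar with what "compute a lot of hashes in a loop" means here, a minimal sketch of a hash-based proof of work. This is not the article's implementation: the function names and the challenge format are made up, and FNV-1a is a non-cryptographic stand-in used only to keep the example dependency-free (a real deployment would use a cryptographic hash such as SHA-256):

```cpp
#include <cstdint>
#include <string>

// FNV-1a, 64-bit. NOT cryptographic; a stand-in for this sketch only.
std::uint64_t fnv1a64(const std::string& data) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 0x100000001b3ULL;                // FNV prime
    }
    return h;
}

// Server side: one hash to check that the top zero_bits bits are zero.
bool meets_difficulty(const std::string& challenge, std::uint64_t nonce,
                      unsigned zero_bits) {
    if (zero_bits == 0) return true;
    if (zero_bits > 64) zero_bits = 64;
    std::uint64_t h = fnv1a64(challenge + ":" + std::to_string(nonce));
    return (h >> (64 - zero_bits)) == 0;
}

// Client side: try nonces until one passes, about 2^zero_bits hashes on
// average. This loop is the energy cost being complained about.
std::uint64_t solve(const std::string& challenge, unsigned zero_bits) {
    std::uint64_t nonce = 0;
    while (!meets_difficulty(challenge, nonce, zero_bits)) ++nonce;
    return nonce;
}
```

The asymmetry is the point of the scheme: the visitor pays for the search, while the site verifies the answer with a single hash.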
- Could have sworn the author was a nix(os) user already. I know it’s a meme, but all the problems they’re describing are literally solved by nix. The nix sandbox even catches calls for the time, for example, and replaces it with 0 for determinism.
by ComputerGuru
1 subcomments
- These seem very reasonable, the workarounds used are natural, and overall the article is not at all congruous with the conclusion in the (clickbait?) title?
Compilers literally made your project possible!
- Nix also needs the build output to be deterministic to calculate the hash. It has the same problems with timestamps etc. The build environment tries to be hermetic by setting the time to the epoch, among other things.
- Time, date, env variables and random addresses... are also input data, maybe not as a flag, but still.
by KolmogorovComp
2 subcomments
- I’m still surprised by Anubis’ decision not to make the PoW have a useful output, for example crypto mining, something protein-folding-like, or something else.
And I say this as someone generally very critical of crypto, but here rewarding the website owner with a few cents in exchange for access seems fair, and resolves the traditional issues with micro-payments.
by swiftcoder
1 subcomments
- The Birth and Death of Javascript really had the gift of prophecy, eh
- GetProgramPath consulting PATH is pretty standard behavior inspired by GCC. The Clang driver may find an external program from its program paths (e.g. -ccc-install-dir, the GCC installation's bin directory, -B) and then PATH.
> Clang relies on address layout for ordering things
Every such instance is a bug. I have fixed many issues in 2023. There is even an upstream build bot https://discourse.llvm.org/t/reverse-iteration-bots/72224
- > What do you do when the client has WebAssembly disabled?
> I decided to take inspiration from the legendary talk The Birth and Death of JavaScript and just recompile the WebAssembly to JavaScript.
So what do you do when the client has JavaScript disabled?
- Everything is within spec, reproducible builds are not a goal of C/C++.
The compiler builders may take pity on you, but really there's no bug here, just unwarranted expectations.
- > What do you do when the client has WebAssembly disabled?
Do people really do that? I mean actively disabling it, not just using old browsers with no wasm.
Disabling wasm while keeping JS enabled is a configuration I can't understand.
- I hate proof of work code running on my machine for the benefit of someone else. It's like planting a crypto miner.
by randusername
0 subcomment
- I've seen posts by this author before and did not understand if the commentary characters were referential or a creation of the author. Turns out it's the latter. I dismissed the underlined names as just styling, not hyperlinks.
https://xeiaso.net/characters/
- To avoid all those grotesque and absurd compilers and runtimes, especially those of computer languages with an ultra-complex syntax (C++ and similar), I now design "binary specifications", which I "design" and "validate" by coding in RISC-V assembly.
Here, since any whatwg cartel web engine is an issue, the author should not bother.
- A better solution might be to use https://github.com/evanw/polywasm to run the original wasm in place.
by childintime
0 subcomment
- I hate compilers too, just because in theory they are the simplest type of program: transform one file into another. How can that not be reproducible? You need to make an effort! Yet the author of this post has trouble getting a reproducible build. To me it sums up the utter insanity our industry is engaged in.
by ekjhgkejhgk
1 subcomments
- [flagged]
by mathisfun123
3 subcomments
- [flagged]
by 3dedb728-3f77
2 subcomments
- [flagged]
by charcircuit
1 subcomments
- As long as the program is equivalent there isn't an actual problem here. Requiring the output to always be the same is an arbitrary restriction.
If you want to have users trust that someone else hasn't modified it, then sign it with your identity.
by dyauspitr
7 subcomments
- LLMs should be trained on and directly output binary.