> Be extremely portable
> sp.h is written in C99, and it compiles against any compiler and libc imaginable. It works on Linux, on Windows, on macOS. It works under a WASM host. It works in the browser. It works with MSVC, and MinGW, it works with or without libc, or with weird ones like Cosmopolitan. It works with the big compilers and it works with TCC.
> And, best of all, it does all all of that because it’s small, not because it’s big.
vs
> Non-goals
> Obscure architectures and OSes
> I write code for x86_64 and aarch64. WASM is becoming more important, but is still secondary to native targets. I don’t care to bloat the library to support a tiny fraction of use cases.
> That being said, if you’re interested in using the library on an unsupported platform, I’m more than happy to help, and if we can make the patch reasonable, to merge it.
Those are contradictory. Either the code is extremely portable, or it can't support "obscure" platforms, but not both.
Zig, one of the giants upon whose shoulders this library stands, coined a name for this
almost-but-not-quite-UTF encoding: WTF-8 and WTF-16. These encodings mean, simply, the
same as their UTF counterpart but allowing unpaired surrogates to pass through.
To give credit where credit is due, both WTF-8 and WTF-16 were devised by Simon Sapin [1] and Zig simply picked them up.[1] https://wtf-8.codeberg.page/
sizeof((T){0} = $value)
Wait, is a compound literal an l-value in that sense (as opposed to, just being able to take its reference)?! Take a look at the C99 standard Oh my, it indeed is (C99 §6.5.2.5 p5). Good to know!I had a hard time reading the wc code in the article. First I had to go to the GitHub to understand that "da" stands for dynamic array, and then understand that what the author calls wc is not at all the wc linux commands, which by default gives you the number of lines, words, and characters in a file, not the count of occurrences of each word in the file, which is what the proposed code does.
Also, since I had to read the GitHub README, another remark: it says that sp_io uses pthreads rather than fork and exec. Both of those approach (but especially pthreads) are contradictory to the explicit goals of programming against lowest level interfaces. I believe the lowest level syscall is clone3 [1], which gives you more fine grained control on what is shared between the parent and child processes, allowing to implement fork or threads.
[1] https://manpages.debian.org/trixie/manpages-dev/clone3.2.en....
Pointer/length is not just for strings - but for all arrays.
See my proposal:
I agree that pointer and length is better than null-terminated strings (although it is difficult in C, and as they mention you will have to use a macro (or some additional functions) to work this in C).
Making the C standard library directly against syscalls is also a good idea, although in some cases you might have an implementation that needs to not do this for some reason, generally it is better for the standard library directly against syscalls.
FILE object is sometimes useful especially if you have functions such as fopencookie and open_memstream; but it might be useful (although probably not with C) to be able to optimize parts of a program that only use a single implementation of the FILE interface (or a subset of its functions, e.g. that does not use seeking).
Works nicely on Linux where the syscall interface is explicitly stable, but on many (most?) other platforms this is not the case.
> There Is No Heap
I don't understand what this means, when it's followed by the definition of a heap allocation interface. The paragraph after the code block conveys no useful information.
> Null-terminated strings are the devil’s work
Agreed! I also find the stance regarding perf optimization agreeable.
If the only problem with C was that the stdlib is terrible that would be a very different situation.
There are much more fundamental problems with the language. Problems that are entirely understandable in K&R C but aren't acceptable half a century later. A "high quality" standard library can't fix these problems. In some cases it can paper over them though not others, and even then the actual problem wasn't fixed it's just not obvious with superficial examination any more.
First, the type system is crap. The array types don't work across function boundaries, there's no Empty type at all, you are provided with a user defined product type with names, but not one without names etc. There is no fat pointer type, slice reference, nothing like that.
Second, naming is also crap. There's no namespacing feature provided so you're left with the convention of picking a few letters as a prefix and hoping it doesn't overlap and yet is succinct enough to not be annoying.
Third, everything coerces, all the coercions you could want if you like coercions, and then ten times that many on top. Some people really like coercions, C will see them learn that actually they don't like them that much.
Probably I would have made different choices. For example, I'd rather have many modules that can be individually included, than one giant file.
Also from a purely aesthetic point of view, I would have opted for more readable function and type names: no sp_ prefix, recognizable names like dict istead of ht, vec instead of da, etc.
And I know there are compilers out there still stuck in the 90s, but I would have targeted C23, these days.
But that would be my highly opinionated library!
P.S. be aware that word frequency is not what the standard 'wc' does.
This is funny to me because just today some friends gave me a link to a C quiz:
From which I gathered that it is a much more cursed language than I remembered. Maybe we all just got used to C and just happen to use a minimal subset.
The problems with C are not mainly with the standard library, but any effort to improve things should be lauded.
First, (on unix) it's wrapping pthread mutex. That's part of libc! (Technically it might not be libc.so, but it's still the standard library.)
Also, none of the atomics talk about the memory model. You don't _have_ to use the C11 memory model (Linux, for example, doesn't). But if you're not using the C11 memory model and letting the compiler insert fences for you, you definitely need to have fence instructions, yourself.
While C11 atomics do rely on libgcc, so do the __sync* functions that this library uses (see https://godbolt.org/z/bW1f7xGas) for an example.
Oops... apparently this is vibecoded. Welp, I just wasted ten minutes of my life reviewing slop that I'm not going to get back.
i doubt many people will go through the whole code themselves or do this stuff to determine if claims are true and/or its worth to port something to this or start learning it.
people spend a lot of time getting familiar with libraries in order to be able to use them properly.. If you help them a long that path more it will be more inviting to try your code.
(that being said i think people talking about fixing C likely dont realise that people just quietly roll their own libs like this..not to fix C but because that is what C programming is.)
"Using a language other than C is like using a non-standard feature: it will cause trouble for users. Even if GCC supports the other language, users may find it inconvenient to have to install the compiler for that other language in order to build your program. So please write in C."
The GNU Coding Standard in 1994, http://web.mit.edu/gnu/doc/html/standards_7.html#SEC12
c8 buf [SP_PATH_MAX] = sp_zero;
sp_cstr_copy_to_n(path, len, buf, SP_PATH_MAX);
since #define SP_PATH_MAX 4096
There should be a fallback for very long paths.How does it deal with code executing before main? Libc does a bunch of necessary stuff, like calling initializers for global variables.
C is horribly and unfixably broken. We've known that for many decades. Just let it die already!
Let's move on.
The functions strncpy, snprintf, strncat, are fountains of bugs.
Please, describe..
P.S. sad to see that HN becomes a witch hunting place
Designing software and data structures for performance against unknown use cases on unknown hardware is extremely difficult and the resulting code is much more complicated. Even then, it’s often better to use code written against your actual use case and hardware when performance is that critical.
Things that are off the table might be:
SIMD A highly optimized hash table rewrite Figuring out where inlining or LIKELY causes the compiler to produce better code."
LOL...
Classic vibe coder.
> Only couple of languages not affected are those that don't have a culture of downloading third party code, like C and C++
> Ex js and python developer publishes a 'library'
> Library is vibe coded
> Published on github amidst GitHub being hit by supply chain attacks, had their source code leaked.
The timing is terrible for starters, and I don't trust the vibe coded code at all. Imagine a pandemic and the cities are on fire, and you arrive to a rural town asking to kiss people.
Yet another slop coded library.
What could possibly go wrong...
Why do standard library headers always have to be insane?