However unsigned integers are still very useful, I'd say essential, in low-level programming. For example when doing buffer management and memory allocation.
- bitwise operations
- modular arithmetic implemented with just ++, -- (ringbuffers, e.g TCP sequence numbers)
- using the full range of a 8-bit, 16-bit, 32-bit datatype (quite common)
- splitting a positive quantity into two smaller quantities, e.g. using a 16-bit index as 8-bit major index plus 8-bit minor index.
etc.Don't forget that the signed vs unsigned integer is in some sense an artificial distinction. Machines have you put the distinction in the CPU instructions themselves, they don't track a "signed" property as part of values. And it can make sense to use the same value in different ways. However, C and many other languages decided to put a tag on the type, so operator syntax can be agnostic to signedness, and the compiler will choose the appropriate CPU instruction.
I'm not sure Go is saner just because len is an int. Well, maybe, depending on how you look at it. Defining len to be signed int, means the largest valid len is half your address space, which also means half of all possible indexes are always invalid; which makes some things easier.
But it's really that integer arithmetic is not undefined behavior regardless of signedness, that bounds are checked, and that even indexing your slice with an int64 on a 32-bit CPU does the full correct bounds check. In fact, you can use any integer type as an index.
Given all of the above, indexing with a uint or an int is actually indiferent. In that case, the bound check is a single unsigned <len compare (despite the fact that len is signed).
What's really painful, is trying to handle a full 32-bit address space with 32-bit addresses and sizes, like in Wasm; you need 33-bit math. So in a sense, limiting sizes to 31-bit (signed) does help. But at the language level, IMO, the rest matters more.
TBH I've had very little struggle with this at all. As long as you keep your values and types separate, the unsigned type that you got a number from originally feeds just fine into the unsigned type that you send it to next. Needing casting then becomes a very clear sign that you're mixing sources and there be dragons, back up and fix the types or stop using the wrong variable. It's a low-cost early bug detector.
Implicitly casting between integer types though... yeah, that's an absolute freaking nightmare.
My two ruls of thumb for C code are:
1. use signed integers for everything except bit-wise operations and modulo math (e.g. "almost always signed")
2. make implicit sign conversion an error via `-Werror -Wsign-conversion`
The problem with making sizes and indices unsigned (even if they can't be negative) is that you'd might to want to add negative offsets, and that either requires explicit casting in languages without explicit signed/unsigned conversion (e.g. additional hassle and reducing readability), or is a footgun area in languages with implicit sign conversion.
Alas, (2S * signed_index) / 2S will similarly result in surprises the moment the signed_index hits half the signed-int max. There's no free lunch when trying to cheat the integer ranges.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p14...
* Do your utmost to rewrite the code in order to avoid doing that (e.g. reordering disequations to transform subtractions into additions). * If not possible, think very hard about any possible edge case: you most certainly need an additional `if` to deal with those. * When analyzing other people's code during troubleshooting merge reviews, assume any formula involving an unsigned integer and a minus sign is wrong.
Well … we even mention Rust in the paragraph right before this. In Rust, you can up a u16 to a u64 this way:
let bigger: u32 = x.into();
or let bigger = u32::from(x);
The conversion `from` is infallible, because a u16 always fits in a u32. There is no `from(u64) -> u32`, because as the article notes, that would truncate, so if we did change the type to u64, the code would now fail to compile. (And we'd be forced to figure out what we want to do here.)(There are fallible conversions, too, in the form of try_from, that can do u64 → u32, but will return an error if the conversion fails.)
Similarly, for,
for (uint x = 10; x >= 0; x--) // Infinte loop!
This is why I think implicit wrapping is a bad idea in language design. Even Rust went down the wrong path (in my mind) there, and I think has worked back towards something safer in recent years. But Rust provides a decent example here too; this is pseudo-code: for (uint x = 10; x.is_some(); x = x.checked_sub(1))
Where `checked_sub` is returns `None` instead of wrapping, providing us a means to detect the stopping point. So, something like that. (Though you'd probably also want to destructure the option into the uint for use inside the loop.) Of course, higher-level stuff always wins out here, I think, and in Rust you wouldn't write the above; instead something like, for x in (0..=10).rev()
(And even then, if we need indexes; usually, one would prefer to iterate through a slice or something like that. The higher-level concept of iterators usually dispenses with most or all uses of indexes, and in the rare cases when needed, most languages provide something like `enumerate` to get them from the iterator.)Given your examples, I think you'd have fewer issues if you were working with unsigned integers exclusively. Although I'm curious about what other code you were referencing with this: "But seeing how each change both made the code easier to reason about and more correct, I couldn’t deny the evidence."
With regards to modulo, in Zig if you try to use it with a signed integer it will tell you to specify whether you want `@mod` or `@rem` semantics. In my case, I'd almost never write `x % 2`, I'd write `x & 1`. I do use unsigned division but I'd pretty much never write code that would emit the `div` instruction.
I'm not saying you're wrong though! Everyone has a different mind. If you attain higher correctness and understandability through using signed integers, that's great. I'm just saying I'm in the opposite camp.
The potential bugs listed would be prevented by, e.g. "x--" won't compile without explicitly supplying a case for x==0 OR by using some more verbose methods like "decrement_with_wrap".
The trade-off is lack of C-like concise code, but more safe and explicit.
I don’t really get this claim. Indexing should just look up the element corresponding to the value provided. It’s easy to come up with semantics that are intuitive and sound, even if signed integers or ones smaller than size_t are used.
Sizes and indices of course need to be unsigned, and any self respecting compiler should warn about dangerous usage.
Sure, it's possible to write bugs in C. And if you really want to, you can disable the compiler warnings which flag tautologous comparisons and mixed-sign comparisons (a common reason for doing this is to avoid spurious warnings in generic-type code).
But, uhh, "people can deliberately write bugs" has got to be the weakest justification I've ever seen for changing a language feature -- especially one as fundamental as "sizes of objects can't be negative".
Fix the language. Don't hack around it by using the wrong type.