Hacker News

A production bug that made me care about undefined behavior

160 points by birdculture

by nneonneo

9 subcomments

Even calling uninitialized data “garbage” is misleading. You might expect that the compiler would just leave out some initialization code and compile the remaining code in the expected way, causing the values to be “whatever was in memory previously”. But no - the compiler can (and absolutely will) optimize by assuming the values are whatever would be most convenient for optimization reasons, even if it would be vanishingly unlikely or even impossible.
As an example, consider this code (godbolt: https://godbolt.org/z/TrMrYTKG9):
```
    struct foo {
        unsigned char a, b;
    };

    foo make(int x) {
        foo result;
        if (x) {
            result.a = 13;
        } else {
            result.b = 37;
        }
        return result;
    }
```
At high enough optimization levels, the function compiles to “mov eax, 9485; ret”, which sets both a=13 and b=37 without testing the condition at all - as if both branches of the test were executed. This is perfectly reasonable because the lack of initialization means the values could already have been set that way (even if unlikely), so the compiler just goes ahead and sets them that way. It’s faster!
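For contrast, here is a minimal sketch of a variant with no indeterminate values, assuming zero is an acceptable default for the field a branch does not touch. The names `make_initialized` and `check_make_initialized` are made up for illustration and are not part of the godbolt example.

```cpp
#include <cassert>

struct foo {
    unsigned char a, b;
};

// Hypothetical variant of make(): `foo result{}` value-initializes the
// aggregate, so both members start at 0 and nothing is left indeterminate.
foo make_initialized(int x) {
    foo result{};
    if (x) {
        result.a = 13;
    } else {
        result.b = 37;
    }
    return result;
}

// Usage sketch: each branch now leaves the other field at a known 0,
// so the compiler can no longer merge the two branches into one constant.
void check_make_initialized() {
    assert(make_initialized(1).a == 13 && make_initialized(1).b == 0);
    assert(make_initialized(0).a == 0 && make_initialized(0).b == 37);
}
```

With the braces in place the function has to test `x`, because the two results are now observably different.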

by panstromek

1 subcomments

I have bumped into this myself, too. It's really annoying. The biggest footgun isn't even discussed explicitly, and it might be how the error got introduced: when a struct goes from POD to non-POD or vice versa, the rules change, so a completely innocent change, like adding a string field, can suddenly create undefined behaviour in unrelated code that was previously correct.
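A rough sketch of that category flip, using two hypothetical structs (`PlainResponse` and `StringResponse` are illustration names, not the article's types). It does not reproduce the bug; it only shows that adding a string changes what the type traits report, and that empty braces give well-defined `false` flags either way.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical "before" version: only fundamental members.
struct PlainResponse {
    bool error;
    bool succeeded;
};

// Hypothetical "after" version: the same struct plus one string field.
struct StringResponse {
    bool error;
    bool succeeded;
    std::string data;
};

// Adding the string changes the type's category as seen by the compiler.
static_assert(std::is_trivially_default_constructible<PlainResponse>::value,
              "plain struct has a trivial default constructor");
static_assert(!std::is_trivially_default_constructible<StringResponse>::value,
              "std::string member makes default construction non-trivial");

// `T r{}` initializes the bools to false for both versions, whereas a bare
// `T r;` would leave them indeterminate in both.
template <typename T>
bool flags_are_false() {
    T r{};
    return !r.error && !r.succeeded;
}
```

Writing `{}` at every declaration is one spelling that stays safe across this kind of change.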

by fizzynut

1 subcomments

Even if you fixed the uninitialized data problem, this code is still a bug waiting to happen. It should be a single bool in the struct to handle the state for the function, as there are only two states that actually make sense.
succeeded = true; error = true; //This makes no sense
succeeded = false; error = false; //This makes no sense
Otherwise if I'm checking a response, I am generally going to check just "succeeded" or "error" and miss one of the two above states that "shouldn't happen", or if I check both it's both a lot of awkward extra code and I'm left with trying to output an error for a state that again makes no sense.
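One way to sketch the single-state idea is below. It uses a two-value enum rather than a literal bool, which is a variation on the suggestion and not what the article's code does; `Status`, `Response`, and `describe` are hypothetical names, and defaulting to `Error` is an assumption.

```cpp
#include <cassert>
#include <string>

// Only the two states that make sense can be represented.
enum class Status { Succeeded, Error };

struct Response {
    Status status = Status::Error;  // assumed default: failure until proven otherwise
    std::string data;
};

// Callers switch on one field, so there is no "both" or "neither" case to handle.
const char* describe(const Response& r) {
    switch (r.status) {
        case Status::Succeeded: return "ok";
        case Status::Error:     return "error";
    }
    return "unknown";  // not reached for valid enumerators
}
```

A plain `bool succeeded = false;` would work just as well; the enum mostly makes call sites read more clearly.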

by MutableLambda

0 subcomment

Yeah, looks pretty straightforward to me, but I used to write C++ for a living. I mean, there are complicated cases in C++ starting with C++11, but this one is not really one of them. Just init the fields to false. Most of these cases are just C++ trying to bring in new features without breaking legacy code, and it has become pretty difficult to keep up with it all.

by pornel

0 subcomment

To me the real horror is that the exact same syntax can be either a perfectly normal thing to do, or a horrible mistake that gives the compiler a license to kill, and this doesn't depend on something locally explicit, but on details of a definition that lives somewhere else and may have multiple layers of indirection.

by mac3n

0 subcomment

Many years ago I had a customer complaint about undefined data changing value in Fortran 77. It turned out that the compiler never allocated storage for uninitialized variables, so they were aliased to something else.
The compiler was changed to allocate storage for any referenced variables.

by vhantz

1 subcomments

If the two fields in the struct are expected to be false unless changed, then initialize them as such. Nothing is gained by leaving it to the compiler, and a lot is lost.

by letmetweakit

0 subcomment

I once reported several UB bugs to a HackerOne-led cryptocurrency bounty program. They were rejected on the grounds that the software was working as intended and that they would "inspect the assembly every time they compiled". Yeah right.

by canucker2016

0 subcomment

But there's nothing in your code that suggests that there's a problem if the error and success fields are both true.
Typically you'd have at least an assert (and hopefully some unit tests) to ensure that invariant, (.success ^ .error) == true.
But the code has just been getting by on the good graces of the previous stack contents. One random day, the app behaviour changed and left a non-zero byte that the response struct picked up and left the app in the alternate reality where .success == .error
Others have mentioned sanitizers that may expose the problem.
Microsoft's Visual C++ compiler has the RTCs/RTC1 compiler switch which fills the stack frame with a non-zero value (0xCC). Using that compiler switch would have exposed the problem.
You could also create a custom __chkstk stack probe function and have GCC/Clang use this to fill the stack as well as probing the stack. I did this years ago when there was no RTCs/RTC1 compiler option available in VC++.
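The invariant check mentioned earlier in this comment could look roughly like the sketch below. The struct layout and the helper names `is_consistent` and `finish_ok` are assumptions; the fields are initialized here so that the check itself never reads an indeterminate bool.

```cpp
#include <cassert>

struct Response {
    bool error = false;
    bool succeeded = false;
};

// Exactly one of the two flags should be set once a response is final.
// For bools, `!=` is XOR, which sidesteps the low precedence of `^`.
bool is_consistent(const Response& r) {
    return r.error != r.succeeded;
}

// Hypothetical completion helper that enforces the invariant on exit.
void finish_ok(Response& r) {
    r.succeeded = true;
    assert(is_consistent(r));
}
```

The assert only fires in builds without `NDEBUG`, so it complements a stack-fill option or a sanitizer and does not replace them.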

by yongjik

2 subcomments

I think UB doesn't have much to do with this bug after all.
The original code defined a struct with two bools that were not initialized. Therefore, when you instantiate one, the initial values of the two bools could be anything. In particular, they could be both true.
This is a bit like defining a local int and getting surprised that its initial value is not always zero. (Even if the compiler did nothing funny with UB, its initial value could be anything.)

by AdieuToLogic

1 subcomments

There are a few problems with this post:

  1 - In C++, a struct is no different than a class
      other than a default scope of public instead of
      private.
  2 - The use of braces for property initialization
      in a constructor is malformed C++.
  3 - C++ is not C, as the author eventually concedes:

  At this point, my C developer spider senses are tingling: 
  is Response response; the culprit? It has to be, right? In 
  C, that's clear undefined behavior to read fields from 
  response: The C struct is not initialized.

In short, if the author employed C++ instead of trying to use C techniques, all they would have needed is a zero cost constructor definition such as:

  inline Response () : error (false), succeeded (false)
  {
    ;
  }
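For context, a sketch of that constructor inside a complete struct. The `data` member and the `check_defaults` helper are assumptions for illustration; only `error` and `succeeded` come from the snippet above.

```cpp
#include <cassert>
#include <string>

struct Response {
    bool error;
    bool succeeded;
    std::string data;  // assumed extra field; default-constructs to empty

    // Member-initializer list: both flags are set before the body runs.
    // A constructor defined inside the class is implicitly inline.
    Response() : error(false), succeeded(false) {}
};

// Usage sketch: a plain `Response r;` now has well-defined flags.
void check_defaults() {
    Response r;
    assert(!r.error && !r.succeeded && r.data.empty());
}
```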

by inglor_cz

1 subcomments

Symbian's way of avoiding this was to use a class called CBase to derive from. CBase would memset the entire allocated memory for the object to binary zeros, thus zeroizing any member variable.
And by convention, all classes derived from CBase would start their name with C, so something like CHash or CRectangle.

by kayo_20211030

0 subcomment

Great post. It was both funny and humble. Of course, it probably wasn't at all funny at the time.

by Panzerschrek

0 subcomment

That's why I always specify default initializers for fields of fundamental types and other types which don't have default constructor.
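A minimal sketch of that habit, assuming C++11 or later. The `retries` and `source` fields and the `check_member_defaults` helper are made-up examples of other fundamental-type members.

```cpp
#include <cassert>
#include <string>

struct Response {
    // Default member initializers: applied by every constructor that does
    // not explicitly initialize the member, including the implicit one.
    bool error = false;
    bool succeeded = false;
    int retries = 0;               // hypothetical field
    const char* source = nullptr;  // hypothetical field
    std::string data;              // already has a default constructor
};

// Usage sketch: no braces needed at the declaration site.
void check_member_defaults() {
    Response r;
    assert(!r.error && !r.succeeded);
    assert(r.retries == 0 && r.source == nullptr && r.data.empty());
}
```

The defaults live next to the field declarations, so adding a field and forgetting its initial value is easier to spot in review.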

by titzer

4 subcomments

tl;dr: the UB was reading uninitialized data in a struct. The C++ rules for when default initialization occurs are crazy complex.
I think a sanitizer probably would have caught this, but IMHO this is the language's fault.
Hopefully future versions of C++ will mandate default initialization for all cases that are UB today and we can be free of this class of bug.

by nurumaik

0 subcomment

Example from this article looks more like "unspecified" behavior rather than "undefined". Title made me expect nasal demons, now I'm a bit disappointed

by quuxplusone

0 subcomment

I mean, "obviously" if you don't initialize your variables, they'll contain garbage. You can't assume that garbage is zero/false, or any other meaningful value.
But re the distinction at the end of TFA — that a garbage char is slightly more OK than a garbage bool — that's also intuitive. Eight bits of garbage is always going to be at least some valid char (physically speaking), whereas it's highly unlikely that eight bits of garbage will happen to form a valid bool (there being only two valid values for bool out of those 256 possible octets).
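To make the "two valid values" point concrete, here is a small sketch that inspects the byte behind a properly initialized bool. It assumes a one-byte bool, and the 0/1 encoding is what mainstream ABIs use, not something the standard guarantees; `byte_of` is a hypothetical helper. It deliberately goes only from bool to byte, since going the other way with an arbitrary byte is the problematic direction.

```cpp
#include <cassert>
#include <cstring>

static_assert(sizeof(bool) == 1, "sketch assumes a one-byte bool");

// Copy the object representation of a valid bool into a byte.
// memcpy is the well-defined way to look at the underlying storage.
unsigned char byte_of(bool b) {
    unsigned char byte = 0;
    std::memcpy(&byte, &b, 1);
    return byte;
}
```

On common platforms `byte_of(false)` is 0 and `byte_of(true)` is 1, leaving the other 254 bit patterns without a meaning as a bool.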
This also relates to the (old in GCC but super new in Clang, IIUC) compiler option -fstrict-bool.