There was one particular game that was super slow when this tech was applied. The original game took around 15-20 seconds to load, whereas once the tech was applied it easily took 3-5 minutes, even with all data already downloaded.
When I started digging into it, I realized the reason was the game was using something like
fread(data, 1, 65536, fptr);
instead of fread(data, 65536, 1, fptr);
Back in the day, that basically expanded to 65k reads of 1 byte each, for a file several MB in size. Each fread translated to 65k calls to the Windows ReadFile API. Since my code was hooking the ReadFile system call, and my hook was heavier than ReadFile itself, game loading felt really slow. Unusable. It would not have been fun for players.
The easy fix was to swap the arguments for certain calls. The long fix required an internal cache to account for these cases, so that the hooked ReadFile was faster when the data was already on disk.
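For anyone who hasn't looked at fread's signature in a while, here is a rough sketch of the two argument orders in standard C. The helper names are made up for illustration. Whether a given CRT turns the first form into one OS read per byte depends on the library (modern ones buffer both forms), so treat the performance story as specific to the runtime described above. The part that is portable is the return value, which changes meaning when you swap the arguments:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read up to `cap` items of 1 byte each.
 * Returns the number of bytes actually read, so a short read is visible. */
size_t read_as_bytes(void *buf, size_t cap, FILE *fp)
{
    assert(buf != NULL && fp != NULL);
    return fread(buf, 1, cap, fp);
}

/* Hypothetical helper: read 1 item of `cap` bytes.
 * Returns 1 only if the whole block was read, otherwise 0. */
size_t read_as_block(void *buf, size_t cap, FILE *fp)
{
    assert(buf != NULL && fp != NULL);
    return fread(buf, cap, 1, fp);
}
```

So the swap is only a drop-in fix where the caller does not rely on the byte count, for example when the file is known to contain at least `cap` more bytes.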
Funny thing is that as we started rolling out the tech and applying it to more and more games, we realized lots of games did this. We went for the cache fix and games ended up loading faster than before. Honestly, games could have loaded all the data in a couple of seconds by just swapping the args. I'm guessing developers did this on purpose so that games seemed like they were loading a lot of stuff, although you never know.
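To give an idea of what a cache in front of an expensive read can look like, here is a minimal single-block sketch. It is not the code we shipped: every name here (`read_cache`, `slow_read_fn`, `mem_slow_read`, the 64 KB block size) is an assumption for illustration, and the in-memory backend only stands in for the hooked read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_BLOCK 65536  /* assumed block size */

/* Hypothetical stand-in for the expensive read: copies up to `len`
 * bytes starting at `off` into `dst` and returns the bytes copied. */
typedef size_t (*slow_read_fn)(void *ctx, size_t off, void *dst, size_t len);

typedef struct {
    slow_read_fn slow_read;
    void *ctx;
    size_t block_off;   /* file offset of the cached block */
    size_t block_len;   /* valid bytes in `block`; 0 means empty */
    size_t slow_calls;  /* how often the expensive path ran */
    unsigned char block[CACHE_BLOCK];
} read_cache;

void cache_init(read_cache *c, slow_read_fn fn, void *ctx)
{
    assert(c != NULL && fn != NULL);
    c->slow_read = fn;
    c->ctx = ctx;
    c->block_off = 0;
    c->block_len = 0;
    c->slow_calls = 0;
}

/* Read `len` bytes at `off`; returns the number of bytes copied. */
size_t cache_read(read_cache *c, size_t off, void *dst, size_t len)
{
    unsigned char *out = dst;
    size_t done = 0;

    while (done < len) {
        size_t pos = off + done;
        /* Refill when `pos` falls outside the cached block. */
        if (c->block_len == 0 || pos < c->block_off ||
            pos >= c->block_off + c->block_len) {
            c->block_off = pos - (pos % CACHE_BLOCK);
            c->block_len = c->slow_read(c->ctx, c->block_off,
                                        c->block, CACHE_BLOCK);
            c->slow_calls++;
            if (pos >= c->block_off + c->block_len)
                break;  /* past the end of the data */
        }
        size_t avail = c->block_off + c->block_len - pos;
        size_t n = (len - done < avail) ? len - done : avail;
        memcpy(out + done, c->block + (pos - c->block_off), n);
        done += n;
    }
    return done;
}

/* Demo backend: serves reads from an in-memory buffer. */
typedef struct { const unsigned char *data; size_t size; } mem_file;

size_t mem_slow_read(void *ctx, size_t off, void *dst, size_t len)
{
    const mem_file *f = ctx;
    if (off >= f->size)
        return 0;
    size_t n = (f->size - off < len) ? f->size - off : len;
    memcpy(dst, f->data + off, n);
    return n;
}
```

With something like this, 65k sequential one-byte reads inside one block hit the expensive path once instead of 65k times. A real version would also need invalidation on writes and probably more than one block.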
I agree it would be stupid for a compiler to even support such a flag, but those were the 1980s/90s.
Actually, the standard way of allocating 64 kB of memory on the stack is to just assume you can do it, subtract 64k from the stack pointer, and hope for the best.
Most stack allocations in the wild are not checked.
That is, until I checked the program I used for testing (which I didn't write), and found the following code:
dealloc(this)
return this->field
With the original allocator, this worked fine, since the deallocation didn't touch the memory.
My allocator, however, overwrote the field with bookkeeping data during the deallocation, which meant the returned value was not what the programmer intended, and after a short while the program crashed.
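The fix for that pattern is just to read the field before freeing. A minimal sketch in C, with `node` and `node_release` as hypothetical stand-ins for the test program's actual type and function:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object in the test program. */
typedef struct { int field; } node;

/* Frees `n` and returns the value its field held.
 * The field is copied out before the free, so nothing reads
 * memory the allocator is allowed to overwrite. */
int node_release(node *n)
{
    assert(n != NULL);
    int value = n->field;  /* read first ... */
    free(n);               /* ... then deallocate */
    return value;
}
```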
Unlike TFA, I had the luxury of just fixing the test program.
With more and more code being written with AI (which notoriously produces inefficient solutions to simple problems), I expect this issue to become more prevalent. I just hope we optimize at the source of the problem (AI and the humans using it) and not on platforms (compiler and engine/kernel heuristics).
It means the fix was applied to run during the emulation loop execution, not that the fix was found and applied while the emulation loop was running.
Which would have made it an emulation code escape.
But there wasn't any similar programmatic debugging aid for detecting uninitialized stack memory.
Going further down the rabbit hole, I discovered the _chkstk function.
The MS C compiler would emit a call to _chkstk on function entry to ensure that stack memory had been paged in. But further reading noted that _chkstk was only emitted if the function allocated a lot of stack memory. And there was source code! MS included the assembly language source code for _chkstk in the CRT source code, installed with the compiler.
I needed _chkstk to be emitted for every function, not only for functions that allocated >= 4KB of stack variables.
Curses, foiled again.
Then, while perusing the list of compiler command line switches, I see "/Ge".
/Ge (Enable Stack Probes)
Activates stack probes for every function call that requires storage for local variables.
Ahhhhh! The grey storm clouds parted and the sun's rays shone down on me in their warmth.
I had all the pieces I needed to fill uninitialized stack memory with a non-zero canary value so I could make detection of uninitialized stack variables more reliable.
_stkfil was born
Modifying _chkstk was easy. I needed to write to every byte of stack in a stack page instead of reading only 4 bytes and skipping to the next page of stack.
While I was mucking about in the bowels of _chkstk, I added a 4-byte global variable to hold my canary value, so the app could override which value to use.
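The real thing was a modified assembly-language _chkstk, which I won't try to reproduce here. As a rough portable illustration of the idea only, this sketch fills an ordinary buffer (standing in for freshly allocated stack) with a 4-byte canary and checks whether a word still holds it. All names and the default canary value are assumptions made for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical global, mirroring the overridable 4-byte canary. */
uint32_t g_fill_canary = 0xDEADBEEFu;

/* Write the repeating canary into every byte of `mem`,
 * rather than touching only one word per page. */
void fill_with_canary(void *mem, size_t len)
{
    assert(mem != NULL);
    unsigned char *p = mem;
    unsigned char pat[4];
    memcpy(pat, &g_fill_canary, sizeof pat);
    for (size_t i = 0; i < len; i++)
        p[i] = pat[i % 4];
}

/* Returns 1 if the 4-byte word at `p` still holds the canary,
 * meaning nothing has stored to it since the fill. */
int looks_uninitialized(const void *p)
{
    assert(p != NULL);
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v == g_fill_canary;
}
```

A non-zero pattern matters because zero is so often a "working" value: a stray read of zeroed memory tends to behave, while a read of the canary tends to blow up quickly and recognizably.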
In debug builds, _stkfil helped find a couple of bugs, but soon all the stray uninited stack vars were gone and the code was forgotten.
Then I read about InitAll in https://www.microsoft.com/en-us/msrc/blog/2020/05/solving-un...
InitAll - Automatic Initialization
In addition to the previously mentioned approaches, Microsoft is now using a feature known as InitAll which performs automatic compile-time initialization of stack variables.
This section documents how Windows is using this technology and the rationale for why.
Current Windows Settings
The following types are automatically initialized:
- Scalars (arrays, pointers, floats)
- Arrays of pointers
- Structures (plain-old-data structures)
The following are not automatically initialized:
- Volatile variables
- Arrays of anything other than pointers (i.e. array of int, array of structures, etc.)
- Classes that are not plain-old-data
For optimized retail builds, the fill pattern is zero. For floats the fill pattern is 0.0.
For CHK builds or developer builds (i.e. unoptimized retail builds), the fill pattern is 0xE2. For floats the fill pattern is 1.0.
solidity sweating profusely