I really don't know how LLVM picks between branches and conditional moves, but my guess is that it doesn't assume float equality is any less likely than other conditions, and some optimization pass at -O3 turns unpredictable branches into conditional moves. I base this on the fact that adding std::hint::unlikely to the "equal" branch produces the same assembly for the function in both modes.
https://godbolt.org/z/erGPKaPcx
Whether it's safe in general to assume that float equality is unlikely for the purposes of optimization, I'll leave to the compiler engineers. But if you know the data your program will be handling, adding hints can avoid these surprises.
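For concreteness, the hint ends up looking something like this (a minimal sketch: step_toward is a made-up stand-in for the function in the post, and std::hint::unlikely is nightly-only, behind the likely_unlikely feature gate, at the time of writing):

    #![feature(likely_unlikely)] // nightly-only at the time of writing
    use std::hint::unlikely;

    fn step_toward(current: f32, target: f32, step: f32) -> f32 {
        // Tell the optimizer that exact float equality is the cold case;
        // with this hint, the function compiled to the same assembly in
        // both modes (see the godbolt link above).
        if unlikely(current == target) {
            target
        } else if current < target {
            (current + step).min(target)
        } else {
            (current - step).max(target)
        }
    }

    fn main() {
        println!("{}", step_toward(0.0, 1.0, 0.25));
    }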
It ended up being > 2x faster in the debug build, but 2x-5x slower in the release build (??!!?) [1]. I hadn't learned much about compilers/"lower level" C++, so I moved on at that point.
How it worked (rough sketch after the list):
1.) The P.Q. created a vector and resized it to known bounds.
2.) The P.Q. kept track of and updated the "active sorting range" each time an element was inserted or popped.
2B.) So each time an element was added, it used the closest unused vector element and updated the ".end" of the range to sort.
2C.) Each time an element was removed, it updated the ".start" of the range.
3.) In theory this should have saved reallocation overhead.
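Roughly, in code (a quick Rust-flavored sketch of the idea, not the actual C++; the names are placeholders, and the real version may have kept the range sorted differently, where this one just re-sorts on every push for brevity):

    // Fixed-capacity buffer with an active window [start, end).
    struct RangePq<T: Ord> {
        buf: Vec<Option<T>>, // preallocated once to the known bound (step 1)
        start: usize,        // ".start": first live element
        end: usize,          // ".end": one past the last live element
    }

    impl<T: Ord> RangePq<T> {
        fn with_capacity(cap: usize) -> Self {
            let mut buf = Vec::new();
            buf.resize_with(cap, || None);
            Self { buf, start: 0, end: 0 }
        }

        // 2B: claim the closest unused slot, extend ".end", and keep
        // [start, end) sorted so the minimum sits at `start`.
        fn push(&mut self, value: T) {
            self.buf[self.end] = Some(value); // panics if pushed past the bound
            self.end += 1;
            self.buf[self.start..self.end].sort();
        }

        // 2C: pop the minimum from the front and advance ".start".
        // Slots before `start` are never reused, so the capacity has to
        // cover total insertions, not just the peak size.
        fn pop(&mut self) -> Option<T> {
            if self.start == self.end {
                return None;
            }
            let v = self.buf[self.start].take();
            self.start += 1;
            v
        }
    }

    fn main() {
        let mut pq = RangePq::with_capacity(8);
        pq.push(3);
        pq.push(1);
        pq.push(2);
        assert_eq!(pq.pop(), Some(1)); // minimum comes out first
    }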
[1] I believe Visual Studio uses /Od (no optimization) for Debug builds and /O2 for Release.
I wonder whether profile-guided optimization would have led LLVM to make a better (or worse) decision here.
The second one is your problem. Haswell is over a decade old now, and almost nobody owns a CPU that old. -O3 makes a lot of architecture-dependent decisions, and tying yourself to an antique architecture gives you very bad results.
Good post though.