I really don't know how LLVM picks between branches and conditional moves, but my guess is that it doesn't assume float equality is any less likely than other conditions, and some optimization pass at -O3 turns unpredictable branches into conditional moves. I base this on the fact that adding std::hint::unlikely to the "equal" branch produces the same assembly for the function in both modes.
https://godbolt.org/z/erGPKaPcx
Whether it's safe in general to assume that float equality is unlikely for the purposes of optimization, I'll leave to the compiler engineers. But if you know the data your program will be handling, adding hints can avoid these surprises.
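For concreteness, the hint ends up looking something like this (a minimal sketch: step_toward is a made-up stand-in for the function in the post, and std::hint::unlikely is nightly-only, behind the likely_unlikely feature gate, at the time of writing):

    #![feature(likely_unlikely)] // nightly-only at the time of writing
    use std::hint::unlikely;

    fn step_toward(current: f32, target: f32, step: f32) -> f32 {
        // Tell the optimizer that exact float equality is the cold case;
        // with this hint, the function compiled to the same assembly in
        // both modes (see the godbolt link above).
        if unlikely(current == target) {
            target
        } else if current < target {
            (current + step).min(target)
        } else {
            (current - step).max(target)
        }
    }

    fn main() {
        println!("{}", step_toward(0.0, 1.0, 0.25));
    }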
It ended up being > 2x faster in the debug build, but 2x-5x slower in the release build (??!!?) [1]. I hadn't learned much about compilers/"lower level" C++, so I moved on at that point.
How it worked (rough sketch after the list):
1.) The P.Q. created a vector and resized it to known bounds.
2.) The P.Q. kept track of and updated the "active sorting range" each time an element was inserted or popped.
2B.) So each time an element was added, it used the closest unused vector element and updated the ".end" of the range to sort.
2C.) Each time an element was removed, it updated the ".start" of the range.
3.) In theory this should have saved reallocation overhead.
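Roughly, in code (a quick Rust-flavored sketch of the idea, not the actual C++; the names are placeholders, and the real version may have kept the range sorted differently, where this one just re-sorts on every push for brevity):

    // Fixed-capacity buffer with an active window [start, end).
    struct RangePq<T: Ord> {
        buf: Vec<Option<T>>, // preallocated once to the known bound (step 1)
        start: usize,        // ".start": first live element
        end: usize,          // ".end": one past the last live element
    }

    impl<T: Ord> RangePq<T> {
        fn with_capacity(cap: usize) -> Self {
            let mut buf = Vec::new();
            buf.resize_with(cap, || None);
            Self { buf, start: 0, end: 0 }
        }

        // 2B: claim the closest unused slot, extend ".end", and keep
        // [start, end) sorted so the minimum sits at `start`.
        fn push(&mut self, value: T) {
            self.buf[self.end] = Some(value); // panics if pushed past the bound
            self.end += 1;
            self.buf[self.start..self.end].sort();
        }

        // 2C: pop the minimum from the front and advance ".start".
        // Slots before `start` are never reused, so the capacity has to
        // cover total insertions, not just the peak size.
        fn pop(&mut self) -> Option<T> {
            if self.start == self.end {
                return None;
            }
            let v = self.buf[self.start].take();
            self.start += 1;
            v
        }
    }

    fn main() {
        let mut pq = RangePq::with_capacity(8);
        pq.push(3);
        pq.push(1);
        pq.push(2);
        assert_eq!(pq.pop(), Some(1)); // minimum comes out first
    }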
[1] I believe Visual Studio uses /Od (no optimization) for Debug builds and /O2 for Release.
I wonder whether profile-guided optimization would have led LLVM to make a better (or worse) decision here.
The second one is your problem. Haswell is over a decade old now, and almost nobody owns a CPU that old. -O3 makes a lot of architecture-dependent decisions, and tying yourself to an antique architecture gives you very bad results.
Good post though.