The C++11 thread-safety guarantee on static initialization is explicitly scoped to block-local statics. That's not an implementation detail; that's the guarantee.
The __cxa_guard_acquire/release machinery in the assembly is the compiler fulfilling that contract. Move to a private static data member and you're outside that guarantee entirely. You've quietly handed that responsibility back to yourself.
Then there's the static initialization order fiasco, which is the whole reason the Meyers singleton with a local static became canonical. A block-local static initializes on first use: lazily, deterministically, thread-safely. A static data member with dynamic initialization is initialized at startup, in an order that is unspecified across translation units. If code in a different TU calls Instance() during its own static initialization, it can reach the member before it has been constructed, and you're in UB territory. The article doesn't mention this.
Real-world singleton designs also need deferred or configuration-driven initialization, optional instantiation, state recycling, and controlled teardown. A block-local static keeps those doors open. A static data member initializes unconditionally at startup: you've lost lazy init, you've lost the option not to initialize it, and configuration-based instantiation becomes awkward by design.
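A minimal sketch of the block-local pattern described above, assuming C++17. `Registry`, `ConstructionCount` and `RaceOnFirstUse` are hypothetical names made up for illustration, not anything from the article. The counter is a constant-initialized atomic, so it is not itself subject to the init-order problem.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical singleton, for illustration only.
class Registry {
public:
    static Registry& Instance() {
        // Block-scope static: constructed on the first call. If several
        // threads arrive at once, the language guarantees that exactly one
        // runs the initialization while the others wait.
        static Registry instance;
        return instance;
    }

    // How many times the constructor has run so far.
    static int ConstructionCount() { return constructions.load(); }

private:
    Registry() { constructions.fetch_add(1); }

    // std::atomic<int> has a constexpr constructor, so this member is
    // constant-initialized before any dynamic initialization happens.
    static inline std::atomic<int> constructions{0};
};

// Calls Instance() from several threads at once and returns the number of
// constructions observed after all of them have finished.
int RaceOnFirstUse(int thread_count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([] { Registry::Instance(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return Registry::ConstructionCount();
}
```

Until something calls `Instance()`, the count stays at zero, which is the lazy and optional part. After a racing first use it is one.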
Honestly, if you're bottlenecking on singleton access, that's a design smell worth addressing, not the guard variable.
https://compiler-explorer.com/z/Tsbz7nd44
This is about constant vs dynamic initialization, not trivial vs nontrivial default construction. To be fair, the article doesn't claim this, but that's the comparison being made.
The standard allows compilers to replace dynamic initialization with static initialization, but AFAIK doing so has ABI implications, so compilers tend not to.
If you absolutely want to guarantee that a global is constant-initialized, use "constinit" on the variable declaration too. It can also have some positive codegen effects on declarations of thread_local variables.
Focusing on micro-"optimizations" like this one does absolutely nothing for performance (how many times are you actually calling Instance() per frame?) and skips over the absolutely mandatory PROFILE BEFORE YOU OPTIMIZE rule.
If a coworker asked me to review this CL, my comment would be "Why are you wasting both my time and yours?"
I ended up using std::call_once for those cases. More boilerplate but at least you're not debugging init order at 2am.
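Roughly what the std::call_once variant can look like. This is a sketch under assumptions, not the code from those cases: `Config`, its `source` parameter and `ConfigInitRuns` are hypothetical names.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical settings object, for illustration only.
class Config {
public:
    // The first caller decides how the instance is built; later arguments
    // are ignored because the initializer runs only once.
    static Config& Instance(const std::string& source = "defaults");

    const std::string& Source() const { return source_; }

private:
    explicit Config(std::string source) : source_(std::move(source)) {}
    std::string source_;
};

namespace {
// Both objects have constexpr constructors, so they are constant-initialized
// and already usable during other translation units' static initialization.
std::once_flag g_once;
std::unique_ptr<Config> g_instance;
int g_init_runs = 0;  // only written inside the call_once callback
}  // namespace

Config& Config::Instance(const std::string& source) {
    // call_once runs the callback exactly once, even under concurrent calls,
    // and every caller returns only after that run has completed.
    std::call_once(g_once, [&] {
        g_instance.reset(new Config(source));
        ++g_init_runs;
    });
    return *g_instance;
}

// Reports how many times the initializer has run (0 or 1).
int ConfigInitRuns() { return g_init_runs; }
```

The boilerplate buys explicit control over when and with which arguments the object gets built. Teardown still happens at static destruction time here, via the unique_ptr.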
auto& s = DisplayManager::Instance();
s.SetResolution(Resolution::r640x480);
...just this: Display::SetResolution(Resolution::r640x480);
...since it's a singleton, the state only exists once anyway, so there's no point in wrapping it in an object. E.g. what's the point of globally visible singletons except "everything is an object" cargo-culting?
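A sketch of that object-free version, assuming C++17 for the inline variable. `GetResolution`, the `detail` namespace and the `r1280x720` enumerator are hypothetical additions for illustration.

```cpp
// Resolution values; r1280x720 is a made-up second option.
enum class Resolution { r640x480, r1280x720 };

namespace Display {

namespace detail {
// The single piece of state. An enum with a constant initializer is
// constant-initialized, so no guard variable and no init-order issue.
inline Resolution current = Resolution::r640x480;
}  // namespace detail

// Free functions take the place of member functions on a singleton.
inline void SetResolution(Resolution r) { detail::current = r; }
inline Resolution GetResolution() { return detail::current; }

}  // namespace Display
```

This only stays simple while the state is constant-initializable. Once it needs a real constructor, the init-order questions raised earlier in the thread come back.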