There is too much heuristic fiddling involved, and way too many niche algorithms cobbled together under an optimizer.
It's as if we're missing the theory to actually solve the problem, so instead we hobble along by covering as many corner cases as we can, completely missing some elegant and profound underlying structure.
On paper, join order is a combinatorial search over equivalent expressions. In reality, you’re optimizing over three very non-relational constraints: data distribution (where bytes actually live), cardinality estimation (how wrong your stats are), and memory/network contention (what everyone else is running). That’s why so many OLAP setups quietly give up and denormalize: not because joins are conceptually hard, but because getting good enough plans under bad stats and skewed data is brutally hard and very user-visible when it fails.
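To make the cardinality point concrete, here's a toy sketch, not any real system's optimizer: the table names, row counts, and selectivities below are all invented. It enumerates left-deep join orders under a naive sum-of-intermediates cost model, then overestimates one selectivity and re-runs the search.

```python
# Toy sketch: left-deep join-order search with a sum-of-intermediates cost
# model, over hypothetical tables. Watch the chosen plan flip when one
# selectivity estimate goes stale.
from itertools import permutations

# Made-up base-table cardinalities and pairwise join selectivities.
ROWS = {"fact": 10_000_000, "dim_a": 10_000, "dim_b": 100_000}
TRUE_SEL = {
    frozenset({"fact", "dim_a"}): 2e-5,
    frozenset({"fact", "dim_b"}): 5e-7,
}

def plan_cost(order, sel):
    """Cost = sum of estimated intermediate-result sizes for a left-deep plan."""
    card, cost, joined = ROWS[order[0]], 0.0, {order[0]}
    for t in order[1:]:
        card *= ROWS[t]
        for j in joined:
            # Apply any predicate between the new table and the prefix;
            # a missing edge means a cross product.
            card *= sel.get(frozenset({j, t}), 1.0)
        cost += card
        joined.add(t)
    return cost

def best_plan(sel):
    return min(permutations(ROWS), key=lambda o: plan_cost(o, sel))

good = best_plan(TRUE_SEL)
print("plan with accurate stats:", good, "true cost:", plan_cost(good, TRUE_SEL))

# Stale or correlation-blind stats: the fact-dim_b selectivity is 100x too high.
STALE_SEL = dict(TRUE_SEL)
STALE_SEL[frozenset({"fact", "dim_b"})] = 5e-5
bad = best_plan(STALE_SEL)
print("plan with stale stats:   ", bad, "true cost:", plan_cost(bad, TRUE_SEL))
```

In this contrived setup, accurate stats join dim_b first; the overestimate makes the planner join dim_a first and pay roughly 3-4x the true optimal cost, and that's with only three tables and one bad estimate. Errors compound multiplicatively as joins are added, which is exactly why estimation, not search, is usually the weak link.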
What’s interesting about systems like StarRocks, ClickHouse, DuckDB, etc. is that they’re implicitly making a bet: “we can push the optimizer and execution engine far enough that normalized schemas become operationally cheaper than the hacks (wide tables, pre-joined materializations, bespoke streaming DAGs).” If that bet holds, the real win isn’t just faster joins; it’s shifting complexity back from application-specific pipelines into a general-purpose optimizer that can be improved once and benefit everyone.
The irony is that the more powerful the optimizer, the more your “logical” schema becomes a performance API surface. A small change in constraints, stats collection, or distribution keys can be worth more than any new feature, but it’s also harder to reason about than “this table is pre-joined.” So we’re trading one kind of complexity (manual denormalization and backfills) for another (making the cost model and distribution-aware planner smart enough to not shoot you in the foot).
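To ground the distribution-key point, here's a back-of-the-envelope sketch (every number, table name, and the node count are invented) of how much data a distributed hash join has to shuffle depending on whether the tables happen to be hash-distributed on the join key.

```python
# Toy sketch: the distribution key as a performance lever. If both sides of a
# join are hash-distributed on the join key, the join can run node-local;
# otherwise at least one side gets repartitioned over the network.
def shuffle_bytes(left_rows, right_rows, row_bytes,
                  left_key, right_key, join_key, nodes):
    """Rough bytes moved across the network for a distributed hash join."""
    moved = 0
    if left_key != join_key:
        moved += left_rows * row_bytes * (nodes - 1) / nodes   # re-hash left side
    if right_key != join_key:
        moved += right_rows * row_bytes * (nodes - 1) / nodes  # re-hash right side
    return moved

# orders joined to customers on customer_id, 16 nodes, ~100 bytes per row.
colocated  = shuffle_bytes(10_000_000, 1_000_000, 100,
                           "customer_id", "customer_id", "customer_id", nodes=16)
mismatched = shuffle_bytes(10_000_000, 1_000_000, 100,
                           "order_id", "customer_id", "customer_id", nodes=16)
print(f"colocated join:  ~{colocated / 1e6:.0f} MB shuffled")
print(f"mismatched keys: ~{mismatched / 1e6:.0f} MB shuffled")
```

Same logical schema, same query, roughly a gigabyte of network traffic difference in this toy setup, which is the sense in which the “logical” schema quietly becomes a performance API surface.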