It only supports consecutive instructions in the innermost loops. It can neither include nor ignore setup/teardown cost, so I can't feed it any function as-is (even a tiny one). I need to manually cut out the loop body.
It doesn't support branches at all. I know it's a very hard problem, but that's the problem I have. Quite often I'd like to compare branchless vs branchy versions of an algorithm. I have to manually remove branches that I think are predictable and hope that doesn't alter the analysis.
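To make the branchy vs branchless comparison concrete, here is a minimal sketch of such a pair in Rust. The function names and the counting task are hypothetical, picked only for illustration. The compiler is free to turn either version into the other (or vectorize both), so the generated assembly still has to be checked.

```rust
/// Branchy version: counts elements below `limit` with an explicit `if`.
/// (Hypothetical example; not taken from any particular codebase.)
pub fn count_below_branchy(data: &[u32], limit: u32) -> usize {
    let mut n = 0;
    for &x in data {
        if x < limit {
            n += 1;
        }
    }
    n
}

/// Branchless version: adds the comparison result as 0 or 1,
/// so the source has no data-dependent branch in the loop body.
pub fn count_below_branchless(data: &[u32], limit: u32) -> usize {
    let mut n = 0;
    for &x in data {
        n += usize::from(x < limit);
    }
    n
}
```

Both functions return the same count, so the only thing left to compare is how the loop bodies are scheduled.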
It's not designed to compare different versions of code, so I need to manually rescale the metrics to compare them (different versions of the loop can be unrolled a different number of times, or process a different number of elements per iteration, etc.).
Overall that's laborious, and doesn't work well when I want to tweak the high-level C or Rust code to get the best-optimizing version.
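The manual rescaling mentioned above amounts to normalizing each variant's cost to cycles per element. A minimal sketch, assuming the cycles-per-iteration figure is read off the analyzer's report by hand. The `LoopVariant` struct and its field names are hypothetical, not part of any tool.

```rust
/// Hypothetical summary of one analyzed loop variant.
pub struct LoopVariant {
    pub name: &'static str,
    /// Estimated cycles for one iteration of the loop body as analyzed.
    pub cycles_per_iter: f64,
    /// Elements processed per iteration (depends on unrolling and vector width).
    pub elems_per_iter: u32,
}

impl LoopVariant {
    /// Cost normalized per element, so variants with different
    /// unroll factors can be compared directly.
    pub fn cycles_per_elem(&self) -> f64 {
        self.cycles_per_iter / f64::from(self.elems_per_iter)
    }
}

/// Returns the variant with the lowest cycles per element, if any.
pub fn best(variants: &[LoopVariant]) -> Option<&LoopVariant> {
    variants
        .iter()
        .min_by(|a, b| a.cycles_per_elem().total_cmp(&b.cycles_per_elem()))
}
```

Comparing raw cycles per iteration would favor the least-unrolled loop. Dividing by the number of elements per iteration removes that bias.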
E.g. Cortex-A72 uses the Cortex-A57 model, as do Cortex-A76 and even Cortex-A78.
The Neoverse V1 model has an issue width of 15, while the Neoverse V2 model (and V3, which uses the V2 model) has an issue width of 6.