It's a nice piece of work. I especially like the sections on data cleaning and registration, since those seem to have been among the limiting factors of previous approaches.
I am sceptical about how accurately you can predict the height of a specific tree from monocular images, but for cases where you only need to be right on average (e.g. biomass estimation, fuel load estimates) I think it's a great approach.
The blog post and paper [1] describe a promising approach to solving related problems at a previously impossible scale and quality. I am currently exploring methods to better represent seasonal land cover changes, which would improve wind power generation forecasting, and this paper provides a great starting point.
I hope DINOv3 inspires more work like this, and I would encourage any curious mind to play with that model! I was amazed by its ability to distinguish fine object details. For example, in a photo of a bicycle, the patch embeddings cleanly separated the background from the individual spokes of the wheel.
Here are the visuals re: trees - https://i.imgur.com/R0W4q4O.png