The benchmark site is here: https://www.idp-leaderboard.org/
They say some specialist models get better results on their benchmarks (Nanonets OCR-3: 85.9%).
The intelligent document processing market (a funny marketing term on top of OCR) is moving from "can software extract the text?", which is what benchmarks normally measure, to "can software autonomously run a specific company process?"
The keywords now are human in the loop as the fallback, hallucination (LSTM vs. vLLM), and prompt engineering.
Prove me wrong: the hardest challenge is no longer OCR accuracy but integration and issue handling in production. Probably "an agentic team can handle this" ^^
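To make the "issue handling" point concrete, here is a rough sketch of the kind of check that decides when an extraction falls back to a human. Everything in it is an assumption for illustration: the `needs_human_review` helper, the receipt dict layout (`total`, `items`, `amount`), and the one-cent tolerance are made up and not tied to any particular OCR model or product.

```python
from decimal import Decimal


def needs_human_review(receipt, tolerance=Decimal("0.01")):
    """Return True when the extracted fields fail a basic sanity check.

    Hypothetical layout: {"total": "12.50", "items": [{"amount": "10.00"}, ...]}
    """
    try:
        total = Decimal(str(receipt["total"]))
        items_sum = sum(
            (Decimal(str(item["amount"])) for item in receipt["items"]),
            Decimal("0"),
        )
        # Line items that do not add up to the total hint at a misread digit.
        return abs(items_sum - total) > tolerance
    except (KeyError, TypeError, ArithmeticError):
        # Missing or unparseable fields go straight to a person.
        return True


extracted = {"total": "12.50", "items": [{"amount": "10.00"}, {"amount": "2.50"}]}
queue = "human_review" if needs_human_review(extracted) else "auto_approve"
print(queue)
```

A check like this says nothing about whether the text was read correctly, only whether the result is internally consistent, which is usually the cheapest signal available in production.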
I've been using Qwen3.6 to OCR stuff, primarily receipts, and it frequently reads mangled/faded/folded documents accurately, ones that I have a hard time with myself, including handwritten stuff (though that's not flawless).