I’ve also heard very good things about these two in particular:
- LightOnOCR-2-1B: https://huggingface.co/lightonai/LightOnOCR-2-1B
- PaddleOCR-VL-1.5: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5
The OCR leaderboards I’ve seen leave a lot to be desired.
With the rapid release of so many of these models, I wish there were a better way to know which ones are actually the best.
I also get the impression that most, if not all, of these models don’t really handle charts, other than maybe including a link to a cropped image. It would be nice if the OCR model also converted charts into markdown tables, but that’s obviously challenging.
And here's the kicker: I can't afford mistakes. Missing or misreading a single character could be catastrophic. 4 units vacant? 10 days to respond? Signature missing? These are incredibly critical details, and I can't find an eval that gives me confidence on this.
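To make that concern concrete, here is a rough sketch (not taken from any existing benchmark) of why an aggregate score can hide exactly this kind of failure. It computes a plain character error rate with a hand-rolled edit distance, then separately checks whether every number in the text survived intact. The function names, the sample sentences, and the "digits only" definition of a critical field are all assumptions made for illustration.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def numbers_match(reference: str, hypothesis: str) -> bool:
    """True only if both texts contain the same numbers in the same order."""
    return re.findall(r"\d+", reference) == re.findall(r"\d+", hypothesis)


# Hypothetical ground truth and OCR output that differ by a single digit.
reference = "4 units vacant. You have 10 days to respond."
hypothesis = "4 units vacant. You have 18 days to respond."

print(f"CER: {cer(reference, hypothesis):.4f}")
print(f"Numbers intact: {numbers_match(reference, hypothesis)}")
```

One wrong character barely moves the error rate, but the numbers check fails outright. A leaderboard that only reports averaged character or word accuracy can't tell you which kind of mistake a model makes, so an eval for this use case would probably need per-field exact-match checks on top of the aggregate score.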
EDIT: https://github.com/overcuriousity/pdf2epub looks interesting.