- My current holy grail is my attempt to convert a Shipibo (an indigenous Peruvian language)-to-Spanish dictionary into a Shipibo-to-English dictionary. The pdf I have (available freely on archive.org) isn't a great scan (though I think it'd be a heck of a lot easier than some of the handwritten examples they show). Layout (2-columns) along with header/footers can cause some headaches, but it is all Latin script. This seems to fall on its face pretty badly (not even a couple of pages in), so my search continues. (The other major problem I'm having is trying to separate out Shipibo definitions/examples from the Spanish ones, and only translating the Spanish to English...so pretty complex I guess. I've been taking fresh stabs at this project every few months when I see OCR/LLM news pop up and continue to be disappointed)
by vintermann
4 subcomments
- I appreciate having an OCR interface rather than having to chat with a bot, but unfortunately chatting with Gemini 3 gives far better results than this. I gave it the document Gemini 3 got a surprisingly good result on:
https://urn.digitalarkivet.no/URN:NBN:no-a1450-rk10101508282...
and the output wasn't even recognizably Danish.
Just out of pity I gave it a birthday card from my sister written in very readable modern handwriting, and while it managed to make the contents of that readable, the errors it made reveal that it has very little contextual intelligence. Even if ! and ? can be hard to tell apart sometimes, they weren't here, and you do not usually start a birthday letter with "Happy Birthday brother?"
by GZGavinZhao
3 subcomments
- Does it handle math expressions (those rendered from LaTeX) well? I've been looking for a good OCR model to transcribe my math textbooks into markdown (obviously ignoring the images and figures) with LaTeX as math expressions, and none of the current OCR models work reliably enough.
EDIT: you can try it yourself for free at https://console.mistral.ai/build/document-ai/ocr-playground once you create a developer account! Fingers crossed to see how well it works for my use case.
- It seems like Mistral is just chasing around sort of "the fringes" of what could be useful AI features. Are they just getting out-classed by OAI, Google, Anthropic?
It seems like EU in general should be heavily invested in Mistral's development, but it doesn't seem like they are.
- From a tweet: https://x.com/i/status/2001821298109120856
> can someone help folks at Mistral find more weak baselines to add here? since they can't stomach comparing with SoTA....
> (in case y'all wanna fix it: Chandra, dots.ocr, olmOCR, MinerU, Monkey OCR, and PaddleOCR are a good start)
by hereme888
2 subcomments
- I'm reading worse performance than many OSS offerings like Paddle, MinerU, MonkeyOCR, etc:
https://www.codesota.com/ocr
- Gave it a birth registry from a Portuguese locality from 1755, which my dad and I often decipher to figure out genealogy, and it did a terrible job.
Regular Gemini Thinking can actually get 70-80% of the documents correct, except for lots of mistakes on given names. ChatGPT understands maybe 50-60%.
This Mistral model butchered the whole text, literally not a word was usable. To the point I think I'm doing something wrong.
The test document: https://files.fm/u/3hduyg65a5
- Sadly, only available through a hosted API. I don't see how this is useful for OCR, unless you are OK with uploading your confidential documents to "the cloud"?
I'm still hoping for improved locally hosted models: qwen3-vl:30b-a3b-thinking-q4_K_M is already really good.
by tecoholic
2 subcomments
- > Mistral OCR 3 is ideal for both high-volume enterprise pipelines and interactive document workflows.
I don’t know how they can make this statement with a 79% accuracy rate. For any serious use case, this is an unacceptable number.
I work with scientific journals, and confusions like 2.9+0.5 versus 29+0.5 are something we regularly run into. They mean we can never fully trust automated processes and need human verification at every step.
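One cheap way to narrow down where a human has to look is to run the same passage through two independent OCR engines and flag only the numbers that disagree. This is a minimal sketch, not what we actually run: `numeric_mismatches` is a hypothetical helper, and it assumes both transcriptions contain the numbers in the same order.

```python
import re

# Matches integers and decimals such as "29", "2.9" or "0.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numeric_mismatches(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Pair the numbers of two transcriptions in order and return the pairs that differ."""
    nums_a = NUMBER.findall(text_a)
    nums_b = NUMBER.findall(text_b)
    mismatches = [(a, b) for a, b in zip(nums_a, nums_b) if a != b]
    # A different number count is suspicious on its own, so report it as a sentinel pair.
    if len(nums_a) != len(nums_b):
        mismatches.append((f"count={len(nums_a)}", f"count={len(nums_b)}"))
    return mismatches


# Hypothetical outputs of two engines for the same sentence.
print(numeric_mismatches("mean 2.9+0.5 mm", "mean 29+0.5 mm"))
```

The positional pairing is naive: if one engine drops a number entirely, every later pair shifts, so a count mismatch should be treated as "review the whole passage". Agreement between two engines also proves nothing when both make the same mistake.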
- There have been so many open source OCR models in the last 3 months that it would be good to compare against those, especially since some are not even 1B params and can be run on edge devices.
- paddleOCR-VL
- olmOCR-2
- chandra
- dots.ocr
I kind of miss having leaderboard sections or an arena for OCR and CV, and providers hosting those models. They are neglected on both Artificial Analysis and OpenRouter.
- So I tried this on the NVMe specification (I have a huge library of PDFs) and it worked decently, though the output had some oddities:
- Parts of the table of contents were headings
- I didn't like how tables were links to separate markdown files.
In theory, I could recombine everything into one document, but that would require complicated Markdown parsing and manipulation and I wasn't even sure how to go about that given how free-form the resulting text was. I also haven't gone through the entire document (it's 784 pages) to check to make sure it's correct compared to what pdftotext or acrobat could create, so there's that too.
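For the table links specifically, the recombination might not need a full Markdown parser. A minimal sketch, assuming each table reference is a link alone on its own line pointing at a relative `.md` file (for example `[tbl-0.md](tbl-0.md)`); that shape and the name `inline_tables` are my assumptions, not something I have confirmed against the actual output:

```python
import re

# Assumption: a table reference is a Markdown link alone on its line whose
# target is a relative .md file, e.g. "[tbl-0.md](tbl-0.md)".
TABLE_LINK = re.compile(r"^\[[^\]]*\]\(([^)\s]+\.md)\)[ \t]*$", re.MULTILINE)


def inline_tables(markdown: str, tables: dict[str, str]) -> str:
    """Replace each stand-alone table link with the table text stored under its target name."""

    def substitute(match: re.Match) -> str:
        body = tables.get(match.group(1))
        # Leave links we cannot resolve exactly as they were.
        return match.group(0) if body is None else body.strip()

    return TABLE_LINK.sub(substitute, markdown)


doc = "# Spec\n\n[tbl-0.md](tbl-0.md)\n\nText [see](other.md) inline.\n"
tables = {"tbl-0.md": "| a | b |\n|---|---|\n| 1 | 2 |\n"}
print(inline_tables(doc, tables))
```

The `tables` mapping could be built from the output directory with something like `{p.name: p.read_text(encoding="utf-8") for p in Path("out").glob("*.md")}`. Links that sit inside a sentence are left alone on purpose, since only stand-alone lines are treated as table placeholders.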
- I am testing it as a replacement for MathPix; the first few tests look rather decent. In Python for Windows: https://pastebin.com/uyiFHKdJ (alpha version prototype). It launches the Windows snip tool, waits for a clipboard image, calls Mistral, retrieves the markdown and puts it in the clipboard as text, ready to be pasted into Typora, Obsidian, or another markdown editor.
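The control flow of that workflow can be sketched independently of the linked script. This is not the pastebin code: `snip_to_markdown` and its three callables are hypothetical placeholders, so the clipboard access and the OCR call can be swapped for whatever libraries are actually in use.

```python
import time
from typing import Callable, Optional


def snip_to_markdown(
    grab_image: Callable[[], Optional[bytes]],  # returns image bytes, or None if no image yet
    ocr: Callable[[bytes], str],                # hypothetical wrapper around the OCR request
    set_clipboard: Callable[[str], None],       # writes text back to the clipboard
    timeout_s: float = 30.0,
    poll_s: float = 0.2,
) -> Optional[str]:
    """Poll for a clipboard image, OCR it, and put the resulting Markdown on the clipboard."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        image = grab_image()
        if image is not None:
            markdown = ocr(image)
            set_clipboard(markdown)
            return markdown
        time.sleep(poll_s)
    return None  # no image appeared before the timeout


# Dry run with fakes: the image "appears" on the third poll.
polls = iter([None, None, b"png"])
pasted: list[str] = []
print(snip_to_markdown(lambda: next(polls), lambda img: "# " + img.decode(), pasted.append, poll_s=0))
```

On Windows, `grab_image` could plausibly wrap Pillow's `ImageGrab.grabclipboard()` and `set_clipboard` something like `pyperclip.copy`, but those choices are assumptions on my side. Keeping the three steps injectable also makes it easy to test the loop without a clipboard or network access.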
- This might be a good place to check the options available for OCR in-place translations. I took a look at OCR3, but it doesn't seem to support my use-case. It looks more tailored towards data extraction for further processing.
I've got some foreign artbooks that I would like to get translated. The translations would need to be in place since the placement of the text relative to the pictures around it is fairly important. I took a look at some paid options online, but they seemed to choke - mostly because of the non-standard text placements and all.
The best solution I could come up with is using Google Lens to overlay a translation while I go through the books, but holding a camera/tablet up to my screen isn't very comfortable. Chrome has Lens built in, but (IIRC) I still need to manually select sections for it to translate - it's not as easy to use as just holding my phone up.
Anyone know of any progress towards in-place OCR/translations?
by singularity2001
1 subcomments
- No one mentioning the possibly most beautiful css effect on the Internet??
by i_am_not_groot
0 subcomment
- Finally a way to read doctor's prescriptions
- Is OpenRouter still sending all OCR jobs to Mistral? I wonder if they're trying to keep that spot. Seems like Mistral and Google are the best at OCR right now, with Google leading Mistral by a fair bit.
- My main beef with Mistral is that they don’t bother to respond to customer inquiries for products they hide behind “reach out for pricing” terms, so even if they were better than SoTA it wouldn’t really matter.
- I need Solresol in any language. It is constructed for discussion and negotiation on war.
- What languages does it support? I can't find this info anywhere on the page.
by constantinum
0 subcomment
- Where data accuracy is of paramount importance, I think a hybrid route of non-LLM OCR for data parsing and LLMs for structured data extraction is the safest path. I've seen better results with LLMWhisperer (OCR)[1] and the latest Gemini.
[1] - https://pg.llmwhisperer.unstract.com/
by singularity2001
0 subcomment
- Not OS / free weights right?
- Can we have an open source tool that uses the same API, and that you can just instruct to use Mistral or any other service if you think the open source tool has quality issues for a particular text?
This makes more sense to me, as I find that FOSS OCR is quite okay for most usecases.
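Roughly what I have in mind, as a sketch only: every backend is just a function from page bytes to text, and the tool walks an ordered list until one result passes a quality check. The names `ocr_with_fallback` and `good_enough` and the backends below are hypothetical; no real OCR library or hosted API is called here.

```python
from typing import Callable, Sequence

# Any OCR engine, local or hosted, is wrapped as: page bytes in, text out.
OcrBackend = Callable[[bytes], str]


def ocr_with_fallback(
    page: bytes,
    backends: Sequence[tuple[str, OcrBackend]],
    good_enough: Callable[[str], bool],
) -> tuple[str, str]:
    """Try backends in order and return (name, text) of the first acceptable result.

    If no result passes the check, the last backend's output is returned.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    name, text = "", ""
    for name, run in backends:
        text = run(page)
        if good_enough(text):
            break
    return name, text


# Stand-in backends: the "local" one fails on this page, the "hosted" one succeeds.
backends = [("local", lambda page: ""), ("hosted", lambda page: "hello")]
print(ocr_with_fallback(b"...", backends, good_enough=lambda t: bool(t.strip())))
```

Putting the FOSS engine first means confidential pages only leave the machine when the local result fails the check, and the check itself (non-empty text, a confidence score, a dictionary hit rate) is where most of the real work would be.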