Hacker News

60% Fable cost cut by converting code to images and having the model OCR it

304 points by dimitropoulos

by aabhay

3 subcomments

In Gemini at least, if you look at how they process PDFs, they do an OCR and then feed the text + image to the model, without charging you for the text tokens (I believe).
So my guess is that Claude’s backend is doing the same — so this hack is probably more of a loophole in token accounting that might get closed if Claude is doing what Gemini does

by lpellis

0 subcomment

I tried the same thing last year (with OpenAI models). Back then it did reduce prompt tokens, but you needed way more completion tokens, so it ended up more expensive (and slower): https://pagewatch.ai/blog/post/llm-text-as-image-tokens/

by aabhay

3 subcomments

Ahhh my eyes the vibe coded readme

by genxy

2 subcomments

This seems like a pricing hack that burns resources. When the loophole gets closed, won't the price of OCR have to rise?

by himata4113

0 subcomment

Related: https://blog.can.ac/2026/06/10/snapcompact/

by g42gregory

0 subcomment

I think Oh-My-Pi (OMP.sh) uses images for context compactification. OMP is built on top of the Pi coding agent.

by dimitropoulos

0 subcomment

there's also a DeepSeek whitepaper on this technique https://www.seangoedecke.com/text-tokens-as-image-tokens

by troglodytetrain

0 subcomment

So, just be careful with this: it is very likely switching to another, less capable model, hence the cost reduction. So it looks like Fable but isn’t. You are doing extra work when you could just switch the model back to opus 4.8 instead.

by OSaMaBiNLoGiN

1 subcomments

Saw a Tweet a while ago from someone (maybe Carmack, maybe Geohot, maybe Karpathy?) wondering if images were just the better option.
Since then I've been using images with very simply worded prompts whenever I'm informing an agent of what is happening. Sometimes no text in the prompt at all.
It has been very very effective.
That being said, this isn't really what Karpathy was talking about. But it got me thinking a bit, and that got me to a much nicer workflow.

by anigbrowl

0 subcomment

I'm sorry, but this is absurd. It works, and it's clever, but it's clearly a workaround for a pricing failure. Much like the bounty on poisonous snakes leading to people taking up snake-breeding, this just exploits and promotes waste. I think ultimately blame falls on Anthropic for the poor pricing system that enables such arbitrage. But I'm also disgusted by the inevitable tide of people exploiting this until it's fixed, and creating an entirely unnecessary extra tide of digital junk.

by brumar

3 subcomments

Tangentially related: I don't think OCR is the right term, and I am generally vocal about that. But seeing it go unquestioned here, I am wondering if I am the one who is wrong. Is it OK to call this OCR? To me, OCR means you get text in the end, not visual tokens.

by __hugues

1 subcomments

seems really dumb and like it would need to violate basic information theory to work?
input tokens are cheaper than output tokens. seems like it would maybe reduce input tokens at the expense of many more output tokens if you're actually triggering OCR via thinking?
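To put rough numbers on that trade-off, here is a minimal sketch. Everything in it is an assumption for illustration: the per-million-token prices, the token counts, and the helper names are made up and are not any provider's real rates or API. It only shows the arithmetic: saving input tokens pays off only while the extra output tokens stay under a break-even point set by the price ratio.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars, with prices given in dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def break_even_extra_output_tokens(saved_input_tokens, input_price, output_price):
    """Extra output tokens at which the input-side saving is fully cancelled."""
    return saved_input_tokens * input_price / output_price


# Hypothetical prices (dollars per million tokens); output is 5x input here.
INPUT_PRICE = 5.0
OUTPUT_PRICE = 25.0

# Hypothetical request: 100k input tokens as text vs. 40k when sent as images.
text_cost = request_cost(100_000, 2_000, INPUT_PRICE, OUTPUT_PRICE)
saved_input = 100_000 - 40_000
break_even = break_even_extra_output_tokens(saved_input, INPUT_PRICE, OUTPUT_PRICE)

# If the image version needs exactly `break_even` extra output tokens,
# the two requests cost the same; anything above that is a net loss.
image_cost = request_cost(40_000, 2_000 + break_even, INPUT_PRICE, OUTPUT_PRICE)

print(f"text: ${text_cost:.2f}, image at break-even: ${image_cost:.2f}")
print(f"break-even extra output tokens: {break_even:.0f}")
```

With a 5x output-to-input price ratio, every 5 input tokens saved buys only 1 extra output token before the saving is gone, so whether this wins depends on how much additional thinking the model does to read the image.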

by cs702

0 subcomment

Reminds me of caveman: https://news.ycombinator.com/item?id=47647455

by electrotype

1 subcomments

What about: "Read this document online: [URL]", where you add your text/context to an online document?
Would that reduce the number of tokens used too?

by tru3_power

0 subcomment

This probably works with PDFs as well, even if the saving comes only from not having to parse the PDF format.

by puppycodes

0 subcomment

That is hilarious and an amazing find.

by chickensong

0 subcomment

Binary compression unpacked by OCR? This is the stuff of nightmares. So cursed, and yet...

by npn

0 subcomment

it is funny because nobody ever bothers to point out that they overcharge you on the text input token price.
sure, it was pretty resource intensive a few years ago, but with turbo quant, sparse attention and various other techniques, plus advances in hardware (dedicated prefill machines, memory pools for kv caching), the cost should be drastically reduced, and yet they still keep the same cost formula.
I can't help but laugh whenever someone proudly shares how many billion input tokens they spent in their coding sessions and how much they saved with the subscription, when it is pretty much just electricity cost for the providers.

by shinryuu

0 subcomment

Interesting approach, though that readme really needs a rewrite by a human...

by dippogriff

1 subcomments

I want to see more text-free foundation models

by nickpeterson

0 subcomment

a picture's worth a thousand tokens

by lstroud

1 subcomments

Are we really re-discovering that compressed binary formats are more efficient data representations?

by yogthos

0 subcomment

Isn't this basically what DeepSeek came up with? https://github.com/deepseek-ai/DeepSeek-OCR

by wigster

0 subcomment

a picture paints a thousand words

by felipelalli

0 subcomment

No words.

by AIorNot

0 subcomment

I can't get past the intense LLM slop text in the GitHub repo