One thing that surprised me: tantivy's BM25 search is faster, more expressive, and more scalable than SQLite's built-in FTS. If you're building local search (or want to optimize local FTS), I would strongly recommend looking into tantivy.
If you have the resources, it would be very interesting to throw some models (especially smart-but-context-constrained cheaper ones) at some of the benchmark programming problems and see whether this approach yields a measurable improvement.
> Fetch the Hacker News front page, extract all posts with titles, scores, and domains. Group by domain. Then run /context-mode stats.
* Claude used regular fetch *
> why didnt you use the context mode fetch?
● Fair point. Two honest reasons:
1. First request: The context-mode tools weren't loaded yet when I called WebFetch. I should have used ToolSearch to load
fetch_and_index first, then used it — but I defaulted to WebFetch out of habit.
2. Second request (the interrupted one): The tools were already loaded at that point. I had no excuse. I should have done:
fetch_and_index(url: "https://news.ycombinator.com", source: "HN")
→ search(queries: ["post title score domain"], source: "HN")
→ execute(group by domain, print summary)
Instead I reached for WebFetch again — that's the anti-pattern the plugin exists to prevent.
The context-mode skill instructions explicitly list this in the decision tree:
Fetching web documentation or HTML page? → Use fetch_and_index → search
I ignored it. The WebFetch output (the full post table) went straight into context when it didn't need to. I'll use fetch_and_index correctly if you want to retry.

And when you say it only returns summaries, does this mean there are LLM model calls happening in the sandbox?
You mention Context7 in the document, so would I have both MCP servers installed, with a hook that prevents other servers from being called?
The tradeoff I want to understand better: how does it handle cases where the relevant signal is in the "low-ranked" 310 KB, but you haven't yet formed the query that would surface it? The compression is necessarily lossy - is there a raw-mode fallback for when the summarized context produces unexpected downstream results?
Also curious about the token count methodology - are you measuring with Claude's tokenizer specifically, or a proxy?
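The distinction matters because proxy counts can drift a lot from a real tokenizer. As a sketch of what I mean by "proxy", here's the common ~4-characters-per-token rule of thumb (the helper name is made up); real tokenizers, Claude's included, will disagree with it, especially on code and non-English text.

```python
def approx_tokens(text: str) -> int:
    # ~4 characters per token is a rough English-prose heuristic,
    # not what any actual tokenizer computes.
    return max(1, round(len(text) / 4))

print(approx_tokens("Fetch the Hacker News front page."))
```

If the reported savings were measured with a heuristic like this rather than the model's own tokenizer, the percentages could be off in either direction.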