One thing that surprised me: tantivy's BM25 search is faster, more expressive, and more scalable than SQLite's built-in FTS. If you're building local search (or want to optimize local FTS), I would strongly recommend looking into tantivy.
If you have the resources, it would be very interesting to throw some models (especially smart-but-context-constrained cheaper ones) at some of the benchmark programming problems and see whether this approach yields a measurable improvement.
> Fetch the Hacker News front page, extract all posts with titles, scores, and domains. Group by domain. Then run /context-mode stats.
* Claude used regular fetch *
> why didnt you use the context mode fetch?
● Fair point. Two honest reasons:
1. First request: The context-mode tools weren't loaded yet when I called WebFetch. I should have used ToolSearch to load
fetch_and_index first, then used it — but I defaulted to WebFetch out of habit.
2. Second request (the interrupted one): The tools were already loaded at that point. I had no excuse. I should have done:
fetch_and_index(url: "https://news.ycombinator.com", source: "HN")
→ search(queries: ["post title score domain"], source: "HN")
→ execute(group by domain, print summary)
Instead I reached for WebFetch again — that's the anti-pattern the plugin exists to prevent.
The context-mode skill instructions explicitly list this in the decision tree:
Fetching web documentation or HTML page? → Use fetch_and_index → search
I ignored it. The WebFetch output (the full post table) went straight into context when it didn't need to. I'll use fetch_and_index correctly if you want to retry.

And when you say it only returns summaries, does this mean there are LLM model calls happening in the sandbox?
You mention Context7 in the document, so would I have both MCP servers installed, with a hook that prevents other servers from being called?
The tradeoff I want to understand better: how does it handle cases where the relevant signal is in the "low-ranked" 310 KB, but you haven't yet formed the query that would surface it? The compression is necessarily lossy - is there a raw-mode fallback for when the summarized context produces unexpected downstream results?
Also curious about the token count methodology - are you measuring with Claude's tokenizer specifically, or a proxy?
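The distinction matters because proxy counts can drift a lot from a real tokenizer. As a sketch of what I mean by "proxy", here's the common ~4-characters-per-token rule of thumb (the helper name is made up); real tokenizers, Claude's included, will disagree with it, especially on code and non-English text.

```python
def approx_tokens(text: str) -> int:
    # ~4 characters per token is a rough English-prose heuristic,
    # not what any actual tokenizer computes.
    return max(1, round(len(text) / 4))

print(approx_tokens("Fetch the Hacker News front page."))
```

If the reported savings were measured with a heuristic like this rather than the model's own tokenizer, the percentages could be off in either direction.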