I was looking at modifying outgoing requests via a proxy and wondering whether that's harming caching. Common coding tools presumably share one prompt across all their installs, so a universal cache would save a lot.
Even just moving it to the bottom helped move a lot of our usage into cache.
Probably went from something like 30-50% cached tokens to 50-70%.
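To illustrate why placement can matter, here is a minimal sketch, assuming the cache matches only on an exact token prefix and that the thing being moved is per-request dynamic content (a timestamp here, purely for illustration). Whitespace splitting stands in for a real tokenizer, and `STATIC`, `build_prompt`, `shared_prefix_len`, and `cached_fraction` are hypothetical names, not any provider's API.

```python
# Toy model of prefix caching: only the longest shared token prefix
# between the previous request and the current one counts as "cached".
# Assumption: whitespace tokens stand in for real tokenizer output.

STATIC = "You are a helpful coding assistant . Follow the style guide .".split()


def shared_prefix_len(a, b):
    """Number of leading tokens that are identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_prompt(dynamic, dynamic_first):
    """Place the dynamic part either before or after the static part."""
    dyn = dynamic.split()
    return dyn + STATIC if dynamic_first else STATIC + dyn


def cached_fraction(prev, cur):
    """Fraction of the current prompt covered by the shared prefix."""
    return shared_prefix_len(prev, cur) / len(cur)


# Hypothetical dynamic content: a timestamp that changes per request.
top_a = build_prompt("time: 10:01", dynamic_first=True)
top_b = build_prompt("time: 10:02", dynamic_first=True)
bottom_a = build_prompt("time: 10:01", dynamic_first=False)
bottom_b = build_prompt("time: 10:02", dynamic_first=False)

print(f"dynamic first: {cached_fraction(top_a, top_b):.2f}")
print(f"dynamic last:  {cached_fraction(bottom_a, bottom_b):.2f}")
```

In this toy setup only 1 of 14 tokens matches when the dynamic part comes first, versus 13 of 14 when it comes last. The numbers are illustrative only and say nothing about the real percentages quoted above.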
So if I were running a provider I would be caching popular prefixes for questions across all users. There must be so many questions that start 'what is' or 'who was' etc?
Also, can subsequences in the prompt be cached and reused, or is it only prefixes? I mean, can you cache popular phrases that appear in the middle of a prompt and reuse them somehow, rather than needing to iterate through them token by token? E.g. there must be lots of times that "and then tell me what" appears in the middle of a prompt.
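As far as I understand it, the usual answer is prefixes only: in a causal transformer, the cached state for a token depends on every token before it, so the same phrase after a different opening produces different state. A toy sketch of that dependency, using a running hash as a stand-in for the real per-position keys and values (this is not actual attention math, and `position_states` is a hypothetical name):

```python
import hashlib


def position_states(tokens):
    """Return one 'state' per position.

    Each state is a hash of all tokens up to and including that position,
    mimicking how a causal model's cached state depends on the full
    preceding context. This is a stand-in, not real attention.
    """
    states = []
    running = hashlib.sha256()
    for tok in tokens:
        running.update(tok.encode("utf-8") + b"\x00")
        states.append(running.copy().hexdigest())
    return states


phrase = "and then tell me what".split()

# Same phrase in the middle, different text before it.
mid_a = position_states("list the files".split() + phrase)
mid_b = position_states("read the log".split() + phrase)

# Same phrase at the start, different text after it.
pre_a = position_states(phrase + ["changed"])
pre_b = position_states(phrase + ["failed"])

# Mid-prompt: none of the phrase's states can be reused.
print(set(mid_a[3:]) & set(mid_b[3:]))
# Prefix: all of the phrase's states match and can be reused.
print(pre_a[:5] == pre_b[:5])
```

The first print shows an empty set (nothing reusable for the mid-prompt phrase), while the second shows that the shared prefix states are identical.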
It's a pain having to tell Copilot "Open in pages mode" each time it's launched, and then, after processing a batch of files, running into:
https://old.reddit.com/r/Copilot/comments/1po2cuf/daily_limi...
https://t3.chat/share/j2tnfwwful https://t3.chat/share/k1xhgisrw1
[see https://news.ycombinator.com/item?id=45988611 for explanation]