Hacker News
High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction
14 points by jchandra
by vivahir215
1 subcomments
- Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
- [dead]