FRESH Hacker News
Lossless LLM compression for efficient GPU inference via dynamic-length float
347 points by CharlesW