Hacker News

Advanced Quantization Algorithm for LLMs

133 points by lastdong

by trilogic

1 subcomment

You can try it with this model: https://hugston.com/models/56tps-tested-autoround-qwen35-35b... It is really well done and runs pretty fast with context up to 300k, at just 11.65 GB. Also get the mmproj file for vision/image processing.

by programjames

0 subcomments

Anyone willing to dig through the code or papers for the actual algorithm? It looks like the GitHub and papers have not been optimized for communication.

by liuliu

1 subcomment

I am actually getting interested in QAT these days, especially the LSQ+ type, but it doesn't seem like people have done enough of that, at least in the open-source world: basically 2-bit / 3-bit OPD with LSQ+.
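
For readers unfamiliar with LSQ+, here is a minimal sketch of the kind of quantizer it refers to: a fake-quantization step with a learnable step size and a learnable offset, plus the straight-through-estimator gradients with respect to both. This is an illustration under assumptions, not the commenter's setup and not AutoRound's algorithm. The function name `lsq_plus_fake_quant`, the unsigned integer grid, and the sample values are all hypothetical, and the gradient scaling factor used in the LSQ papers is omitted.

```python
def lsq_plus_fake_quant(x, scale, offset, bits=2):
    """Quantize-dequantize a list of floats with a step size and an offset.

    Returns the dequantized values plus the straight-through gradients of
    each output with respect to `scale` and `offset`.
    """
    qmin, qmax = 0, 2 ** bits - 1  # assumed unsigned grid, e.g. 0..3 for 2-bit
    out, grad_scale, grad_offset = [], [], []
    for value in x:
        v = (value - offset) / scale  # position on the integer grid
        if v <= qmin:
            # Clamped low: output is qmin * scale + offset
            q, gs, go = qmin, float(qmin), 1.0
        elif v >= qmax:
            # Clamped high: output is qmax * scale + offset
            q, gs, go = qmax, float(qmax), 1.0
        else:
            # In range: round() is treated as identity in the backward pass
            q = round(v)
            gs, go = q - v, 0.0
        out.append(q * scale + offset)
        grad_scale.append(gs)
        grad_offset.append(go)
    return out, grad_scale, grad_offset


# Made-up inputs on a 2-bit grid with step 0.5 starting at -0.5
values, d_scale, d_offset = lsq_plus_fake_quant(
    [-1.0, 0.0, 0.6, 5.0], scale=0.5, offset=-0.5, bits=2
)
print(values)  # [-0.5, 0.0, 0.5, 1.0]
```

In a QAT loop these per-element gradients would be multiplied by the upstream loss gradient and summed to update `scale` and `offset` alongside the weights.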

by netdur

6 subcomments

Hmm... at Q4_K_M, stock-style quantization retains ~99–99.8% of BF16 accuracy, and AutoRound pushes that to ~99.4–100.n% (??). The gap is roughly 0.1–0.7 percentage points.
https://github.com/intel/auto-round/blob/main/docs/gguf_alg_...
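
The arithmetic behind those figures can be sketched as follows, assuming "retention" means the quantized model's benchmark score divided by the BF16 score. The scores below are invented for illustration and are not taken from the linked document. A retention above 100% just means the quantized model scored slightly higher than BF16 on that benchmark, which can happen within benchmark noise.

```python
def retention_pct(quant_score, bf16_score):
    """Quantized score as a percentage of the BF16 baseline score."""
    return 100.0 * quant_score / bf16_score


def gap_points(score_a, score_b, bf16_score):
    """Difference in retention between two quantized models, in percentage points."""
    return retention_pct(score_b, bf16_score) - retention_pct(score_a, bf16_score)


# Hypothetical benchmark accuracies (not measured values)
bf16 = 0.6500
stock_q4 = 0.6468
autoround_q4 = 0.6507

print(round(retention_pct(stock_q4, bf16), 2))
print(round(retention_pct(autoround_q4, bf16), 2))
print(round(gap_points(stock_q4, autoround_q4, bf16), 2))
```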

by potter098

0 subcomments

[flagged]