- Repo with demo video and benchmark:
https://github.com/microsoft/BitNet
"...It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption..."
https://arxiv.org/abs/2402.17764
by ilrwbwrkhv
4 subcomments
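For context, a minimal NumPy sketch of the absmean weight ternarization the b1.58 paper describes; the function name, tensor shapes, and epsilon are illustrative choices, not the paper's reference code.

```python
import numpy as np

def ternarize_absmean(W: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, +1} with a single per-tensor
    scale, following the absmean scheme described in the b1.58 paper.
    Illustrative sketch only, not the reference implementation."""
    gamma = np.abs(W).mean()                          # absmean scale
    Wq = np.clip(np.round(W / (gamma + eps)), -1, 1)  # round, then clamp to ternary
    return Wq.astype(np.int8), gamma                  # gamma rescales matmul outputs

# Example: ternarize a random full-precision layer
W = np.random.randn(256, 512).astype(np.float32)
Wq, gamma = ternarize_absmean(W)
assert set(np.unique(Wq)).issubset({-1, 0, 1})
```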
- This will happen more and more. This is why Nvidia is rushing to entrench CUDA as a software-level lock-in; otherwise their stock will go the way of Zoom.
by zamadatix
2 subcomments
- "Parameter count" is the "GHz" of AI models: the number you're most likely to see but least likely to need. All of the models compared (in the table on the huggingface link) are 1-2 billion parameters but the models range in actual size by more than a factor of 10.
- I think almost all the free LLMs (not AI) that you find on hf can 'run on CPUs'.
The claim here seems to be that it runs usefully fast on CPU.
It's hard to judge how strong that claim is, because we don't know how fast this model runs on a GPU at all:
> Absent from the list of supported chips are GPUs [...]
And TFA doesn't really quantify anything; it just offers:
> Perhaps more impressively, BitNet b1.58 2B4T is speedier than other models of its size — in some cases, twice the speed — while using a fraction of the memory.
The model they link to is just over 1 GB in size, and there are plenty of existing 1-2 GB models that are quite serviceable on even a mildly modern CPU-only rig (rough numbers sketched below).
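A rough way to sanity-check "usefully fast on CPU": single-stream decoding is usually memory-bandwidth bound, so an optimistic ceiling on tokens/s is roughly memory bandwidth divided by the bytes streamed per token (about the model size). The bandwidth figures below are ballpark assumptions, not measurements.

```python
# Ceiling estimate: tokens/s ≈ memory bandwidth / bytes streamed per token.
# Bandwidth numbers are rough assumptions for illustration only.
model_size_gb = 1.1  # roughly the size of the linked ~1 GB checkpoint
for label, bw_gb_s in [("dual-channel DDR4", 40), ("dual-channel DDR5", 75), ("Apple M2 (unified)", 100)]:
    print(f"{label:>20}: ~{bw_gb_s / model_size_gb:.0f} tok/s upper bound")
```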
- This is over a year old. The sky did not fall, and everyone did not switch to this in spite of the "advantages". If you look into why, you'll see that it does, in fact, affect the quality metrics, some more than others, and there is no silver bullet.
- The pricing war will continue its race to rock bottom.
- Why do they call it "1-bit" if it uses ternary {-1, 0, 1}? Am I missing something?
by nodesocket
1 subcomment
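A note on the naming: a weight drawn from {-1, 0, +1} carries log2(3) ≈ 1.58 bits of information, which is why the paper calls the model "b1.58"; "1-bit LLM" is the umbrella term carried over from the original BitNet paper, which used binary {-1, +1} weights.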
- There are projects working on distributed LLM inference, such as exo[1]. If they can fully crack the distribution problem and get good performance, it's a game changer. Instead of spending insane amounts on Nvidia GPUs, you could just deploy commodity clusters of AMD EPYC servers with tons of memory, NVMe disks, and 40G or 100G networking, which is vastly less expensive. Goodbye Nvidia AI moat.
[1] https://github.com/exo-explore/exo
by justanotheratom
2 subcomments
- Super cool. Imagine specialized hardware for running these.
- Is there a library to distill bigger models into BitNet?
by instagraham
1 subcomment
- > it’s openly available under an MIT license and can run on CPUs, including Apple’s M2.
Weird comparison? The M2 already runs 7-13 GB Llama and Mistral models with relative ease.
M-series Macs and MacBooks are so ubiquitous that perhaps we're forgetting how weak the average CPU (think i3 or i5) can be.
by 1970-01-01
0 subcomments
- ..and eventually the Skynet Funding Bill was passed.