Hacker News

Knowledge Distillation of Black-Box Large Language Models (2024)

122 points by babelfish

by dmezzetti

Related paper that's a good read: "Well-Read Students Learn Better: On the Importance of Pre-training Compact Models", https://arxiv.org/abs/1908.08962

by Alifatisk

1 subcomments

by phantompeace

Considering the very small difference between SFT alone on the student model and SFT + DPO on a proxy, doesn't it make sense to concentrate on ensuring the SFT dataset is perfect rather than worry about DPO etc., and just train the student model directly?

by potus_kushner

Probably more interesting (from 01/2026): https://arxiv.org/pdf/2511.10643, "Black-Box On-Policy Distillation of Large Language Models". They got a Qwen 2.5 14B model trained to GPT5 level using the described technique, "Generative Adversarial Distillation (GAD)".

by StreamCtx

1 subcomments

“Relevant to anyone building failure-attribution systems for agent pipelines — black-box distillation techniques here could feed into causal attribution models without needing white-box access to the underlying model.”

by duendefm

6 subcomments

The Chinese are really going strong on destroying the American AI economy bubble. Honestly, despite the fact that I'm totally pro-USA and anti-China, I think we should help them crash the American AI bubble. They are controlling everything and we can't even buy a new computer nowadays while getting no benefit from this. I wish some influential programmers would encourage coders everywhere to drop their Claude and ChatGPT subscriptions for Chinese ones, at scale. If we programmers united, we could help this bubble burst, I'm sure.

by linolevan

by spacebacon

by TimXare

by modgate

by LNSY

1 subcomments