Alleged Distillation Attacks by DeepSeek, Moonshot AI, and MiniMax
26 points by mike_kamau
by m4rtink
0 subcomments
Oh no! They are stealing all the data we have stolen ourselves! This needs to be stopped and punished immediately!
by credit_guy
1 subcomments
I don't think this counts as distillation. Distillation is when you use a teacher model to train a student model, but crucially, you have access to the entire probability distribution over the generated tokens, not just the tokens themselves. That probability distribution tremendously increases the strength of the training signal, so training converges much faster. Claude does not provide these probabilities. So Claude was used for synthetic training data generation, but not really for distillation.
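To make the distinction concrete, here is a rough sketch in plain Python. It is a toy illustration, not anyone's actual training code: the function names, the four-token vocabulary, and the logit values are all made up for the example. The first loss compares the student against the teacher's full distribution (the soft-label setup described above); the second only sees a single sampled token, which is all you get from an API that returns text without probabilities.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the whole vocabulary.

    Requires the teacher's full distribution for the position.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hard_label_loss(sampled_token, student_logits):
    """Cross-entropy against one sampled token id.

    This is the only signal available when the teacher returns text alone.
    """
    q = softmax(student_logits)
    return -math.log(q[sampled_token])

# Hypothetical 4-token vocabulary; the numbers are illustrative only.
teacher_logits = [2.0, 1.5, 0.1, -1.0]
student_logits = [0.5, 0.5, 0.5, 0.5]  # untrained student: uniform guess

print("soft-label loss:", soft_label_loss(teacher_logits, student_logits))
print("hard-label loss:", hard_label_loss(0, student_logits))
```

The soft-label loss carries information about every vocabulary entry at each position (for instance, that the second token was nearly as likely as the first), while the hard-label loss only says which single token was sampled.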
by exq
1 subcomments
So it's okay when big American corps raid the internet ignoring any terms of service or licenses they see in order to train models they rent back to us, but when a foreign entity trains off of Anthropic it's illegal?
by saberience
1 subcomments
Pot, meet kettle!
I don’t think I’m the only one feeling some schadenfreude at this news. I suppose it’s ok when you’re a hot Silicon Valley scale-up to slurp up the rest of the world’s data for free and then hire hotshot lawyers to defend you against all the creatives you ripped off, but when it’s the “evil” Chinese doing the same to you it’s a dastardly “attack”?