Open-weight models with strong multilingual support change the math, because once you have GPU capacity you can self-host at marginal cost. DeepSeek's earlier versions already punched above their weight on non-English benchmarks (especially CJK and some Indic languages, where the gap to GPT-4 was much narrower than English-only benchmarks suggested).
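To make the self-hosting point concrete, here's a minimal serving sketch with vLLM. The checkpoint id and tensor-parallel degree are assumptions (I don't know what V4 ships as, and the parallelism has to match your hardware), so treat it as a shape, not a recipe:

```python
# Minimal self-hosting sketch with vLLM. Model id and parallelism degree
# are placeholders -- swap in the actual checkpoint and your GPU count.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # placeholder; adjust for V4
    tensor_parallel_size=8,           # assumes an 8-GPU node
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.6, max_tokens=128)
out = llm.generate(["Explain Turkish agglutination in one sentence."], params)
print(out[0].outputs[0].text)
```

Once that's up, per-request cost is basically electricity plus amortized hardware, which is the whole "marginal cost" argument.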
Two questions for anyone who's actually deployed V4 in production:
1. How does it handle Turkish/Slavic morphology compared to V3? In our tests V3 was solid for Russian and respectable for Turkish, but it handled compound morphology in agglutinative languages a bit awkwardly (a quick tokenizer probe, sketched after this list, is one cheap way to sanity-check this).
2. Is the long-context window actually usable end-to-end, or does quality degrade past ~64k tokens like with most open models? (The second sketch below is the kind of needle-in-a-haystack probe we'd run.)
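On question 1, one cheap proxy (not a substitute for a real eval) is how badly the tokenizer fragments long agglutinative forms; heavy fragmentation tends to correlate with the awkward compound handling we saw. A sketch, assuming the V3 tokenizer id since I don't know V4's:

```python
# Tokenizer-fragmentation probe for agglutinative morphology.
# Checkpoint id is an assumption; any DeepSeek tokenizer works the same way.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3", trust_remote_code=True)

words = [
    "evlerinizden",                           # Turkish: "from your houses"
    "çekoslovakyalılaştıramadıklarımızdan",   # famously long Turkish compound
    "достопримечательностями",                # Russian: "landmarks" (instrumental)
]
for w in words:
    pieces = tok.tokenize(w)
    print(f"{w} -> {len(pieces)} pieces: {pieces}")
```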
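And on question 2, the degradation is easy to probe yourself with a needle-in-a-haystack sweep against whatever endpoint you're serving (vLLM exposes an OpenAI-compatible one). The URL, model id, and filler-token arithmetic below are assumptions; the ~12-tokens-per-sentence estimate is eyeballed, not exact:

```python
# Needle-in-a-haystack probe: bury one fact at varying depths in ~60k
# tokens of filler and see whether retrieval survives near the ~64k mark.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # assumed local server

NEEDLE = "The vault code is 4417."
FILLER = "The harbor was quiet and the ferries ran on time. "  # ~12 tokens each

def probe(total_sentences: int, depth: float) -> str:
    """Place NEEDLE at a relative depth inside filler and ask for it back."""
    pos = int(total_sentences * depth)
    doc = FILLER * pos + NEEDLE + " " + FILLER * (total_sentences - pos)
    resp = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3",  # placeholder model id
        messages=[{"role": "user", "content": doc + "\n\nWhat is the vault code?"}],
        max_tokens=16,
    )
    return resp.choices[0].message.content

for depth in (0.1, 0.5, 0.9):  # 5000 sentences of filler ≈ 60k tokens
    print(depth, probe(total_sentences=5000, depth=depth))
```

If answers stay correct at all depths, the window is at least retrievable end-to-end; whether reasoning quality holds up at that length is a separate, harder test.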
That said, I really hope this DeepSeek model performs on par with Claude Sonnet.