Hacker News

GPT‑NL: a sovereign language model for the Netherlands

248 points by root-parent

by armcat

11 subcomments

I keep seeing these "sovereign" LMs time and time again. In Sweden we had GPT-SW3 (https://www.ai.se/en/project/gpt-sw3), and it was the same story there. Instead of burning money on "sovereign" claims, national research labs should focus on building on top of solid baselines (like Qwen/Kimi) and finetuning frontier models with real agentic utility that can be applied across actual use cases and widely used by the nation's people, basically for free. Nations should mirror what Cursor has done with Composer 2.5, for example.

by sublimefire

5 subcomments

It is crazy that anything European gets so much hate. IMO it is important to build models within the boundaries of smaller nations, using their own languages. Research has to continue even outside the US and China.

by WarmWash

6 subcomments

If Europe is serious about getting home grown AI fast, three simple steps:
1. Huge tax incentives: let the companies get grossly wealthy while paying minimal taxes for a minimum of 10 years, with clauses protecting against "retribution" taxes thereafter.
2. Tax incentives for the founders/shareholders, just like above.
3. Drop worker protections to a minimum, make it easy to fire people. You only want serious/dedicated employees anyway.
Within 2-3 years there will be at least a trillion dollars looking to get in.
Don't worry, though, if reading that made you mad. It's absolutely not going to happen. I can think of few things more antithetical to the European ethos than smart, skilled people working 80-100 hour weeks with almost no vacation to pump up their founders' net worth by tens or hundreds of billions.

by siva7

2 subcomments

> GPT‑NL is developed within the Netherlands and Europe. This gives us full control over the model, the data and the choices we make. We avoid dependency on non‑European providers and invest in a sustainable AI ecosystem aligned with our laws, values and societal goals.
I love it! So this is our answer to America and China denying foreigners access to their frontier models: a massive €13.5M in funding to develop sovereign European AI, trained exclusively on legally obtained documents and held to the highest moral standards as defined in the EU AI Act.

by matheusmoreira

4 subcomments

So good to see these developments. Every country should do this. I'd even say every person should have their own personalized AI running on their own computer. If only the costs involved were not so astronomical.

by bmenrigh

2 subcomments

I think at this point what the Netherlands, and any other country that wants a good model in their language should do, is gather up every piece of text ever written in that language and license it to the big AI labs/companies for training. I'm sure there are vast libraries of books and other text that haven't been digitized and aren't a priority for the big labs.

by rollulus

2 subcomments

Interesting that this got posted now: the project has been receiving increasing skepticism lately in the Dutch tech scene [0], and I think that’s fully justified.
[0]: https://www.quotenet.nl/zakelijk/a71588202/techondernemers-m...

by dwa3592

8 subcomments

I don't understand countries (especially governments) wanting to have their own models when there are already pretty solid open source (weights) models out there.
Countries should want control over _where_ the compute is happening rather than _what code_ is running.
What's wrong with a country hosting Kimi, Qwen or GPT-Oss on its own hardware for government work?

by HelloUsername

1 subcomment

Previously posted on 02-dec-2023 https://news.ycombinator.com/item?id=38497495 3 comments

by wrs

1 subcomment

They’re building a competitive-quality model, from scratch, with fair compensation to content owners, for €13.5 million? Something’s wrong with this picture.

by stared

8 subcomments

I feel that not only is Europe losing its independence to the US and China, but it does not even try to take part in the race.
Unlike the US, Europe has no California-level VCs. I don't expect hundreds of billions of Euros to be poured into long-shot projects.
Unlike China, Europe has neither cohesive public investment at a global scale nor the drive to grow. Long-term investment plans come with a lot of words, a lot of regulations and a lot of proxy goals, but neither a lot of money nor urgency. This post captures it: https://x.com/piotrsankowski/status/2065795919623438546
So yeah, both in economy and warfare, Europe dooms itself to be in the hands of the US, China, or a mix of both.

by Aeolun

1 subcomment

A total of €13.5M has been allocated to the project.
I guess we’re going for GPT2 level capability?

by jansenmac

1 subcomment

This is not an open source model. In that sense I think the sovereign claim is a bit strange. It's the data providers that determine access to the model.

by thatguymike

1 subcomment

> A total of €13.5 million has been allocated to the project.
> This public investment underlines the importance of an independent, trustworthy and future‑proof Dutch language model.
It does, but not in the way you think it does.

by alper

0 subcomments

Europe should have a sovereign model on its content and languages that is trained with renewable energy and published as open source.
This looks like a good step in that direction.

by wolvoleo

0 subcomments

We already had GEITje but it was banned by the courts. Of course it can still be found because the entire internet is not subject to Dutch law. But it did manage to stop development :'(

by sarjann

0 subcomments

I wonder about these stories. Why are there so many individual country efforts? We know the scale needed, given scaling laws, capital and energy. Most of these countries can barely compete alone (even large groups of them would struggle).
Why don't they work together on it? Companies like Airbus have already been able to do that with aircraft.

by gnegggh

0 subcomments

I'm making a Dutch dictionary and would be interested to see how this model would fare in evals vs. non-specialized ones. I've tested a variety of models for https://hetnederlands.com content, and the differences can be big.

by mvdh1304

0 subcomments

Overall, the revenue-sharing model is (IMO) more interesting than the fact that it is Dutch. Usage of data, and sharing revenue with the providers of that data, is an inherent part of creating these models that is not discussed as much as it should be.

by Dwedit

0 subcomments

What really matters is the sovereign capability to finetune LLMs. Any model could be vetted and tested, but you need finetuning/LoRA training to keep the model from becoming outdated.

by stared

0 subcomments

Is it a proposal or a model? And if it is a model, how does it fare on benchmarks?

by jurschreuder

2 subcomments

What are they really going to train with 13.5M? We're a tiny company in Amsterdam in Holland, and we've got "only 64x B300 to train on", so I thought we could never make an LLM, since we've got only 4M in compute.
And they're going to train an LLM with all kinds of extra difficulties compared to OpenAI for just 13.5M?
The very first Llama was 16M for one training.

by jgbuddy

0 subcomments

I fear sovereignty is not an adoption-driving feature.

by lejeanvaljean

0 subcomments

Better to work on something at the European level.

by jdw64

0 subcomments

Honestly, I used to think the 'sovereign model' was a waste of money. But recently, with the US logic of restricting model exports, I've come to think that if things go south, they could even cut off allied nations. So now the sovereign model seems reasonable to me. That, in turn, means US influence is deteriorating. And that probably isn't such great news for American businesses.

by Zababa

0 subcomments

I feel like building datacenters and filling them with chips may be more valuable than creating sovereign models. I think xAI makes more money renting datacenters to Anthropic than from the models it trained, and it could pivot thanks to its datacenters. By making regulations easier than in the US, Europe could attract computing power, which could then be used to train sovereign models or be rented to big AI labs.
Also, when training models, you create talent that then could go to other countries (brain drain). Restricting that brain drain without imposing authoritarian restrictions on the movements of people seems hard, so it seems hard to keep talent as a competitive advantage. If instead the competitive advantage is datacenters with chips, power capacity building, fast path to building datacenters, I think they are easier to retain while preserving the rights of everyone involved.

by rdwrrr

1 subcomment

Burning tax money. I dare to bet this will never lead anywhere.

by debarshri

0 subcomments

So cute.

by simianwords

0 subcomments

I really think countries should build a sovereign _ecosystem_ and sovereign models are an excuse to achieve it.
An ecosystem is the tribal knowledge, revolving door of talent, known processes etc.
If the end goal is to make a half-assed Dutch-speaking model, I don't think it will cut it. I don't see anyone using it over the Gemma 4b that runs on my laptop.
An ecosystem is more durable and has desirable second order effects.

by dr_dshiv

0 subcomments

How do you use it?

by Marciplan

1 subcomment

Supposedly this model also aims to treat publishers of all sizes well. Looking forward to its launch soon :)

by agrijakhetarpal

0 subcomments

"sOvErEiGn"

by rahimnathwani

0 subcomments

[dead]

by GreenSalem

2 subcomments

[flagged]

by mvanbaak

1 subcomment

> Excluding harmful content
#define(HARMFUL)
[edit] Downvoters please tell me what the problem is with specifying this?

by entropyneur

2 subcomments

How about fixing whatever the hell prevents competitive private LLM vendors from appearing in Europe?

by yanis_t

1 subcomment

> A total of €13.5 million has been allocated to the project.
This is not even funny. If you want a competitive AI industry, you need to invest much more heavily in infrastructure first, building models second.