FRESH

Hacker News

Home

Granite 4.1: IBM's 8B Model Matching 32B MoE

313 points by steveharing1

by 2ndorderthought

5 subcomments

I test drove it yesterday. It's pretty impressive at 8b. Runs on commodity hardware quickly.
Qwen3.6 35b a3b is still my local champion but I may use this for auto complete and small tasks. Granite has recent training data which is nice. If the other small models got fine tuned on recent data I don't know if I would use this at all, but that alone makes it pretty decent.
The 4b they released was not good for my needs but could probably handle tool calls or something

by m3at

0 subcomment

https://research.ibm.com/blog/granite-4-1-ai-foundation-mode...
Original article on IBM research
Hugging face weights: https://huggingface.co/collections/ibm-granite/granite-41-la...

by cbg0

2 subcomments

The real "sleeper" might be https://huggingface.co/ibm-granite/granite-vision-4.1-4b if the benchmarks hold up for such a small model against frontier models for table & semantic k:v extraction.

by smj-edison

10 subcomments

On the topic of local models, is there a good equivalent to something like Claude's chat interface? I've recently started transitioning to open models after getting fed up with Claude's usage limits (I'm not in a position to drop $200/month), and for coding tasks Kimi 2.6 has been about the same as Sonnet in my experience. The only thing I've found myself missing is a nice interface to ask it questions and have it help me with my math assignments.

by Havoc

2 subcomments

Interesting to see a pivot away from MoE by both IBM and mistral while the larger classes of SOTA of models all seem to be sticking to it.
Quick vibe check of it- 8B @ Q6 - seems promising. Bit of a clinical tone, but can see that being useful for data processing and similar. You don't really want a LLM that spams you with emojis sometimes...

by 0xbadcafebee

11 subcomments

People complain a lot about LLM-written articles, but the human comments here on HN are far worse. Mostly a bunch of people extremely proud of themselves for not reading an LLM-written article, and then a bunch of people who take it at face value and make the model seem almost useful, and one comment that actually looked at other benchmarks. Good 'ol humanity, good at.. being emotional... and not doing analysis.....
The article makes some good points about model design (how different size models within a family can get similar results, how to filter out hallucination, math result reinforcement), so that's worth understanding. It's analyzing a paper, which only discussed 3 sizes of the same model family. But what the article doesn't say is, compared to other model families, Granite 4.1 8B sucks. The only benchmark it does well at compared to other models is non-hallucination and instruction following. Qwen 3.5 4B (among other models) easily outclass it on every other metric.
This article teaches a valuable lesson about reading articles in general. You can take useful information away from them (yes, despite being written by LLM). But you should also use critical thinking skills and be proactive to see if the article missed anything you might find relevant.

by 100ms

3 subcomments

> Full stop.
Why people don't edit out obvious sloppification and expect to still have readers left

by nielsbot

0 subcomment

Very much an aside, but I'm struck by IBM's consistent iconic design language. For me it harkens all the way back to the futuristic design in 2001: A Space Odyssey from 1968. But you can also see it in their old mainframe hardware designs and other places.

by simonw

1 subcomments

The Granite 4.1 3B model is only 2GB from Unsloth: https://huggingface.co/unsloth/granite-4.1-3b-GGUF
I ran it in LM Studio and got a pleasingly abstract pelican on a bicycle (genuinely not bad for a tiny 3B model - it can at least output valid SVG): https://gist.github.com/simonw/5f2df6093885a04c9573cf5756d34...

by pjmalandrino

0 subcomment

Very impressive series of SLM by IBM here.
I have been using it with their Chunkless RAG concept and it is fitting very well! (for curious https://github.com/scub-france/Docling-Studio)
I convinced that SLM are a real parto of solution for true integrated AI in process...

by dash2

4 subcomments

Nah, I ain't reading that. If they can't be bothered to get a human to write it, it can't be that important. I'm glad for them though. Or sorry that happened.

by cubefox

1 subcomments

It's strange that they don't include reasoning training (RLVR). Their justification doesn't sound convincing:
> While reasoning models have grown in popularity in recent years, their abilities aren’t always the most efficient way to get a result. In enterprise settings, token costs and speed are often as important as performance. That is why turning to less expensive, non-reasoning models with similar benchmark performance for select tasks like instruction following and tool calling makes sense for enterprise users.
I guess they currently don't have the ability to do proper RLVR.

by tosh

0 subcomment

IBM announcement: https://research.ibm.com/blog/granite-4-1-ai-foundation-mode...

by dimitrismrtzs

0 subcomment

The 8B class closing the gap with 32B is the real story of 2026 for anyone running models locally. I've been using smaller models for agent tool-use and the progress this year is real.
The gap that still matters most isn't intelligence — it's consistency on structured output. When you chain 5+ tool calls in sequence, even a small per-call reliability difference compounds fast. Would love to see Granite 4.1 benchmarked specifically on multi-step function calling rather than just general benchmarks.

by agunapal

2 subcomments

If you really think about why MoE came into existence, its to save significant cost during training, I don't think there was any concrete evidence of performance gains for comparable MoE vs dense models. Over the years, I believe all the new techniques being employed in post training have made the models better.

by woadwarrior01

1 subcomments

The most salient thing about these models is that they're non-reasoning models. This makes then very token efficient and particularly well suited for local inference where decoding is usually slower than with datacenter GPUs.
Link to HF collection: https://huggingface.co/collections/ibm-granite/granite-41-la...

by mdp2021

0 subcomment

I read that IBM pioneered the concept of "shifting through "mid-training" from "guessing the next token" to "guessing the next logical step"". I am wondering how far is the research from "enhancing apparent reasoning" to "achieving solid, reliable reasoning".
If techniques existed to shift from "guess the next highly probable" token to "guess the best next logical step", as some interpreted said research, should not that be the foremost objective?

by dissahc

0 subcomment

qwen3.5 9b outperforms granite 4.1 30b by a huge amount (32 vs 15 on artificialanalysis benchmark)... i have no idea what made the writer of this article say so many demonstrably incorrect things

by RandyOrion

0 subcomment

Although the performance claim of 8b dense matching 32b moe is somewhat questionable, thank you granite team for releasing small dense LLMs.

by mdp2021

2 subcomments

Wish they also released an embedding model, in the line of their previous: compact (while good)...

by latentframe

0 subcomment

The limit is changing from scaling parameters to scaling datas quality however compute is still the big constraint

by RugnirViking

1 subcomments

sounds interesting. Here's hoping they release a 32B model, thats a pretty good sweet spot for feasibility of home setups.
edit: I just realised they do actually have a 30b release alongside this. Haven't tried it yet.

by sexylinux

0 subcomment

Is this a model that will create reliable output or will it also produce errors?

by SwellJoe

0 subcomment

I wish AI slop articles were somehow automatically flagged and deaded. They're all flowery verbose piles of crap. Yeah, the model is interesting, but the article is trash. I can't believe real humans are willing to sign their name to this stuff.

by theblazehen

0 subcomment

> models are judged by GPT-4
An interesting choice

by peter_d_sherman

0 subcomment

>"Stage two was RLHF training on general chat prompts using a reward model to improve helpfulness. This worked. AlpacaEval scores jumped around 18.9 points on average compared to the fine-tuned checkpoints.
Then something broke. The RLHF stage, while improving chat quality, caused math benchmark scores to drop. GSM8K and DeepMind-Math both regressed."
Observation: Math (which when fully decomposed, results in Logic) is at the core of how computers (traditional/older, non-LLM, programming languages work. If an LLM gets Math training wrong at any stage for any reason, then, in my opinion, that should be viewed as something that needs to be fixed at a lower level, not a higher one; not a later training level...
I think it would be interesting exercise to train an LLM that only deals in simple Math, simple English, and only the ability to compute simple equations (+,-,x,/)... like, what's the absolute minimum in terms of text and layers necessary to train a model like that?
I think some interesting understandings could be potentially be had by experimentation like that...
I myself would love a pure (simplest, smallest possible)
Text-to-Math only LLM (TTMLLM, TTMSLM?)
, along with all of the necessary corpuses (which would ideally be as small as possible) and instructions necessary to train such an LLM...

by nickpsecurity

0 subcomment

On changing the training mix, H20 did that with Danube in 2024:
https://arxiv.org/pdf/2401.16818
With those results, I would've already done that in any models I got to train. There's also the principle that the LLM's are often better at what they saw last in their training set. That also justifies putting more logic, code, and math in at the end for an analytical or coding model. So, a few precedents for that technique already.

by rasz

0 subcomment

me: "Bits 5-4 indicate the video mode", how do you derive VIDEO_MASK_00b = 0x03C0 from this?
Granite 4.1 8B: "I apologize for the confusion earlier. Let's clarify the derivation of the VIDEO__MASK_00b based on the description "Bits 5-6 indicate the video mode" from the equipment_list section.
Given that bits 5 and 6 are used for video mode, we need to create a mask that isolates these two bits:
Bit 5 is the least significant bit of the group (2^5 = 32 decimal, 0b00100000 in 8-bit form). Bit 6 is the next bit up (2^6 = 64 decimal, 0b01000000 in 8-bit form). Combining these bits: ...
Therefore, the correct mask for VIDEO_MASK_00b should be 0x00C0"
Errors on top of errors when converting description into binary numbers. Its hopeless for basic task like parsing/generating headers :(

by robotmaxtron

1 subcomments

"open source"
show me.

by tokenhub_dev

0 subcomment

[flagged]

by samagragune

0 subcomment

[dead]

by whalesalad

0 subcomment

[flagged]