| Benchmark | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 | GPT-5.1 |
|-----------------------|-----------|---------|------------|-----------|
| Humanity's Last Exam | 37.5% | 21.6% | 13.7% | 26.5% |
| ARC-AGI-2 | 31.1% | 4.9% | 13.6% | 17.6% |
| GPQA Diamond | 91.9% | 86.4% | 83.4% | 88.1% |
| AIME 2025 | | | | |
| (no tools) | 95.0% | 88.0% | 87.0% | 94.0% |
| (code execution) | 100% | - | 100% | - |
| MathArena Apex | 23.4% | 0.5% | 1.6% | 1.0% |
| MMMU-Pro | 81.0% | 68.0% | 68.0% | 80.8% |
| ScreenSpot-Pro | 72.7% | 11.4% | 36.2% | 3.5% |
| CharXiv Reasoning | 81.4% | 69.6% | 68.5% | 69.5% |
| OmniDocBench 1.5 | 0.115 | 0.145 | 0.145 | 0.147 |
| Video-MMMU | 87.6% | 83.6% | 77.8% | 80.4% |
| LiveCodeBench Pro | 2,439 | 1,775 | 1,418 | 2,243 |
| Terminal-Bench 2.0 | 54.2% | 32.6% | 42.8% | 47.6% |
| SWE-Bench Verified | 76.2% | 59.6% | 77.2% | 76.3% |
| t2-bench | 85.4% | 54.9% | 84.7% | 80.2% |
| Vending-Bench 2 | $5,478.16 | $573.64 | $3,838.74 | $1,473.43 |
| FACTS Benchmark Suite | 70.5% | 63.4% | 50.4% | 50.8% |
| SimpleQA Verified | 72.1% | 54.5% | 29.3% | 34.9% |
| MMLU | 91.8% | 89.5% | 89.1% | 91.0% |
| Global PIQA | 93.4% | 91.5% | 90.1% | 90.9% |
| MRCR v2 (8-needle) | | | | |
| (128k avg) | 77.0% | 58.0% | 47.1% | 61.6% |
| (1M pointwise) | 26.3% | 16.4% | n/s | n/s |
n/s = not supported

EDIT: formatting, hopefully a bit more mobile friendly
For example, the frontier models of early-to-mid 2024 could reliably follow what seemed to be 20-30 instructions. As you gave more instructions than that in your prompt, the LLMs started missing some and your outputs became inconsistent and difficult to control.
The latest set of models (2.5 Pro, GPT-5, etc) seem to top out somewhere in the 100 range? They are clearly much better at following a laundry list of instructions, but they also clearly have a limit and once your prompt is too large and too specific you lose coherence again.
If I had to guess, Gemini 3 Pro has once again pushed the bar, and maybe we're up near 250 (haven't used it, I'm just blindly projecting / hoping). And that's a huge deal! I actually think it would be more helpful to have a model that could consistently follow 1000 custom instructions than it would be to have a model that had 20 more IQ points.
I have to imagine you could make some fairly objective benchmarks around this idea, and it would be very helpful from an engineering perspective to see how each model stacked up against the others in this regard.
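Even something crude would be informative. Here's a minimal sketch of what I mean in Python (the `call_model` function is a hypothetical stand-in for whatever API client you use; the instructions are kept mechanically checkable on purpose):

```python
# Rough sketch of an instruction-count benchmark: give the model N mechanically
# checkable instructions in one prompt and count how many it actually follows.
# `call_model` is a placeholder for whatever client you use (hypothetical);
# the instructions are deliberately trivial so compliance is just a string check.
from typing import Callable

def make_instructions(n: int) -> list[tuple[str, Callable[[str], bool]]]:
    """Return (instruction_text, checker) pairs that are trivially verifiable."""
    pairs = []
    for i in range(n):
        word = f"token{i}"
        pairs.append((
            f"Include the exact word '{word}' somewhere in your reply.",
            lambda out, w=word: w in out,  # bind w now to avoid late-binding bug
        ))
    return pairs

def instruction_following_score(call_model: Callable[[str], str], n: int) -> float:
    pairs = make_instructions(n)
    prompt = "Follow ALL of these instructions in a single reply:\n" + "\n".join(
        f"{i + 1}. {text}" for i, (text, _) in enumerate(pairs)
    )
    output = call_model(prompt)
    followed = sum(check(output) for _, check in pairs)
    return followed / n  # fraction of instructions actually honored

# Sweep N and watch where each model's compliance starts to fall off:
# for n in (20, 50, 100, 250, 500, 1000):
#     print(n, instruction_following_score(my_model, n))
```

Real instructions would obviously be more varied than "include this word", but even a toy like this would let you plot compliance vs. instruction count per model.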
Anyone happen to know why? Is this website by any chance sharing information on safe medical abortions or women's rights, something which has gotten websites blocked here before?
here’s the archived pdf: https://web.archive.org/web/20251118111103/https://storage.g...
The bucket name "deepmind-media" has been used in the past on the deepmind official site, so it seems legit.
Also I really hoped for a 2M+ context. I'm living on the context edge even with 1M.
That seems like a low bar. Who's training frontier LLMs on CPUs? Surely they meant to compare TPUs to GPUs. If "this is faster than a CPU for massively parallel AI training" is the best you can say about it, that's not very impressive.
Also interesting to know that Google Antigravity (antigravity.google / https://github.com/Google-Antigravity ?) leaked. I remember seeing this subdomain recently. Probably Gemini 3 related as well.
Org was created on 2025-11-04T19:28:13Z (https://api.github.com/orgs/Google-Antigravity)
[1] https://blog.google/technology/ai/introducing-pathways-next-...
I wonder how significant this is. DeepMind was always more research-oriented than OpenAI, which mostly scaled things up. They may have come up with a significantly better architecture (Transformer MoE still leaves a lot of room).
This model is not a modification or a fine-tune of a prior model
Is it common to mention that? Feels like they built something from scratch.

"Our most intelligent model with SOTA reasoning and multimodal understanding, and powerful agentic and vibe coding capabilities"
<=200K tokens • Input: $2.00 / Output: $12.00
> 200K tokens • Input: $4.00 / Output: $18.00
Knowledge cutoff: Jan. 2025
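Assuming those rates are per 1M tokens (my assumption here, based on how Gemini pricing is usually quoted), the tiering works out roughly like this:

```python
# Back-of-the-envelope cost for the tiered pricing quoted above, assuming the
# rates are USD per 1M tokens and the tier is chosen by prompt size.
def request_cost(input_tokens: int, output_tokens: int) -> float:
    if input_tokens <= 200_000:
        in_rate, out_rate = 2.00, 12.00   # <=200K-token tier
    else:
        in_rate, out_rate = 4.00, 18.00   # >200K-token tier
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# e.g. a 150K-token prompt with an 8K-token reply:
# request_cost(150_000, 8_000) -> 0.30 + 0.096 = $0.396
```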
I already enjoy Gemini 2.5 pro for planning and if Gemini 3 is priced similarly, I'll be incredibly happy to ditch the painfully pricey Claude max subscription. To be fair, I've already got an extremely sour taste in my mouth from the last Anthropic bait and switch on pricing and usage, so happy to see Google take the crown here.
Gemini 3 Pro gets 31.1% on ARC-AGI-2
https://storage.googleapis.com/deepmind-media/Model-Cards/Ge...
Also notable which models they include for comparison: Gemini 2.5 Pro, Claude Sonnet 4.5, and GPT-5.1. That seems like a minor snub against Grok 4 / Grok 4.1.
And I really don't think I'm alone in this.
NVDA is down 3.26%
I think specialized hardware for training models is the next big wave in China.
For comparison:
Gemini 2.5 Pro was $1.25/M for input and $10/M for output
Gemini 1.5 Pro was $1.25/M for input and $5/M for output
Who is training LLMs with CPUs?
Still taking nearly a year to train and run post-training safety and stability tuning.
With 10x the infrastructure they could iterate much faster. I don't see AI infrastructure as a bubble; it is still a bottleneck on the pace of innovation at today's level of active deployment.
wayback machine still has it: https://web.archive.org/web/20251118111103/https://storage.g...
Well don't complain when you are using Gmail and your emails are being used to train Gemini.
https://www.google.com/search?q=gemini+u.s.+senator+rape+all...