by himata4113
10 subcomments
- I already felt that Gemini 3 proved what is possible if you train a model for efficiency. If I had to guess, the Pro and Flash variants are 5x to 10x smaller than Opus and GPT-5 class models.
They produce drastically fewer tokens to solve a problem, but Google doesn't seem to have put enough effort into refining their reasoning and execution: they produce broken tool calls and generally struggle with 'agentic' tasks. For raw problem solving without tools or search, though, they match Opus and GPT while presumably being a fraction of the size.
I feel like Google will surprise everyone with a model that is an entire generation beyond SOTA once they go from prototyping to building a model that isn't a preview anymore. All models up till now feel like prototypes that were pushed to GA just so they have something to show investors and to integrate into their suite as a proof of concept.
- What's interesting to note, as someone who uses Gemini, ChatGPT, and Claude, is that Gemini consistently uses drastically fewer tokens than the other two. It seems like Gemini is where it is because it has a much smaller thinking budget.
It's hard to reconcile this because Google likely has the most compute and at the lowest cost, so why aren't they gassing the hell out of inference compute like the other two? Maybe all the other services they provide are too heavy? Maybe they are trying to be more training heavy? I don't know, but it's interesting to see.
- > A single TPU 8t superpod now scales to 9,600 chips and two petabytes of shared high bandwidth memory, with double the interchip bandwidth of the previous generation. This architecture delivers 121 ExaFlops of compute and allows the most complex models to leverage a single, massive pool of memory.
This seems impressive. I don't know much about the space, so maybe it's not actually that great, but from my POV it looks like a competitive advantage for Google.
- "TPU 8t and TPU 8i deliver up to two times better performance-per-watt over the previous generation" sounds impressive especially as the previous generation is so recent (2025).
Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?
- At this point, when you are doing big AI you basically have to buy it from NVidia or rent it from Google. And Google can design their chips and engine and systems in a whole-datacenter context, centralizing some aspects that are impossible for chip vendors to centralize, so I suspect that when things get really big, Google's systems will always be more cost-efficient.
(disclosure: I am long GOOG, for this and a few other reasons)
- While others have been capturing news-cycle eyeballs, it seems to me Google has been quietly going from strength to strength in the background, capturing consumer market share without many (any?) infrastructure problems, given how vertically integrated in AI they've been since day one. At one point they even seemed like a lost cause, but they're like a tide... just rising all around.
by kamranjon
2 subcomments
- It's interesting that, of the large inference providers, Google has one of the most inconvenient policies around model deprecation. They deprecate models exactly one year after releasing them and force you to move onto their next generation of models. I had assumed that, because they are using their own silicon, they would actually be able to offer better stability, but the opposite seems to be true. Their rate limiting is also much stricter than OpenAI's, for example. I wonder how much of this is related to these TPUs, versus just strange policy decisions.
by delbronski
1 subcomments
- I’ve been using Gemini with Junie (JetBrains' attempt at Claude Code). While Junie is nowhere near as good as Claude Code, it is way ahead of the current Google tooling. I get quite good, consistent results pretty cheaply with this combo.
by amazingamazing
2 subcomments
- If AI ends up having a winner, I struggle to see how it doesn't end with Google winning, because they own the entire stack, or Apple, because they will have deployed the most potentially AI-capable edge devices.
- This link has more on the architecture: https://cloud.google.com/blog/products/compute/tpu-8t-and-tp...
- I've been saying it, and I'll keep saying it (as someone who has an opinion backed by very little) - I think Google is incredibly well placed for the future with LLMs.
Owning your hardware and your entire stack is huge, especially these days with so much demand. Long term, I think they end up doing very well. People clowned so hard on Google for the first two years (until Gemini 2.5 or 3) because it wasn't as good as OpenAI or Anthropic's models, but Google just looked so good for the long game.
Another benefit for them: if LLMs end up being a huge bubble that doesn't pay the absurd returns the industry expects, they're not kaput. They already own so many markets that this is just an additional thing for them, whereas the big AI-only labs are probably fucked.
All that said: what the hell do I know? Who knows how all of this will play out. I just think Google has a great foundation underneath them that'll help them build and not topple over.
by cmptrnerd6
1 subcomments
- Which company is building the silicon for Google? Is it TSMC? What node size? I didn't see it with a quick search; sorry if it was in the post.
- Are they refreshing the Coral project with these? The Coral project for edge AI apps seems like it needs a refresh.
- FTA:
> One pod of TPU 8t is 121 ExaFlops; or 121,000 PetaFlops.
Meanwhile, the combined compute capacity of the top 10 supercomputers in the entire world is 11,487 PetaFlops.[1]
I know, I know, not the same flops, yada yada, but still. Just one pod alone is quite a beast (rough ratio sketched below).
Edit:
[1] https://top500.org/lists/top500/2025/11/
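A quick ratio check in Python on those two figures (taking both at face value, even though they're not the same kind of flops):

    # One TPU 8t pod vs. the combined top-10 TOP500 figure cited above [1].
    pod_pflops = 121_000           # ~121 ExaFlops per pod, per the article
    top10_pflops = 11_487          # combined top-10 supercomputers, per [1]
    print(pod_pflops / top10_pflops)   # ~10.5x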
- > TPU 8i pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM
Wow. Just wow. I presume that's for each chip, and there are 1,152 chips in a pod, so that's roughly 332 TB of HBM and about 442 GB (not TB) of SRAM per pod. Just wow.
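A minimal sanity check in Python, assuming the quoted per-chip figures hold across all 1,152 chips in a pod (decimal units throughout):

    # Per-pod memory implied by the quoted per-chip numbers.
    chips_per_pod = 1152
    hbm_per_chip_gb = 288          # GB of HBM per chip (quoted above)
    sram_per_chip_mb = 384         # MB of on-chip SRAM per chip (quoted above)

    hbm_per_pod_tb = chips_per_pod * hbm_per_chip_gb / 1000    # ~331.8 TB of HBM
    sram_per_pod_gb = chips_per_pod * sram_per_chip_mb / 1000  # ~442 GB (not TB) of SRAM
    print(hbm_per_pod_tb, sram_per_pod_gb)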
by nickandbro
2 subcomments
- I am curious what workloads Citadel Securities is running on these TPUs. Are you telling me they need the latest TPUs for market insights?
- The number of times this article mentions "agentic" and "agents"... Am I correct to assume the hardware has nothing to do with "agents"? I assume it's just about a new generation of more efficient transformers / deep-learning layers.
by iandanforth
0 subcomment
- Anyone know if these are already powering all of Gemini services, some of them, or none yet? It's hard to tell if this will result in improvements in speed, lower costs, etc, or if those will be invisible, or have already happened.
by geremiiah
2 subcomments
- TPUs are systolic arrays, right? So does that mean that Google is using a heterogeneous cluster comprising both GPUs and TPUs, for workloads that don't map well (or at all) onto TPUs?
- At $15/GB of HBM4, the 331.8 TB of HBM4 per pod comes to about $5 million...
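Spelling that out (the $15/GB is just a rough assumption; real HBM4 contract pricing isn't public):

    # HBM cost per pod at the assumed price per GB.
    price_per_gb_usd = 15
    hbm_per_pod_gb = 331_800                    # ~331.8 TB per pod, from the thread's math
    print(price_per_gb_usd * hbm_per_pod_gb)    # ~$4.98 million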
by NoiseBert69
0 subcomment
- That cooling system looks crazy. What an unbelievable density.
- It would be interesting to benchmark a short training / inference run on the latest TPU vs. the latest NVIDIA GPU on a per-cost basis.
- Interesting that TPU 8i is for both post-training and inference.
- The real problem is that scientists doing this sort of early work more often than not want hardware they can burn under their desks. Renting infrastructure in Google Cloud isn't the only way...
- The pics of the cooling system are pretty good sci-fi / cyberpunk / steampunk inspo.
If the whole AI bubble spectacularly collapses, at least we got a lot of cool pics of custom hardware!
- In recent discussions about Tim Apple [sic] moving on, there was debate about whether Apple flopped on AI, which is my opinion. Of course, you had the false dichotomy of doing nothing or burning money faster than the US military, like OpenAI does.
IMHO that happy medium is Google. Not having to pay the Nvidia tax will likely be a huge competitive advantage. And nobody builds data centers as cost-effectively as Google. It's kind of crazy to be talking ExaFLOPS and Tb/s here. From some quick Googling:
- The first MegaFLOPS CPU was in 1964
- A Cray supercomputer hit GigaFLOPS in 1988 with workstations hitting it in the 1990s. Consumer CPUs I think hit this around 1999 with the Pentium 3 at 1GHz+;
- It was the 2010s before we saw off-the-shelf TFLOPS;
- It was only last year that a single chip hit PetaFLOPS. I see the IBM Roadrunner hit this in 2008, but that was ~13,000 CPUs, so...
Obviously this is nearly 10,000 TPUs to get to ~121 EFLOPS (FP4, admittedly), but that's still an astounding number. It means each one is doing ~12 PFLOPS (FP4).
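A rough sketch of where that per-chip number comes from, assuming the 9,600-chip superpod size from the article:

    # Per-chip throughput implied by the pod-level FP4 figure.
    pod_pflops = 121_000         # ~121 ExaFlops per superpod (FP4)
    chips_per_pod = 9_600        # superpod size from the article
    print(pod_pflops / chips_per_pod)   # ~12.6 PFLOPS per chip (FP4)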
I saw a claim that Claude Mythos cost ~$10B to train. I personally believe Google can (or soon will be able to) do this for an order of magnitude less at least.
I would love to know the true cost/token of Claude, ChatGPT and Gemini. I think you'll find Google has a massive cost advantage here.
by dist-epoch
0 subcomment
- If you think RAM prices are going to come down, think again.
New pods use 10x as much RAM as the previous generation.
by jauntywundrkind
1 subcomments
- I'm surprised the interconnect per system is so slow. 6x 200 Gb/s feels barely competitive. Same as last year.
Trainium3 and Maia 200 are 2.5 and 2.8 Tb/s vs. this 1.2 Tb/s. Maia is 6 stacks of HBM3e, so the memory:interconnect bandwidth ratio is really falling behind here. Notably, Maia is also, like TPU, high radix.
by SecretDreams
0 subcomment
- They are missing a header to show the transition in discussion from TPU 8t to TPU 8i!
Thanks for posting otherwise.
Edit: actually, it looks like the header got captured as a figure caption by accident.
- I can't help but think we will be "laughing" at this in 10 years' time, the way we laugh at steam engines or the abacus.
- Yeah, but can you release the SDK for the Pixel 10? It was one of the only reasons I bought this mid phone.