I've been chasing flaky but very annoying stability problems often enough on systems I had built (some, of course, due to overclocking in my younger years, when it still had a tangible payoff) that taking this one BIG potential cause out of the equation is worth the few dozen extra bucks I have to spend on ECC-capable gear many times over.
Trying to validate an ECC-less platform's stability is surprisingly hard, because memtest and friends just aren't very reliable at detecting the more subtle problems. Prime95, y-cruncher and Linpack (in increasing order of effectiveness) are better than specialized memory-testing software in my experience, but they are not perfect, either.
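The basic idea behind all of these is the same: compute something deterministic that touches a lot of memory and check that the answer never changes. A toy sketch of that idea in Python, nowhere near as effective as the real tools, which also hammer the FPU, caches and memory controller with much nastier access patterns:

```python
# Naive memory soak: fill a big buffer once, then repeatedly re-hash it.
# Any checksum mismatch between passes means a bit changed under us.
# Real tools (memtest86+, Prime95, y-cruncher, Linpack) are far more
# aggressive; treat this as an illustration of the principle only.
import hashlib
import os

BUF_SIZE = 1 << 30   # 1 GiB -- raise this to cover more of your RAM
PASSES = 10

buf = os.urandom(BUF_SIZE)
reference = hashlib.sha256(buf).hexdigest()

for i in range(PASSES):
    if hashlib.sha256(buf).hexdigest() != reference:
        print(f"pass {i}: checksum mismatch -- possible memory error")
        break
    print(f"pass {i}: OK")
```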
Most AMD CPUs these days (but not their APUs with potent iGPUs - for those you will have to buy the "PRO" variants) have full support for ECC UDIMMs. If your mainboard vendor also plays ball - annoyingly, only a minority of them enable ECC support in their firmware, so always check for that before buying! - there's not much that can prevent you from having that stability enhancement and reassuring peace of mind.
Quoth DJB (around the very start of this millennium): https://cr.yp.to/hardware/ecc.html :)
I have followed his blog for years and hold him in high regard, so I am surprised he did that and expected stability at 100C, regardless of what Intel claims is okay.
Not to mention that you rapidly hit diminishing returns past 200W with current-gen Intel CPUs, although he mentions caring about idle power usage. Why go from 150W to 300W for a 20% performance increase?
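To put rough numbers on that trade-off (taking the 150W -> 300W for +20% figures above at face value), doubling the power budget for a fifth more performance means perf-per-watt drops by 40%:

```python
# Perf-per-watt at the two operating points mentioned above.
# The 150W/300W and +20% numbers are the ones from the comment, not measurements.
base_perf, base_watts = 1.0, 150
boosted_perf, boosted_watts = 1.2, 300

ppw_base = base_perf / base_watts
ppw_boosted = boosted_perf / boosted_watts
print(f"perf/W drops by {(1 - ppw_boosted / ppw_base) * 100:.0f}%")  # -> 40%
```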
When you do not have a bunch of components ready to swap out, it is also really hard to debug these issues. Sometimes it's something completely different, like the PSU. After the last round of issues, I decided to buy a prebuilt (ThinkStation) with on-site service. The cooling is a bit worse, etc., but if issues come up, I don't have to spend a lot of time debugging them.
Random other comment: when comparing CPUs, a sad observation was that even a passively cooled M4 is faster than a lot of desktop CPUs (typically single-threaded, sometimes also multi-threaded).
But I just can't bring myself to upgrade this year. I dabble in local AI, where it's clear fast memory is important, but the PC approach is just not keeping up without going to "workstation" or "server" parts that cost too much.
There are glimmers of hope with MR-DIMMs, CUDIMMs, and other approaches, but really boards and CPUs need to support more memory channels. Intel has a small advantage over AMD here, but it's nothing compared to the memory bandwidth of a Mac Pro or higher. "Strix Halo" offers some hope with its four memory channels, but it's meant for notebooks, so it isn't really expandable (which would enable à la carte hybrid AI: fast GPUs with reasonably fast shared system RAM).
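Peak bandwidth is just channels x transfer rate x 8 bytes, which makes the gap obvious; the speed grades below are assumptions for illustration, and the 800 GB/s figure is Apple's quoted number for the M2 Ultra Mac Pro:

```python
# Rough peak-bandwidth comparison. DDR5 is 64 bits (8 bytes) per channel;
# the transfer rates are illustrative assumptions, not benchmark results.
def bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000  # GB/s

print(bandwidth_gbs(2, 6000))   # typical desktop, dual-channel DDR5-6000:   96 GB/s
print(bandwidth_gbs(4, 8000))   # "Strix Halo"-class 256-bit LPDDR5X-8000:  256 GB/s
print(bandwidth_gbs(8, 6400))   # workstation, 8-channel DDR5-6400:        ~410 GB/s
# For comparison, Apple quotes 800 GB/s for the M2 Ultra in the Mac Pro.
```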
I wish I could fast-forward to a better time, but it's likely fully integrated systems will dominate if the size and relatively weak performance of parts-based builds for some tasks make the parts industry pointless. It is a glaring deficiency in the x86 parts concept and will result in PC parts becoming more and more niche, exotic and inaccessible.
I use Arch, btw ;)
- cheap ULV chips like N100, N150, N300
- ultrabook ULV chips (I hope Lunar Lake is not a fluke)
- workstation chips that aren't too powerful (mainstream Core CPUs)
- inexpensive GPUs (a surprising niche, but excruciatingly small)
AMD has been dominating them in all other submarkets. Without a mainstream halo product, Intel has been forced to compete on price, which is not something they can afford. They have to make a product that leapfrogs either AMD or Nvidia and then successfully (and meaningfully) iterate on it. The last time they tried something like that was in 2021 with the launch of Alder Lake, but AMD overtook them with 3D V-Cache in 2022.
https://www.theregister.com/2025/08/29/amd_ryzen_twice_fails...
A sufficient cooler with sufficient airflow is always needed.
Secondly, what BIOS settings should I be using to run safely? Is XMP (or whatever the AMD equivalent is) safe? If I don't enable XMP, my RAM runs at default speeds way below what the sticks are rated for.
Anyone know of a good guide for this stuff?
The problem is, it's a huge effort to get there. You really have to tune PBO curves for each core individually, as they can vary so much between cores.
Now the test itself is mostly automatic with tools like OCCT, but of course you have to change the settings in the BIOS between each test and you cannot use the computer during that time, so there's a huge opportunity cost. I'm talking about weeks, not days.
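The per-core verification itself is easy to script, which is roughly what CoreCycler-style tools automate: pin a deterministic workload to one core at a time and flag any core whose result ever deviates. A rough, Linux-only Python sketch of the approach (the real tools run Prime95-class workloads for hours per core):

```python
# Cycle a deterministic FP workload across cores one at a time; a result
# that differs from the reference points at an unstable core (e.g. an
# over-aggressive per-core Curve Optimizer offset). A toy stand-in for
# CoreCycler/OCCT; Linux-only because of sched_setaffinity.
import math
import os

ITERATIONS = 2_000_000   # raise this for a real soak test

def workload():
    acc = 0.0
    for i in range(1, ITERATIONS):
        acc += math.sin(i) * math.sqrt(i)
    return acc

reference = workload()  # baseline result, computed before pinning

for core in sorted(os.sched_getaffinity(0)):
    os.sched_setaffinity(0, {core})
    result = workload()
    status = "OK" if result == reference else "MISMATCH -- suspect core"
    print(f"core {core}: {status}")
```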
To cut a long story short, I sold the system and just bought an M4 Max Mac Studio. Apple Silicon might not have the top performance of AMD or Intel, but it comes with far fewer headaches and much less opportunity cost, which in the end probably equalizes the difference in purchase cost.
I recently hit this testing pre-release kernels on my gaming PC, a 9900X3D: https://lore.kernel.org/lkml/20250623083408.jTiJiC6_@linutro...
A pile of older Skylake machines was never able to reproduce that bug one single time in 100+ hours of running the same workload. The fast new AMD chips would almost always hit it in a few hours.
> I get the general impression that the AMD CPU has higher power consumption in all regards: the baseline is higher, the spikes are higher (peak consumption) and it spikes more often / for longer.
> Looking at my energy meter statistics, I usually ended up at about 9.x kWh per day for a two-person household, cooking with induction.
> After switching my PC from Intel to AMD, I end up at 10-11 kWh per day.
It's been the bane of desktop AMD CPUs since Zen 1. Hopefully AMD will address this in Zen 6 but I don't have too much hope.
A big surprise for me, having owned both a Ryzen gen 1 & 3 previously, was that this time my system posted without me needing to flash my BIOS or play around with various RAM configurations. Felt like magic.
Pass -fuse-ld=mold when building.
An ideal ambient (room) temperature for running a computer is 15-25 Celsius (roughly 60-77 Fahrenheit).
Source: https://www.techtarget.com/searchdatacenter/definition/ambie...
I'd say that even crashing at max temperature is still completely unreasonable! You should be able to run at 100C, or whatever the max temperature is, for a week non-stop if you damn well please. If you can't, then the manufacturer has chosen the value wrong. If the CPU can't handle it, the clock rates should just be dialed back accordingly to maintain stability.
It's odd to hear about Core Ultra CPUs failing like that, though - I thought that they were supposed to be more power efficient than the 13th and 14th gen, all while not having their stability issues.
That said, I currently have a Ryzen 7 5800X, overclocked with PBO to hit 5 GHz, with negative CO offsets set per core. There's also an AIO with two fans, and the side panel is off because the case I have is horrible. While gaming the temps usually don't go past about 82C, but Prime95 or anything else computationally intensive makes the CPU hit and flatten out at 90C. So odd to have modern desktop-class CPUs still bump into thermal limits like that. That's with a pretty decent ambient temperature between 21C and 26C (summer).
> After switching my PC from Intel to AMD, I end up at 10-11 kWh per day.
It's kind of impressive to increase household electricity consumption by 10% by just switching one CPU.
I got an i5-13600KF last Black Friday from Amazon (with a long haul to Hong Kong taking about two weeks), initially paired with a budget motherboard that I thought would be fine. It turned out the system would keep shutting off at some point and rebooting again after a huge drop in voltage (it was about 10 months later that I learned this is a brownout).
It was for my company computer, but I bought it personally, so the ownership is still mine. I then bought a new SF750 PSU at home and swapped the CPU for a 13100 salvaged from a computer someone donated, so the 13600KF would become my personal gaming rig.
I made sure it got a platform that could sustain enough power with appropriate thermal headroom, and it was all fine until six months ago, when it started to BSOD all over the place: when gaming, when programming, or even when just resuming from suspend. I had to refund two games because of this; one refund was accepted and the other wasn't. I also turned to a cloud machine for development, because a BSOD in the middle of debugging is really nasty.
So I decided to say "fuck it, I'm going back to AMD". I was actually still using my 3700X rig a year ago, but I figured the five-year-old system was becoming an old dog - I just couldn't run most modern games at even 80 FPS - so I had swapped to the 13600KF as an intermediate replacement. Now that it has glitched out, I need another replacement again.
Coincidentally, I had bought a 7945HX engineering-sample ITX motherboard, originally with the intent of running a Kubernetes homelab (now that I think about it, a big waste of money indeed, yikes). Then I had a eureka moment: why don't I just use that 7945HX plus the 96GB of DDR5 I had bought?
So after a painful assemble-reassemble process, I'm back to AMD once again -- and it's almost perfect. It scores almost exactly like a 5950X but at only around 100W for the whole package, with almost double the CPU cache, and since it's not the Zen 5/Zen 5c design that complicates CPU scheduling, I've been able to solve the gaming-versus-productivity dilemma at the same time. The MoDT motherboard itself was just shy of ~1800 HKD in total, which is less than the 5950X CPU alone, and I have huge TDP headroom left for the 9070 XT I also bought in June -- an almost completely silent platform with the Noctua, too.
The original 13600KF has been delivered back to my company with a new 800W PSU, a new case specifically bought to fit the wood aesthetic, and another AMD GPU I salvaged from my NUC (a 6600XT Challenger, but single-fan) -- and this time it runs surprisingly fine: no kernel panics or PSU brownouts just yet.
After all this, in a short span of 10 months, I guess I've just reached my own "metastability" now -- Intel CPU for office work, AMD for gaming and workstation use.
The old 3700X system is being repurposed again for running a cheap Kubernetes homelab, and I guess this time it has found the right place. I don't think I'll need to buy anything new for the coming few years, hopefully.
The only problem is that I'm using an engineering sample rather than the retail version of the 7945HX -- the retail one can boost up to 5.4GHz while mine only reaches 5.2GHz. For a 600 HKD difference, I'd say it's not worth upgrading to the retail version, no?