Sudden death of AMD Ryzen Zen 5 CPUs

I have read online that many Ryzen Zen 5 CPUs, both X3D (3D V-Cache) and non-X3D variants, are dying out of the blue.
Especially on Reddit, which is kind of trash but sometimes has some value.
The deaths occurred mainly on ASRock boards, but specific boards from some other vendors had the same problem.
The OS of choice was of course Windows 11, but I haven't heard anything regarding Linux or *BSD PCs.

I wonder whether this is the usual user error, or manufacturer error, or both?
Has anyone on FreeBSD encountered the same problem?

My non-X3D Ryzen 9900X CPU partly died last week on my X670E Xtreme AORUS Gigabyte mainboard.
What I mean by partly: it started out with 4 RAM sticks running at 5600MHz.
After a few months it couldn't boot anymore with these 4 RAM sticks at that frequency, so I disabled XMP and they ran at 3600MHz.
After around 5 months the CPU could not run at that frequency either; it could not run with 4 RAM sticks at all anymore and only booted fine with 2.
It seems to me that the RAM controller on the CPU is kind of dead.
I wonder whether this could have something to do with undervolting the CPU too much.

With my new board, an MSI MPG X870E Carbon WiFi, I am trying not to make the same mistakes with an AMD R9 9900X3D CPU.
The only problem reported with this board is that it doesn't boot right with WiFi enabled, but I haven't heard of a CPU dying on it.
The only other CPU of the R9 lineage which died recently is the R9 9950X3D...
 
Just because it doesn't support overclocking anymore doesn't mean your memory controller is dead.

You can't expect to run XMP speeds with 4 DIMMs.
 
Just because it doesn't support overclocking anymore doesn't mean your memory controller is dead.

You can't expect to run XMP speeds with 4 DIMMs.
But why did it support RAM OCs before, and now it cannot even boot with all 4 RAM sticks?
Running XMP speeds with 4 DIMMs was not possible some time ago, but AMD fixed that, and I could use it for 2 years (one year with the 7900X and one year with the 9900X).
Then I updated the BIOS, and XMP didn't work anymore, which is very strange.
The 9900X3D now runs quite fine with XMP on.
 
My non-X3D Ryzen 9900X CPU partly died last week on my X670E Xtreme AORUS Gigabyte mainboard.
Oh boy...
At home, I am running a 9900X on an X670E Asus board...

What I mean by partly: it started out with 4 RAM sticks running at 5600MHz.
After a few months it couldn't boot anymore with these 4 RAM sticks at that frequency, so I disabled XMP and they ran at 3600MHz.
After around 5 months the CPU could not run at that frequency either; it could not run with 4 RAM sticks at all anymore and only booted fine with 2.
It seems to me that the RAM controller on the CPU is kind of dead.
I wonder whether this could have something to do with undervolting the CPU too much.
Any chance that there is a bug in the DDR5 training algorithm/implementation somewhere? :D
 
Any chance that there is a bug in the DDR5 training algorithm/implementation somewhere? :D
If it is a bug, it showed up pretty late. My X670E board did behave very strangely after I bought and installed the 9900X CPU, but I did not expect it to destroy parts. 🤣
Maybe it is a compatibility issue, since X670E boards are usually designed to work with 7000X CPUs, and not 9000X.
It is just an assumption, but I believe I would still have the X670E board if I hadn't swapped the 7900X for a 9900X CPU back then.
 
Most users neglect electrical safety. When assembling the machine and upgrading components, they must be properly grounded. Clothing must be ESD-proof. All screwdrivers must be ESD-safe (do you follow this?). A mistake made during assembly could have consequences after some time. Ryzen is just a cheap consumer CPU. I have seen problematic 64-core EPYCs that reported some cores as missing yet showed no visible physical damage.
 
Well... There is an effect that can degrade semiconductors over time, and overclocking makes it worse. Technology is at the limit of physics in this case. High loads and high frequencies simply make the die age faster. That is why overclocking usually burns a fuse in the chip, marking it for the rest of its life, so you can't claim warranty for your "never run out of spec" CPU.

What is it that people like so much about overclocking? Factor in the time your machine spends not running, and see if you have saved any time at all.
 
Most users neglect electrical safety. When assembling the machine and upgrading components, they must be properly grounded. Clothing must be ESD-proof. All screwdrivers must be ESD-safe (do you follow this?). A mistake made during assembly could have consequences after some time. Ryzen is just a cheap consumer CPU. I have seen problematic 64-core EPYCs that reported some cores as missing yet showed no visible physical damage.
Yes, I am following the rules to correctly assemble my computer.

I am not overclocking myself, but I bought RAM which has an XMP profile for 5600MHz.
Since the Haswell lineage from Intel I have never OCed my CPU again, because I didn't need to.
Yes, user error is probably the most common error.
 
For the average person overclocking makes no sense. CPUs do wear over time and it's the heat that kills them. I run my servers in my basement and two of my laptops at their designed clock rate, using powerd(8) to reduce frequency when idle.

The newest laptop, a Framework, will hit 100C under load. Reducing its frequency by 1/3 makes it slower than its maximum, but still faster than my other laptops, and brings its temperature down to ~46C under load. My servers in my basement and my HP 840 run in the 40C to 48C range, while the old Acer will hit 52C under load.

For me, overclocking isn't worth the risk of needing to "upgrade" because the old machines stopped working. Yes, there are upgrades but on my schedule and on my budget. Emphasis on budget. And, I do a lot of high intensity work on these machines like buildworlds and poudriere.

A word of advice. The most cost-effective way to improve your computer's performance is to add RAM. I maxed out the RAM in all my computers. The ZFS hit ratios always hover around 99%. Reducing I/Os that take milliseconds pays off more than reducing instruction times by a fraction of a picosecond. I learned this tuning IBM mainframe operating systems 40 years ago. The same applies to FreeBSD, Linux, and Windows.
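
To put rough numbers on the hit-ratio point, here is a minimal Python sketch. The timings (about 100 ns for a read served from RAM, about 5 ms for one that goes to disk) are illustrative assumptions, not measurements, and the function names are made up for this example. On FreeBSD the raw counters should be available as the `kstat.zfs.misc.arcstats.hits` and `kstat.zfs.misc.arcstats.misses` sysctls; the sketch just takes them as plain integers.

```python
def arc_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of reads served from the cache (0.0 to 1.0)."""
    total = hits + misses
    return hits / total if total else 0.0


def avg_access_ms(hit_ratio: float, ram_ms: float = 0.0001, disk_ms: float = 5.0) -> float:
    """Expected time per read in milliseconds.

    ram_ms and disk_ms are assumed, illustrative values (100 ns and 5 ms).
    """
    return hit_ratio * ram_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    # Hypothetical counter values, chosen to give a 99% hit ratio.
    ratio = arc_hit_ratio(990_000, 10_000)
    print(f"hit ratio: {ratio:.2%}")
    print(f"avg read at 99% hits: {avg_access_ms(0.99):.4f} ms")
    print(f"avg read at 90% hits: {avg_access_ms(0.90):.4f} ms")
```

Under these assumptions the disk misses dominate the average almost entirely, which is why a few more points of hit ratio can matter more than a faster clock.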
 
My non-X3D Ryzen 9900X CPU partly died last week on my X670E Xtreme AORUS Gigabyte mainboard.
What I mean by partly: it started out with 4 RAM sticks running at 5600MHz.
After a few months it couldn't boot anymore with these 4 RAM sticks at that frequency, so I disabled XMP and they ran at 3600MHz.
After around 5 months the CPU could not run at that frequency either; it could not run with 4 RAM sticks at all anymore and only booted fine with 2.
It looks like a DDR5 thing. I remember that when AM4 came out, DDR4 didn't play well with lots of motherboards either. Sticks didn't run at full speed at first, or only a few very specific models did, and some motherboards (mine included) had to wait years to run normally; meanwhile they just fell back to the default, which is 2600MHz.

Take a look at this, maybe it can help:

These are specific to your motherboard, maybe you'll find some good advice:

That being said, I'm sorry about your dead CPU, that sucks. For me the lesson learned is never again to buy the first generation of a new socket (CPU or motherboard); the 3rd, 4th, etc. generations are improved and less buggy, just like software :-)
 
CPU overclocking is thankfully out since the CPUs overclock themselves to near maximum out of the box now. It's no fun anymore. All hail the CABNE Opterons.

Memory overclocking is back in since it is a valid hobby and fun. It has little performance impact but it is the only game in town now. It is fueled by memory manufacturers now shipping RAM with XMP and EXPO to automatically reach the speeds that the RAM has been binned for.

Problems:
- not every CPU has a good memory controller and XMP and EXPO speeds might be unreachable even if the RAM itself could reach them
- it is still overclocking from the CPU manufacturer's standpoint. Electromigration could lead to degradation over time
 
Another aspect of memory speed is latency.
If the "overclocking" is achieved by inserting additional wait states (delays) at some point, it could reduce the effective memory bandwidth.
Memory clock speed is quite an important aspect, but it is not the only thing that affects performance.
 
That being said, I'm sorry about your dead CPU, that sucks. For me the lesson learned is never again to buy the first generation of a new socket (CPU or motherboard),
I have a 1st gen AM5 mobo (Asus Prime B650M-A CSM) with a Ryzen 5 7600, and it runs fine. For me, the real sticking point was buying a quality PSU (mine is an EVGA).
 
That is still overclocking from the CPU manufacturer's view.

Does the RAM have separate XMP profiles for 2 and 4 sticks? Because surely it needs that to support 4 sticks.
As far as I know, it doesn't.
But if I start Windows and look at the RAM section in the Task Manager, I can see that all 4 sticks run at a frequency of 5600MHz.
It might be irrelevant in most cases, and I count PC gaming among these irrelevant cases, but for emulating a game console like the Nintendo Switch or PlayStation 3, it makes a huge difference whether the RAM runs at native 3600MHz or 5600MHz.
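
As a rough illustration of the gap between the two speeds, here is a small Python sketch of theoretical peak bandwidth. It assumes a dual-channel setup with 64 bits (8 bytes) per channel and treats the quoted "MHz" figures as transfer rates in MT/s; both are assumptions, and real-world throughput will be lower than this upper bound.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    mt_per_s:  transfer rate in mega-transfers per second (e.g. 5600)
    channels:  number of memory channels (assumed 2 for a desktop board)
    bus_bytes: bytes moved per transfer per channel (assumed 8, i.e. 64 bits)
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    for rate in (3600, 5600):
        print(f"DDR5-{rate}: {peak_bandwidth_gbs(rate):.1f} GB/s peak")
```

Under these assumptions the ceiling is about 57.6 GB/s at 3600 and about 89.6 GB/s at 5600, so roughly 1.5 times more headroom for bandwidth-bound workloads.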

Another aspect of memory speed is latency.
If the "overclocking" is achieved by inserting additional wait states (delays) at some point, it could reduce the effective memory bandwidth.
Memory clock speed is quite an important aspect, but it is not the only thing that affects performance.
I agree, in my case the CL is 38 with a clock speed of 5600MHz.
If I don't use XMP I still have the same CL.
I would consider that pretty good for a 48 GB RAM stick.
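
For what it is worth, the same CL at a lower speed means a longer delay in absolute time. A minimal Python sketch of the usual conversion, assuming the quoted speeds are DDR transfer rates in MT/s (so the actual clock is half of that); the function name is just for illustration:

```python
def cas_latency_ns(cl: int, mt_per_s: int) -> float:
    """CAS latency in nanoseconds for a DDR module.

    DDR transfers twice per clock, so the clock in MHz is mt_per_s / 2
    and one cycle lasts 2000 / mt_per_s nanoseconds.
    """
    return cl * 2000.0 / mt_per_s


if __name__ == "__main__":
    print(f"CL38 @ 5600: {cas_latency_ns(38, 5600):.1f} ns")
    print(f"CL38 @ 3600: {cas_latency_ns(38, 3600):.1f} ns")
```

With these inputs CL38 works out to roughly 13.6 ns at 5600 and roughly 21.1 ns at 3600, so keeping the same CL without XMP is noticeably slower in real time.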
It looks like a DDR5 thing. I remember that when AM4 came out, DDR4 didn't play well with lots of motherboards either. Sticks didn't run at full speed at first, or only a few very specific models did, and some motherboards (mine included) had to wait years to run normally; meanwhile they just fell back to the default, which is 2600MHz.
I didn't take that into consideration, but that would explain my weird RAM behaviour.
On my board, although the RAM sticks' default speed is 4800MHz, they only ran at 3600MHz.
Pairing the X670E Aorus board with the 7900X CPU was no problem; the problem only started after I tried a 9900X CPU with that board.
The first strange thing was the long loading time during boot; after that it couldn't handle my RAM with XMP enabled anymore.
In the last week before I replaced it, it was at a stage where it would only start after a CMOS reset, every single time.
If I changed something in the BIOS, or just restarted without a CMOS reset, it got stuck in an infinite loop.
A friend of mine confirmed, after I gave him the board, that it runs fine with 2 RAM sticks, but not with 4 anymore.

That being said, I'm sorry about your dead CPU, that sucks. For me the lesson learned is never again to buy the first generation of a new socket (CPU or motherboard); the 3rd, 4th, etc. generations are improved and less buggy, just like software :-)
I looked up those threads before giving the board and CPU to my friend.
Honestly, I am quite happy that my CPU died, because I wanted an X3D one. The 9900X3D only came out roughly 2 weeks ago; I wanted it last October, but since it was not out yet, I got the 9900X, because TechPowerUp said, with actual benchmarks, that it is a game console emulation beast.
Trying out Xenoblade Chronicles 1 and 3, it just jumped between 45 and 60 FPS, so yeah, where is that emulation power...
Over the years I bought a 5900X, a 7900X, and a 9900X for Xenoblade Chronicles, and the games just did not run at a stable 60 FPS, because the game engine is crap.
Looking at YouTube videos, I saw a 5900X3D handling Xenoblade Chronicles 1 at almost 60 FPS, sometimes jumping between 55 and 60 FPS.
Sad news, indeed, but two days ago I tested Xenoblade Chronicles 3 on Torzu. Same crappy game engine, but thanks to twice the L3 cache of the non-X3D variant (and better cache clocks?), I could finally achieve a stable 60 FPS everywhere.

I don't think it is a new-generation problem.
It is possibly because of having a 2nd gen CPU sitting in a 1st gen socket...

CPU overclocking is thankfully out since the CPUs overclock themselves to near maximum out of the box now. It's no fun anymore. All hail the CABNE Opterons.
You mean turbo-boost?
Yes, that is a feature I avoid, because it gives the CPU too much core voltage.
With turbo-boost on, my CPU takes around 1.34 V; without it, it takes around 0.9 V to 1.05 V.
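
To get a feeling for what that voltage difference means, here is a rough Python sketch using the common first-order approximation that dynamic power scales with V² × f. It ignores leakage and everything else the chip does, so treat it as a back-of-the-envelope estimate only; the function name and the default frequency ratio are assumptions for this example.

```python
def relative_dynamic_power(v_new: float, v_old: float, f_ratio: float = 1.0) -> float:
    """Ratio of dynamic power after a voltage/frequency change.

    Uses the first-order CMOS approximation P ~ C * V^2 * f.
    f_ratio is new_frequency / old_frequency (1.0 = same clock).
    """
    return (v_new / v_old) ** 2 * f_ratio


if __name__ == "__main__":
    # Voltages from the post: 1.34 V with boost vs. 1.05 V and 0.9 V without.
    print(f"1.34 V vs 1.05 V: x{relative_dynamic_power(1.34, 1.05):.2f}")
    print(f"1.34 V vs 0.90 V: x{relative_dynamic_power(1.34, 0.90):.2f}")
```

Even at an unchanged clock, the voltage alone would account for roughly 1.6 to 2.2 times the dynamic power under this approximation, and the higher boost frequency comes on top of that.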

Is this relevant?
Yes, my new X870E board does it automatically, thankfully.

Problems:
- not every CPU has a good memory controller and XMP and EXPO speeds might be unreachable even if the RAM itself could reach them
- it is still overclocking from the CPU manufacturer's standpoint. Electromigration could lead to degradation over time
With "Electromigration could lead to degradation over time" do you mean the RAM or the CPU or both?

For the average person overclocking makes no sense. CPUs do wear over time and it's the heat that kills them.
Indeed, CPUs now ship with clock rates that make overclocking not worth it anymore.
Some air coolers also have problems sustaining the "base clock" of new CPUs now, since the temperature can go from 60 to almost 90 degrees.
Luckily I have a very big tower with 12 fans inside.
Under load I reach 50 to 52 degrees, at idle 40 to 44 degrees, so nothing to worry about, I guess.


A word of advice. The most cost-effective way to improve your computer's performance is to add RAM. I maxed out the RAM in all my computers. The ZFS hit ratios always hover around 99%. Reducing I/Os that take milliseconds pays off more than reducing instruction times by a fraction of a picosecond. I learned this tuning IBM mainframe operating systems 40 years ago. The same applies to FreeBSD, Linux, and Windows.
I agree.
That is why I use 192 GB of RAM for building anything (world, kernel, 3rd-party software through poudriere).
Games, for example, work a lot faster on a RAM disk due to very high I/O transfer rates (97 GB/s).
 
SDK Chan
The spec says 3600 if you're using either 1R (single-rank) or 2R (dual-rank) RAM. XMP probably also bumps voltage, among other things, which doesn't help. You will very likely get silent data corruption at those speeds, which usually results in crashes and/or issues with booting/cold boot as memory training fails.

I only have a "dinky" Ryzen 7900 (non-X), but it runs 4 DIMMs just fine @ 3600; 4000 boots but throws memory-related errors after a while.
 
So overclocking CPU and memory means they run faster to hurry up and wait longer for the user or the device to respond?

I can see where cpu/memory running faster would complete memory intensive tasks quicker like video/photo editing, but for most typical desktop use? I think everything is sitting there waiting a lot of the time so "my idle states run really fast"
 
As far as I know, it doesn't.
But if I start Windows and look at the RAM section in the Task Manager, I can see that all 4 sticks run at a frequency of 5600MHz.
It might be irrelevant in most cases, and I count PC gaming among these irrelevant cases, but for emulating a game console like the Nintendo Switch or PlayStation 3, it makes a huge difference whether the RAM runs at native 3600MHz or 5600MHz.

I'm sorry, but that can't work reliably. There's no way that you can run the same max overclock with 2 and 4 sticks, XMP or not. If your RAM doesn't have a separate XMP profile for 4 modules, then you need to find the max yourself.

With "Electromigration could lead to degradation over time" do you mean the RAM or the CPU or both?

Both, specifically the memory controller in the CPU. Electromigration affects chips under too high clock or too high voltage and degrades the circuits so that they don't run stable at the same frequencies as before anymore.

That is why I use 192 GB of RAM for building anything (world, kernel, 3rd-party software through poudriere).
Games, for example, work a lot faster on a RAM disk due to very high I/O transfer rates (97 GB/s).

As a DCS player I can feel that. Too bad the tree is too large for my Optane.
 
So overclocking CPU and memory means they run faster to hurry up and wait longer for the user or the device to respond?

I can see where cpu/memory running faster would complete memory intensive tasks quicker like video/photo editing, but for most typical desktop use? I think everything is sitting there waiting a lot of the time so "my idle states run really fast"

Overclocking is a proper E-sport. Can be a lot of fun. Of course the validation is the trick, especially when you do this sport on an otherwise production computer.

I don't think photo editing is very memory bandwidth intensive. It loads the CPU pretty hard.
 
I have a 1st gen AM5 mobo (Asus Prime B650M-A CSM) with a Ryzen 5 7600, and it runs fine. For me, the real sticking point was buying a quality PSU (mine is an EVGA).
Not only that: a quality PSU can reduce coil whine on GPUs too. Fun fact: the Corsair 1200HX does not have coil whine (or it does, but it is barely audible) on my Volta GPU, but there used to be some with the Seasonic 850 Platinum (then again, Corsair did over-engineer these PSUs back in the day).
P.S. My GPU has a waterblock, so coil whine is amplified if there is any.
 
You mean turbo-boost?
My engine runs at a turbo boost of about 1.1 - 1.3 bar. Makes 200HP continuous. But the important result of good tuning is the 580 ft-lbs of torque (for metric people, that's 785 N-m). Top speed in second gear about 4.8 mph (it has only 2 gears, slow and fast). But it can maintain that speed while ingesting a 12" diameter tree.

Disclaimer: Everything written above is true, but it is only relevant to this discussion on April 1st.
 
So overclocking CPU and memory means they run faster to hurry up and wait longer for the user or the device to respond?

I can see where cpu/memory running faster would complete memory intensive tasks quicker like video/photo editing, but for most typical desktop use? I think everything is sitting there waiting a lot of the time so "my idle states run really fast"
That highly depends on the arch / CPU design, Zen 4/5 doesn't benefit much from it at all.
 
Both, specifically the memory controller in the CPU. Electromigration affects chips under too high clock or too high voltage and degrades the circuits so that they don't run stable at the same frequencies as before anymore.
Indeed...
Electromigration happens one way or another, since electrons are moving along their destined path anyway.
It is just a question of whether more of them are moving than intended, or fewer.
I turned off XMP, and my core voltage went down from 0.99V-1.05V to 0.97V-0.99V, resulting in a much cooler chip.
As for my CPU temps, I got a difference of almost 6 degrees.
Performance-wise nothing changed.

What I have learned: OC is not worth it anymore, not even for RAM, because
1) you degrade the CPU, and eventually break it
2) the performance gain is not really worth the damage you can get within a short amount of time (8 months)
3) overclocking voids the warranty, which is already 3 years now for CPUs.
So, CPUs should have a life expectancy of at least 3 years, not less than one year, I think :'‑(

The base clock of my damaged CPU, which cannot run with 4 RAM sticks anymore after 8 months of use, is 4.4GHz.
The base clock of my new CPU is also 4.4GHz.
I don't think that the base clock is its max OCed clock, since it can clock up to 5.5GHz.
The max voltage which degrades the CPU is around 1.35V.

3D V-Cache is very sensitive, so I don't want to lose the CPU.
And yes, I agree Ryzen CPUs are cheap consumer CPUs, but Ryzen CPUs with 3D V-Cache are very expensive.
I paid $431.60 for my now malfunctioning CPU (R9 9900X).
For my new CPU (9900X3D) I paid around $725.18.

I hope I don't offend anyone, but for me the main advantage of the 3D V-Cache is that it compensates for the poor performance of crappy "Japanese" game engines.
Japanese games aren't really optimized for multi-core performance, so single-core performance is more important.
And comparing the R9 9900X and R9 9900X3D in Japanese games, I can say that thanks to the 3D V-Cache the R9 9900X3D delivers almost 20% more single-core performance than the R9 9900X in those games...
This matters even more when emulating games, where each core performs only one task.
 