Post Code 48 - check cpu

Evolist_ua
Level 8

Hi, I have a 3990x on a Zenith II Extreme Alpha. At first after assembly everything worked well: 128GB of RAM (4x32) at 3600, with the processor at its standard 2.9 GHz frequency but with the voltage reduced to 0.78V. The rest of the settings (frequencies, voltages, timings) were on auto. After about two weeks, the system stopped starting at memory frequencies above 2400, and after another week the maximum frequency at which it would start was 2133 (yes, to reset the settings and get it to start, I had to remove the processor from the socket). Installing different memory, trying 5 BIOS versions, 3 GPUs and 2 PSUs made no difference. Since memory frequency is not important to me, I left it at 2133, and everything was fine for about 2 months (by the way, I checked the “auto” voltages of the memory and of the memory controller in the processor, and they were not too high). But then, during normal operation (about 2 hours after turning on), the system simply turned off and stopped starting.

The board shows "Post Code 48 - check cpu"... (and the processor heats up)

Tested the 3990x on a Strix TRX40-E Gaming - the result is the same, post code 48(
Tested a 3970x on my Extreme Alpha - all OK

 

So.... my 3990x is dead?(( I couldn't find any information on post 48 anywhere else.

Accepted Solutions

exintlengineer
Level 8

Hi, signed up to this forum to give you a reply...  I've had a Zenith II Extreme Alpha burn out 5 (yes, five) 3990x CPUs...  And AMD warrantied *none* of them (that's right -- zero).  The issue with the stock BIOS settings is that they boost the SOC voltage to 1.45+ V -- where AMD specifies a max of 1.15V.  Why?  That SOC voltage feeds directly into the memory controller.  This makes testing myriad DIMMs for QVL, DOCP or EXPO easier, especially with 4+ DIMMs.  Jack the voltages up -- and everything appears to work at higher clocks.

But the crux of the biscuit with AMD chiplet CPUs is their central I/O die.  Intel had always accused AMD of "gluing" their CPUs together, which is why Intel went with "tiles" instead of chiplets.  The benefits of "tiles" are that it's silicon on silicon -- you can dissipate the heat, and the contacts are gold bumps.  The costs of "tiles" are that you need 1000's of parallel connections between the tiles and the I/O and cache.  Not every "tile" can sit next to the I/O or cache -- so you get the "P" cores and "E" cores...  Either the 1000's of lines are now too long and you can't run them as fast, or you have to multiplex them or run fewer lines.

The crux of the biscuit with AMD chiplet CPUs is that they ARE glued together.  The chiplets and central I/O die are soldered to a fiberglass PC board using 1000's of small solder blobs.  The main benefit is that you can easily scale the architecture to use 4, 8 or 12 chiplets.  The cost is that you have to serialize and deserialize all of the data running between the chiplets and the I/O die -- AMD does this using a dual torus loop they call Infinity Fabric.  And while around 50% of this serialization and deserialization load can be distributed amongst the chiplets, the other 50% is all concentrated in the central I/O die.  On top of that, the central I/O die typically uses the previous FAB process.  So, if you have 5nm/7nm chiplets, you have a 10nm/14nm I/O die.  Larger transistors -- more heat.  The central I/O die also does not have the same level of sensor overheat protection that the chiplets do.  So, when you jack up the central I/O die voltages to run faster than JEDEC speeds, you get a lot of heat.  This is compounded as you increase the chiplet count, which for the 3990x is 8.

When a chiplet or I/O die gets too hot, the solder blobs melt and short.  This has been an issue with later Ryzen CPUs, where you can see actual discoloration on the gold pads on the bottom of the fiberglass CPU.  It's less commonly seen with the GEN 3 Threadrippers, as the 64-core 3990x was expensive and there weren't too many of them.  That, and their highest stress was typically from very short duration benchmarks -- Cinebench, Furmark, etc.  In my case, I ran very heavy PostgreSQL processes utilizing 256GB of RAM and all cores.  This process would take hours of continuous running to complete, and it would invariably burn out the central I/O die using the stock Asus BIOS settings.  And, yes, I could see discoloration on the gold pads under the central I/O die on the bottom of the fiberglass CPU.  Bzzzzt.  A CPU so burnt out will never POST again.  Moreover, depending on where it shorted, it could take out the motherboard as well.  At least Asus warrantied their motherboards.

I found the only way I could get a 3990x to survive was to run on JEDEC voltages and speeds -- AND NO HIGHER.  The Postgres process I was running was rebuilding/reindexing the entire US Treasury database across years.  That took a common desktop up to 29 days to complete, whereas a 3990x could complete it in just 4 to 6 hours.  If you can get one to survive...  Silicon chips don't like running over 3.2 GHz.  Run them faster and you get into the hockey-stick portion of the curve, where they start generating exponentially more heat.  Server gear typically limits itself to 3 GHz, for the CPU, its subcomponents and RAM alike.  Consumer gear, on the other hand, is readily overclockable.  But with glued-together CPUs and a highly loaded central I/O die from an older FAB generation, that's risky.  Server gear is like a pickup truck: reliable, and it can carry heavy loads.  Consumer gear is like a Lamborghini.  Great for fast trips, just don't try and tow your boat with it...
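
For a sense of scale, the 29-day versus 4-to-6-hour comparison above can be turned into a rough speedup figure. This is only a back-of-the-envelope sketch in Python; the function name `speedup` is made up for illustration, and the inputs are the figures quoted in this post, not measurements of any other system.

```python
HOURS_PER_DAY = 24


def speedup(desktop_days: float, workstation_hours: float) -> float:
    """Ratio of the desktop run time to the workstation run time."""
    desktop_hours = desktop_days * HOURS_PER_DAY
    return desktop_hours / workstation_hours


if __name__ == "__main__":
    # 29 days on a common desktop vs. 4 to 6 hours on the 3990x (figures from the post)
    for hours in (4, 6):
        print(f"{hours} h run: roughly {speedup(29, hours):.0f}x faster")
```

So the quoted numbers correspond to a speedup somewhere between roughly 116x and 174x for that particular workload.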

Recommendation:  from direct experience, run JEDEC speeds.  It doesn't matter how cool everything is, fiberglass doesn't dissipate heat well (if at all).  If something gets too hot, those solder blobs can melt and short.  And there are 1000's of them on a Threadripper.  I had to crawl through the Asus BIOS and its submenus, making sure that the max SOC voltages recommended by AMD were adhered to and that all overclocking was turned off.  DIMMs were run at standard JEDEC speeds -- 2666 MHz.  Any automatic boosting was disabled.  For almost all of the processes I run, I could barely notice any difference in speed.  But when I need the workstation to tow a boat, nothing bad happens...  

Hope this helps...  

6 REPLIES

Hi, thanks for your answer! 
Five dead processors is very disappointing...

(I always monitored the temperatures; the BIOS also set a limit of 75 degrees)

Yes, as far as I remember, the “automatic” SOC voltage was within 1.3-1.4V.... even with memory at 2133-2400.  Although now, on the 3970x, the SOC is below 1V.

Also, your explanation of degradation from high voltage is supported by the fact that the maximum RAM frequency decreased over time...  So it seems the BIOS of the Zenith II Extreme Alpha motherboard is to blame. I've looked through a lot of forums since the 3990x failed, and this hasn't happened on other motherboards.

I still want to return to the 3990x.... So the most important point is to set the SOC voltage manually?

The way to ensure that aggressive BIOS voltage and overclocking settings won't damage your 3990x is to manually set the SOC voltage.  I used 1.15V.  You also have to set the DIMM speed to 2666 MHz -- what the 3990x is rated for.  You'll have to crawl through all of the BIOS menus/submenus to check for automatic boosting of clocks and voltages and turn it off.  You can set the temperature limits lower -- but as mentioned, the central I/O die doesn't have the same level of thermal protection that the chiplets do.  I use 27 (yes, twenty-seven) fans in my case and *nothing* gets much over 35C, even the motherboard VRMs (which typically run at 70C).  So cooling alone won't protect a Threadripper CPU from voltage and clock boosting.

Thankfully, if for some reason the BIOS "forgets" its settings due to the BIOS battery expiring, you'll get a boot to BIOS with a message to restore your BIOS settings.  Unfortunately, this message doesn't appear if you switch to the alternate BIOS memory via the switch on the motherboard or by pressing the external button on the motherboard's panel.  Doing so will revert the BIOS to its auto-overclocking and over-volting settings, which can fry your CPU.  Warning -- this reversion can also happen if your OS updates your BIOS to a newer version (I've had this happen, and the auto-overclocking and over-volting settings returned).  I'd recommend turning off all automatic driver updates to prevent this.  

Using JEDEC clocks and voltages -- the 3990x runs fine and has rebuilt/reindexed very large PostgreSQL databases (around 3.5TB) multiple times.  You just have to make sure that the motherboard sticks to JEDEC settings.  There's an aggravating tendency for it to revert to auto-overclocking and voltage-boosting.  I haven't found a way to permanently "burn" these settings into the BIOS defaults yet, though I'm sure there's a way.  
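
Since the settings can silently revert, one option is to re-check them against a written-down checklist after every BIOS update or BIOS switch. Below is a minimal sketch in Python of such a check. It assumes you read the values off the BIOS screens by hand and type them in; it does not talk to the BIOS or to any sensor. The names `BiosSnapshot` and `find_violations` are hypothetical, and the two limits are simply the figures used in this thread (1.15V SOC, 2666 MHz DIMMs), not an official specification.

```python
from dataclasses import dataclass
from typing import List

# Limits as quoted in this thread (assumption: not taken from an official spec sheet)
MAX_SOC_VOLTAGE_V = 1.15
MAX_DIMM_SPEED_MHZ = 2666


@dataclass
class BiosSnapshot:
    """Values copied by hand from the BIOS screens (hypothetical structure)."""
    soc_voltage_v: float
    dimm_speed_mhz: int
    auto_boost_enabled: bool


def find_violations(snapshot: BiosSnapshot) -> List[str]:
    """Return a human-readable list of settings that exceed the checklist limits."""
    problems = []
    if snapshot.soc_voltage_v > MAX_SOC_VOLTAGE_V:
        problems.append(
            f"SOC voltage {snapshot.soc_voltage_v:.2f}V is above {MAX_SOC_VOLTAGE_V:.2f}V"
        )
    if snapshot.dimm_speed_mhz > MAX_DIMM_SPEED_MHZ:
        problems.append(
            f"DIMM speed {snapshot.dimm_speed_mhz} MHz is above {MAX_DIMM_SPEED_MHZ} MHz"
        )
    if snapshot.auto_boost_enabled:
        problems.append("automatic clock/voltage boosting is enabled")
    return problems


if __name__ == "__main__":
    # Example: values that look like a BIOS that has reverted to its auto settings
    reverted = BiosSnapshot(soc_voltage_v=1.40, dimm_speed_mhz=3600, auto_boost_enabled=True)
    for problem in find_violations(reverted):
        print("CHECK:", problem)
```

An empty result means the recorded values are within the checklist; anything else is a prompt to go back into the BIOS before running a heavy job.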

Hope this helps...

Robot_Sonic
Level 11

Hey guys,

The 3990x's rated RAM speed is up to 3200MHz. 

Evolist_ua, I had almost the same problem occurring with my PC. I too was trying to overclock my RAM. So basically, if you clear the CMOS settings on your motherboard (Zenith II Extreme), do you still get the error with the 3990x? If not, then that's great. If so, then obviously your processor is damaged -_- 

Hi -- just a quick comment -- the 3200 MHz speed might apply to systems with 1 to 4 DIMM sticks (which is most).  Mine uses 8 DIMM sticks and I remember after a lot of checking around back then that it had to run slower to conform to what AMD considers a stock speed.  The latter might be important for AMD warranty purposes (though in my experience AMD has warrantied *nothing*).  

Robot_Sonic
Level 11

Hey,

Based on the specifications on AMD's website, the default stock memory speed on 3rd generation Threadrippers should be 3200MHz, in theory and in practice. The Zenith II Extreme is designed for quad-channel memory, and you may populate 4 or 8 DIMMs to achieve this setup. What I'm trying to say here, guys, is that I too encountered a similar problem before. It was basically due to incompatible RAM modules. Through trial and error, I ended up running an older BIOS, thinking that the newer version was at fault. However, I then decided to purchase a different kit from Patriot, one designed for both Intel and AMD platforms. Yesterday, I installed a newer BIOS version (2401) on my Zenith Extreme, and I am surprised by the outcome. My processor initialized the RAM at 2666MHz by default. I still run a first-generation Threadripper to this day. Basically, over these last 5 years I came to the conclusion that my modules conflicted with the BIOS settings. 

Look guys, I'm no expert on AMD Threadripper platforms, but believe me, these motherboards from Asus are fantastic. It's just that the IMC on these chips is extremely sensitive. All in all, I believe that you guys have incompatible memory kits installed, and so the BIOS is reverting the speed back to 2133MHz. I had the same issue as well.