Problem with Multiple CPUs Rapidly Destabilizing Over Time

Furious_Rage
Level 9

Hoping some of you fine folks can provide some suggestions on an issue I am having. I've built several hundred PCs in my 25 years in this hobby/profession. However, I can't pin down the root cause of this particular issue.

I built myself a new computer in October 2023. I originally installed an Intel i9-13900K CPU, which ran fine for 2 months, and then games started randomly crashing to desktop. After extensive troubleshooting I found that changing the "SVID Behavior" setting in the BIOS from "Auto" to "Worst-Case Scenario" fixed the problem, at least for a few weeks, and then games started crashing again. I changed the "SVID Behavior" setting to "Intel's Fail Safe" to get it stable again. I was concerned my CPU was failing, so I replaced it with a new Intel i9-14900K while I RMA'd the 13900K.

While installing the i9-14900K I also installed a Thermalright CPU contact frame. The i9-14900K worked great with SVID Behavior set to "Auto" for 2 weeks and then it started crashing in games. I upped the SVID Behavior setting to "Worst-Case Scenario", which worked for about 1 week, and now the system isn't stable at all. Even with SVID Behavior set to "Intel's Fail Safe", games are crashing and sometimes the computer will even freeze or BSOD. I underclocked the CPU by applying a maximum core ratio of "54" to all P-cores, and with that in place I was able to lower "SVID Behavior" to "Typical" and still keep it stable.

What are everyone's thoughts on why these CPUs are rapidly destabilizing? I'm not doing any overclocking. Could I really have gotten 2 faulty CPUs? Seems very unlikely. Possibly a faulty motherboard, but since installing a different CPU works for some time, I'm hesitant to conclude that this is the issue. I've listed below my hardware, BIOS settings and all the troubleshooting steps I've taken (some troubleshooting was done prior to determining it was a CPU stability issue).

 

Troubleshooting Steps Taken:
Updated BIOS to latest version (several times as new releases came out).
Restored BIOS settings back to defaults.
Ran memory tests (19 hours of MemTest86, 17 hours of Memtest86+, 14 hours of HCI MemTest with several instances running to load down the memory), all passed.
Purchased and installed another power supply.
Re-seated the CPU and checked for any bent pins on the motherboard (all looked good).
Re-seated all PCI-e cards (10GbE NIC, M.2 card, video card).
Installed a different CPU (this fixed the issue for a couple of weeks).
Underclocked the CPU (applied a maximum core ratio of "54" to all P-cores). This stabilized the CPU, but is not an ideal solution of course.
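
For anyone wondering what a core ratio of "54" works out to: the core clock is just the ratio multiplied by the base clock (BCLK). Here is a minimal sketch of that arithmetic, assuming the board is running the nominal 100 MHz BCLK (I have not changed it, but I have not measured it either). The function name is just something made up for illustration.

```python
# Assumption: stock/nominal base clock of 100 MHz (not a measured value).
BCLK_MHZ = 100.0


def core_clock_ghz(ratio: int, bclk_mhz: float = BCLK_MHZ) -> float:
    """Return the core clock in GHz for a given core ratio (multiplier)."""
    return ratio * bclk_mhz / 1000.0


if __name__ == "__main__":
    # The all-P-core ratio cap applied as a workaround.
    print(f"Ratio 54 -> {core_clock_ghz(54):.1f} GHz")
```

So the cap holds the P-cores to roughly 5.4 GHz, which is well under the 6.0 GHz maximum turbo Intel advertises for the i9-14900K.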

BIOS Settings:
Load optimized defaults.
Set "AI Overclock Tuner" to "XMP I" profile.
Set "ASUS Multicore Enhancement" to "Disabled -enforce all limits".
Set "Cooler Efficiency Customize" to "User Specify" with a "Cooler Score" value of "125".
Set "SR-IOV Support" to "Enabled".
Set "Legacy USB Support" to "Auto".
Set "Power On By PCI-E" to "Enabled".
Set "USB Audio" to "Disabled".
Set "Intel LAN" to "Disabled" (using add-on 10GbE card).
Set "CPU Fan Speed" monitoring to "Ignore" (using AIO cooler).
Set "Secure Boot OS Type" to "Windows UEFI mode".
Set "Secure Boot Mode" to "Standard".

System Hardware:
Motherboard: ASUS ROG MAXIMUS Z790 HERO, BIOS Version: 2002
CPU: Intel i9-14900K with Thermalright CPU contact frame
CPU Cooler: NZXT Kraken Elite RGB 360, Model: RL-KR36E-B1
Memory: G.SKILL Trident Z5 RGB Series, Model: F5-7200J3445G16GX2-TZ5RK
GPU: EVGA GeForce RTX 3080 Ti FTW3 Ultra Gaming, Model: 12G-P5-3967-KR
Power Supply: Corsair RM1200x Shift, Model: CP-9020254-NA
Hard Drive: Crucial T700 4TB Gen5 NVMe M.2 SSD, Model: CT4000T700SSD3 (installed in ASUS Hyper M.2 Card)
Network Card: Intel Dual 10GbE NIC, Model: X550-T2


Vynra
Level 12

...are you sure your ram is stable? crashing games is a big sign that ram is unstable, especially when you're raising cpu voltage and still crashing. anything over 7000 is seriously down to luck on a 4 dimm board. the 14900k should be able to handle that HOWEVER you have a 4 dimm motherboard. i've heard of people having issues with the hero board and high memory speeds.

i know you said you ran memtest86 and passed for however long but memtest86 is only a good test for GENERAL STABILITY. i'd only trust that test to tell you whether booting into the OS will be ok. i'd suggest running Karhu for 12 hours with the cache setting enabled and then doing tm5 extreme for at least 3 cycles.

Well, I was certain the memory is stable, as I tried running it at a greatly reduced speed (5600 MHz if I recall correctly, since this is what the CPU officially supports) but this did not alleviate the issue. However, your comment got me thinking: all my memory testing was done with my previous CPU, the i9-13900K. I don't think the memory is the issue, but I do need to re-run the memory tests with this new CPU just to confirm. I'll start a run this evening. I updated the troubleshooting in my original post to include HCI MemTest; I had run this as well and forgot to mention it.

if reducing the speed didn't help then it might be a motherboard issue. i'd say do some ram testing first, then RMA the board

Hard to say what the issue might be, but have you tried a fresh install of Windows to eliminate that possibility?

Z690 Hero, BIOS 3401, MEI 2406.5.5.0, ME Firmware 16.1.30.2361, 7000X Case, RM1000x PSU, i9 12900K, ASUS TUF OC 3090TI, 2 x 16GB Corsair RAM @ 5200MHz, Windows 11 Pro 23H2, Corsair H150i Elite AIO, 4x Corsair RGB fans, 3x M.2 NVME drives, 2x SATA SSDs, 2x SATA HDs.

Not a fresh install, no. I have daily automated image backups on all my PCs, so when my 1st CPU started going out and caused some corruption in Windows, I performed a barebones restore from a backup image that was taken a couple of weeks prior to the CPU exhibiting any problems. I had considered doing a complete fresh install before I discovered the issue revolved around the CPU. I'm fairly confident the issue is 1 of 3 possible problems: either I received 2 faulty CPUs (which I've never even heard of happening before), the motherboard is faulty, or the motherboard is spiking voltage to the CPU or something similar and killing the CPUs (this is the scenario I feel is most likely happening).

I appreciate the suggestion, but I'm probably going to do as Vynra suggested: run some additional memory tests and, if they check out, replace the motherboard. I already have a new ASUS ROG Maximus Z790 Dark Hero motherboard sitting here still sealed in the box. I had talked myself out of returning it for a refund once already, as 2 high-end motherboards and 2 CPUs for 1 build has cost me a small fortune, but I probably do need to swap out the boards. I suspect once the board is replaced I will have to RMA the 14900K as well if my suspicions are right. Anyway, thanks again. I'll update my findings when I finish testing and exchanging parts.

That seems sensible. I agree that thorough memory testing is a good idea; the symptoms you have could certainly be down to unstable memory. It's just good to eliminate everything that you reasonably can. Another possibility could be an unstable PSU. Not very likely, but possible.

Silent_Scone
Super Moderator

The first thing I would do on the 14900K is remove the contact frame and retest with the stock retention bracket. 

Also, disable XMP, and report back if the system exhibits the same behaviour. Typically, crashing to desktop (CTD) is indicative of cache or memory instability.

13900KS / 8000 CAS36 / ROG APEX Z790 / ROG TUF RTX 4090

woopsie
Level 9

@Furious_Rage please check this https://rog-forum.asus.com/t5/intel-700-600-series/asus-maximus-z790-hero-bios-freeze-on-f10-and-mou...

Maybe my issue with the BIOS is related to your crashes, because I also sometimes experience game crashes.

I have already replaced the i9-13900K with an i9-13900KS and replaced the RAM (G.Skill Ripjaws S5 64GB [2x32GB 6000MHz]) with Lexar 32GB [2x16GB 7200MHz], but the crashes in games still continue, even without XMP. It seems that the problem occurs more often when I set the MRC Fast Boot option to Disabled in the BIOS. So I think it might be a problem with how the ASUS motherboard or the CPU handles the memory.

Furious_Rage
Level 9

Just an update. Ran another round of memory testing, it passed. Installed the new motherboard, problem still exists. Currently trying to get an RMA on the CPU.