4 weeks ago
My custom gaming PC "died" today for some unknown reason. It would not boot into either Windows 10 or Linux Mint: the computer froze at the spinning circle (Windows) or a black screen (Linux) and never reached the login screen. I spent hours on troubleshooting steps (listed below).
The fix? Choose "water-cooled OC preset" from the ASUS BIOS "load optimized defaults" option in the "exit" tab. I'm still unsure how that fixed the issue, but the system has been running for hours without issue (albeit at faster CPU clocks).
System:
- Windows 10 64-bit.
- Ryzen 5950X CPU.
- G.Skill Trident Z NEO Series 64GB (2 x 32GB) 288-Pin SDRAM DDR4 3200 (PC4-25600) CL16-18-18-38 1.35V Dual Channel Desktop Memory Model F4-3200C16D-64GTZN.
- ASUS ROG STRIX NVIDIA GeForce RTX 3090 video card.
- Corsair RM1000x power supply.
- Noctua NH-D15 SE-AM4 air cooler.
- Samsung 970 PRO NVMe m.2 drive - 1 TB.
- 4 WD hard drives for data storage (projects/assets/video).
Before the issue began today:
I ran the computer 24/7 without issue for 4 years, with only an occasional reboot. PBO was disabled for stability. The memory DOCP profile (which uses the XMP settings) was set to run the DIMMs at 3200 MHz. No CPU overclocks, just the normal boost (AUTO). Productivity use: coding, video editing, etc. Nothing too extreme. Recently, a few reboots were needed after storms caused power outages. Windows 10 froze at the startup screen (the spinning circle) each time, but pressing the reset button to restart the computer resolved that. That was weird, but I figured Windows was recovering from the unexpected shutdowns (scanning the drive, etc.). The computer then ran fine for a few weeks.
The computer "died" today while I was compressing a large set of assets for a coding project using 7-Zip (a huge 90 GB file), then moving it from one hard drive to another. Nothing I haven't done before. But this time, Windows froze completely. No errors, no BSOD. No storms or power issues. The mouse and keyboard were unresponsive. Not even CAPS LOCK/NUM LOCK would respond. The network connection dropped. I had to turn the computer off using the PSU power switch. Once powered back on, Windows 10 began to load at the startup screen (the spinning circle) but would never get past it. It would eventually try to enter Startup Repair mode but would freeze before doing so. Linux would not load either, even though the memory check passed. The computer would simply not load anything "heavy" outside of BIOS.
Troubleshooting:
- Ensured no cabling came loose.
- Disconnected cabling and sprayed out the case/fans with compressed air (it was very dusty).
- Disabled SATA in BIOS, enabling only the NVMe boot drive (Windows).
- Unplugged the SATA cables from the hard drives (they are just extra storage drives).
- Swapped the video card with a working spare (GTX 1080 Ti).
- Reseated the RAM sticks, and ensured they and the slots were dust-free.
- Ensured the fan speeds were at full RPM and CPU/motherboard temperatures within acceptable range (55 deg. C shown in BIOS).
- Set the memory clock speed down to the default 2133 MHz, disabling ASUS's DOCP profile (which uses XMP settings).
- Bumped the SOC voltage to 1.25 V, up from 0.9 V.
- Reset CMOS settings by removing CMOS battery for a few minutes, then reinstalling.
- Updated BIOS firmware to 4006 (the next version up from my existing 4004), and then again later to the latest 5002, using EZFlash/USB drive in BIOS.
- Ran MemTest86 from a bootable USB drive for hours. 0 errors found when running at my original settings of 3200 MHz.
- Tried booting from a Linux Mint live USB drive. The computer reaches the boot screen but stalls while the kernel/drivers are loading (just like Windows).
- Ensured CSM was disabled for USB booting (only UEFI mode). But also tried with CSM enabled. No difference.
The last thing I tried was a weird step I normally wouldn't take (and it doesn't make much sense): I chose "water-cooled OC preset" from the ASUS BIOS "load optimized defaults" option in the "exit" tab. IT WORKED - Windows 10 is now loading and apps are functioning fine.
This "water-cooled OC preset" made some subtle tweaks to the CPU power curve, RAM settings, etc. It also lowered the RAM speed and enabled PBO. I notice the CPU is reporting 1.3 V instead of 1.4 V in the BIOS monitor after this change, and Windows seems more responsive. I'm now hitting about 4.7 GHz all-core. However, the CPU now nearly hits 90 deg. C at full load according to Ryzen Master, and power usage is in the red, which I want to avoid.
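For anyone who wants to keep an eye on temperatures after a change like this, here is a rough Python sketch for summarizing a set of CPU temperature readings. It assumes you have already collected the readings yourself (for example, by noting them down from a monitoring tool). The function name, the 85 deg. C limit, and the sample values are all made up for illustration; they are not measurements from my system or an official AMD threshold.

```python
from statistics import mean


def summarize_temps(samples_c, limit_c=85.0):
    """Summarize CPU temperature samples (deg. C) against a chosen limit.

    limit_c is an arbitrary, user-chosen comfort limit (an assumption),
    not a manufacturer specification.
    """
    if not samples_c:
        raise ValueError("no temperature samples provided")
    # Readings at or above the chosen limit.
    over = [t for t in samples_c if t >= limit_c]
    return {
        "max_c": max(samples_c),
        "avg_c": round(mean(samples_c), 1),
        "share_over_limit": round(len(over) / len(samples_c), 2),
    }


# Made-up example readings, not real data from this PC.
example = [62.0, 71.5, 88.0, 90.0, 79.5]
print(summarize_temps(example))
# {'max_c': 90.0, 'avg_c': 78.2, 'share_over_limit': 0.4}
```

If the share of readings over the limit stays high under normal workloads, that would be a hint to back off the preset or improve cooling.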
My original BIOS settings had PBO disabled, with conservative settings aimed at stability/longevity. The only tweak I enabled was the DOCP profile to set the RAM speed to 3200 MHz (which I paid good money to achieve).
I hope this helps anyone who believes their Ryzen CPU, ASUS motherboard, or power supply has died. One simple preset in the BIOS fixed the issue. For how long, I'm not sure. There could still be an underlying hardware issue. I will keep you all updated.
Thanks to user "guitrldy" for the tip.