Please help - My new Z790/13900KS Build is unusable due to random and frequent crashes.
I have just replaced my MB, CPU and RAM. The new system now consists of the following:
MB: ROG Strix Z790 Gaming-E
RAM: DDR5 2x16GB Corsair Dominator (6200)
CPU: Intel 13900KS
GPU: RTX3090 - NOT OVERCLOCKED!
PSU: EVGA 1300W G+
Storage: 2x Samsung EVO Plus 500GB M.2 / 1x Samsung EVO Plus 1TB M.2
Cooling: Custom Loop
OS: Windows 10 22H2
The new components are the MB, RAM & CPU.
Symptoms: the system freezes and locks up completely, always requiring a power cycle or a MB reset. There is never a blue screen or automatic system restart. Other symptoms include a split second of display and/or sound stuttering immediately before lockups.
Event Viewer Results:
Critical - The system has rebooted without cleanly shutting down first.
Kernel-Power Event ID: 41
Error - A fatal hardware error has occurred. WHEA-Logger
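For anyone who wants to pull the same two event types from the command line rather than scrolling through Event Viewer, here is a minimal sketch in Python. It only builds and runs queries for the built-in Windows `wevtutil` tool. The provider names (`Microsoft-Windows-WHEA-Logger`, `Microsoft-Windows-Kernel-Power`) are my assumption of how these sources are registered in the System log, and the helper names `build_query` and `print_recent_events` are made up for illustration.

```python
import subprocess

# Assumed provider names for the two event sources quoted above.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"
KERNEL_POWER_PROVIDER = "Microsoft-Windows-Kernel-Power"


def build_query(provider, event_id=None, count=10):
    """Build a wevtutil command that lists the newest System-log events
    for one provider, optionally restricted to a single event ID."""
    condition = f"Provider[@Name='{provider}']"
    if event_id is not None:
        condition += f" and (EventID={event_id})"
    xpath = f"*[System[{condition}]]"
    return [
        "wevtutil", "qe", "System",  # query-events on the System log
        f"/q:{xpath}",               # XPath filter
        "/f:text",                   # human-readable output
        f"/c:{count}",               # maximum number of events
        "/rd:true",                  # newest events first
    ]


def print_recent_events():
    """Run both queries and print the raw text output (Windows only)."""
    queries = [
        build_query(WHEA_PROVIDER),
        build_query(KERNEL_POWER_PROVIDER, event_id=41),
    ]
    for cmd in queries:
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
```

Calling `print_recent_events()` from a Windows Python prompt should dump the most recent WHEA-Logger and Kernel-Power 41 entries as text, which makes it easier to line up their timestamps against each lockup.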
Crashes occur seemingly at random. Sometimes during desktop at idle. Sometimes during gaming. Sometimes during web browsing. Sometimes during in-game menu navigation.
Ironically, it has never so far crashed during any kind of stress testing or benchmarking. System load has no apparent effect. The system always boots, loads the OS and shuts down correctly. Uptime ranges between 2 minutes and 2 hours, but is typically around 20 minutes before a crash. System temps, including MB, CPU, GPU and RAM, always remain well within normal operating ranges.
I have tried all of the following and none have cured the problem:
Default safe settings, XMP I, XMP II, XMP Tweaked (in BIOS 0816)
BIOS Versions - 0703 / 0813 / 0816
Disabled Onboard Wi-Fi, LAN, Bluetooth
GPU - Current GeForce version and 6 previous versions
Chipset - Current Intel ME plus 3 previous versions
Sound - Current Realtek drivers plus 3 previous versions
Tried clean booting and safe mode - no change
Intel Processor Diagnostics for 64bit - Passed
Windows Memory diagnostics - Passed
MEMtest64 from BIOS - Passed
Win SFC - Passed
Win DISM - Passed
Win CHKDSK - Passed
Samsung Magician Long Scan + Extended SMART Self Test, all drives - Passed
OS Settings tried:
Clean OS install onto new partition
Fast Boot - Disabled
Power profiles - Balanced & Performance
Hardware actions tried:
Alternative verified PSU
GPU - Verified stability in alternative PC - OK.
RAM - Ran each of the 2 DIMMs on its own in DIMM_A2, excluding the other. Also tried swapping the DIMMs between slots A2 and B2
Disconnected all other SATA hardware
Misc actions and tweaks:
Experiments with CPU undervolting, custom RAM timings, AC & DC LL tweaks, and disabling Turbo Boost, HT, Thermal Boost and AI Boost. None made a difference.
After all of the above, I still have no clue as to where the fault lies, other than a hunch that it's hardware related, not software. I have tried examining Windows minidumps and LiveKernelReports/WHEA dumps using WinDbg, but nothing in them makes anything clearer to me.
It's taken me a week of solid troubleshooting to get nowhere and I'm at the end of my rope, because this PC is at the core of my business and I've reached the point where I can no longer spend any more time on it. In any case, I've been building PCs for 30 years and have NEVER had this kind of trouble simply getting a system to run stable. Therefore, I can only assume there's a fundamental problem with either the MB, CPU or RAM.
The question is... what do I return? All of it? I'd much rather fix the problem if possible as waiting for replacement hardware is going to kill my business.
I'd be so grateful for any advice. With thanks...