03-26-2023 07:39 AM
Please help - My new Z790/13900KS Build is unusable due to random and frequent crashes.
I have just replaced my MB, CPU and RAM. The new system now consists of the following:
MB: ROG Strix Z790 Gaming-E
RAM: DDR5 2x16GB Corsair Dominator (6200)
CPU: Intel 13900KS
GPU: RTX3090 - NOT OVERCLOCKED!
PSU: EVGA 1300W G+
Storage: 2x Samsung EVO Plus 500GB M.2 / 1x Samsung EVO Plus 1TB M.2
Cooling: Custom Loop
OS: Windows 10 22H2
The new components are the MB, RAM & CPU.
Symptoms:
System freezes/lockups. Always locks up completely requiring a power off cycle or a MB reset. Never a blue screen or automatic system restart. Other symptoms include a split second of display and/or sound stuttering immediately before lockups.
Event Viewer Results:
Critical - The system has rebooted without cleanly shutting down first.
Kernel-Power Event ID: 41
Error - A fatal hardware error has occurred. WHEA-Logger
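For anyone wanting to pull the same two event types without clicking through Event Viewer, here is a minimal Python sketch that builds a query for Windows' built-in `wevtutil` tool. The helper names (`build_query`, `build_command`, `recent_events`) are made up for illustration, and it assumes a Windows machine with the events recorded in the System log.

```python
import subprocess

# Provider names as they appear in the Windows System log.
WHEA = "Microsoft-Windows-WHEA-Logger"
KERNEL_POWER = "Microsoft-Windows-Kernel-Power"

def build_query(provider, event_id=None):
    """Build an XPath filter for one provider, optionally narrowed to one event ID."""
    clause = f"Provider[@Name='{provider}']"
    if event_id is not None:
        clause += f" and (EventID={event_id})"
    return f"*[System[{clause}]]"

def build_command(provider, event_id=None, count=10):
    """Return the wevtutil command that lists the newest `count` matching events."""
    return ["wevtutil", "qe", "System", f"/q:{build_query(provider, event_id)}",
            f"/c:{count}", "/rd:true", "/f:text"]

def recent_events(provider, event_id=None, count=10):
    """Run the query (Windows only) and return wevtutil's text output."""
    result = subprocess.run(build_command(provider, event_id, count),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On the affected machine, `print(recent_events(KERNEL_POWER, 41))` and `print(recent_events(WHEA))` should list the most recent entries of each type, newest first.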
Scenarios:
Crashes occur seemingly at random. Sometimes during desktop at idle. Sometimes during gaming. Sometimes during web browsing. Sometimes during in-game menu navigation.
Ironically, never so far during any kind of stress testing or benchmarking. System load has no apparent effect. The system always boots and loads the OS and shuts down correctly. Uptime ranges between 2 Minutes and 2 hours, though is typically around 20 minutes until a crash. System temps including MB, CPU, GPU, RAM always remain well within normal operating ranges.
Troubleshooting
I have tried all of the following and none have cured the problem:
Bios:
Default safe settings, XMP I, XMP II, XMP Tweaked (in v0816)
BIOS Versions - 0703 / 0813 / 0816
Cleared CMOS
Disabled Onboard Wi-Fi, LAN, Bluetooth
Drivers:
GPU - Current GeForce version and 6 previous versions
Chipset - Current Intel ME plus 3 previous versions
Sound - Current Realtek drivers plus 3 previous versions
Tried clean booting and safe mode - no change
Diagnostics:
Intel Processor Diagnostics for 64bit - Passed
Windows Memory diagnostics - Passed
MEMtest64 from BIOS - Passed
Win SFC - Passed
Win DISM - Passed
Win CHKDSK - Passed
Samsung Magician Long Scan + Extended SMART Self Test, all drives - Passed
OS Settings tried:
Clean OS install onto new partition
Fast Boot - Disabled
Power profiles - Balanced & Performance
Hardware actions tried:
Alternative verified PSU
GPU - Verified stability in alternative PC - OK.
RAM - Excluded each of the 2 DIMMs alternately, i.e. ran each as a single stick in DIMM_A2. Also tried swapping the DIMMs between slots A2 / B2
Reseated CPU
Reseated DIMMs
Reseated GPU
Disconnected all other SATA hardware
Misc actions and tweaks:
Experiments with CPU under-volting, custom RAM timings, AC & DC LL tweaks, disabling Turbo boost, HT, Thermal Boost, AI Boost. None made a difference.
After all of the above, I still have no clue as to where the fault lies, other than a hunch that it's hardware related, not software. I have tried examining windows mini-dumps and LiveKernelReports/WHEA dumps using WinDbg but nothing within these makes anything clearer to me.
It's taken me a week of solid troubleshooting to get nowhere and I'm at the end of my rope, because this PC is at the core of my business and I've reached the point where I can no longer spend any more time on it. In any case, I've been building PCs for 30 years and have NEVER had this kind of trouble simply getting a system to run stable. Therefore, I can only assume there's a fundamental problem with either the MB, CPU or RAM.
The question is... what do I return? All of it? I'd much rather fix the problem if possible as waiting for replacement hardware is going to kill my business.
I'd be so grateful for any advice. With thanks...
Luke
03-27-2023 03:09 AM
Thanks @JohnAb @Murph_9000 @Shinchan0125
It seems that the riser cable was the problem and having switched the MB to run at Gen 3, I appear to have a completely stable and very fast system 🙂
I'm very grateful to all of you for your input.
Cheers!
03-26-2023 07:48 AM
are you using any vertical GPU mounting? If so, try without it.
03-26-2023 07:53 AM
Tried that using a different air cooled card because I'm running a custom loop, so not easy to remount the existing GPU.
03-26-2023 08:05 AM
Based on what you have tried, I'm suspecting an attached device - probably USB. I've had similar issues in the past and it was a faulty USB cable. Not an easy thing to find and it took weeks to work it out. Try unplugging everything, even USB on the motherboard (like a cooler controller, fan controller etc) and try a different keyboard and mouse too. Somebody here had a faulty Corsair Commander fan/RGB controller that was also causing similar issues. Hope that idea leads you somewhere positive...
03-26-2023 11:25 AM
I've just had a thought. I am running a PCIe 3 riser cable and my MB slot is set to PCIe 4. Might that cause a problem? Meaning the MB slot is trying to push gen 4 over a gen 3 cable.
03-26-2023 12:36 PM
Could be. Riser cables can be problematic anyway, especially cheaper ones. Worth trying a good quality Gen 4 one, good thought lukehall.
03-26-2023 12:45 PM
@lukehall wrote: I've just had a thought. I am running a PCIe 3 riser cable and my MB slot is set to PCIe 4. Might that cause a problem? Meaning the MB slot is trying to push gen 4 over a gen 3 cable.
This is extremely likely to be a problem, unless you happened to get a PCIe 3.0 cable that was actually up to PCIe 4.0 specs. The motherboard and graphics card have no way to detect the riser cable rating, so you'll almost certainly have been running at 4.0 speed, pushing double the rated signal frequency over the cable. 3.0 riser cables in 4.0 systems are a known cause of instability and general problems, unless you force the slot to run in 3.0 mode.
I use a LINKUP PCIe 4.0 riser cable, which seems to be good quality and works well. I picked that brand because EK include one of their cables in a product and I figured EK would have picked a decent one. CableMod do them as well now, and that's another brand I'd trust. Avoid the random unknown brands; this is one of those parts that needs quality because of the signal rates.
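To put a number on "double the rated signal frequency", here is a rough Python sketch. The per-lane transfer rates (8 GT/s for Gen 3, 16 GT/s for Gen 4) come from the PCIe specifications; the function names are made up for illustration, and treating the fundamental frequency as half the transfer rate is the usual rule of thumb for this kind of two-level signalling, not a measurement of any particular cable.

```python
# Per-lane transfer rates from the PCIe specifications, in GT/s.
PCIE_GT_PER_S = {3: 8.0, 4: 16.0}

def fundamental_ghz(gen):
    """Approximate fundamental frequency of the signal: half the transfer rate."""
    return PCIE_GT_PER_S[gen] / 2.0

def rate_ratio(cable_gen, link_gen):
    """How far the negotiated link rate exceeds the rate the cable was rated for."""
    return PCIE_GT_PER_S[link_gen] / PCIE_GT_PER_S[cable_gen]

for gen in (3, 4):
    print(f"Gen {gen}: {PCIE_GT_PER_S[gen]:.0f} GT/s per lane, "
          f"about {fundamental_ghz(gen):.0f} GHz fundamental")
print(f"Gen 4 link on a Gen 3 cable: {rate_ratio(3, 4):.0f}x the rated signal rate")
```

Forcing the slot to Gen 3 in the BIOS brings that ratio back to 1x, which is why it is the usual workaround for a Gen 3 riser.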
03-26-2023 03:35 PM
Thank you very much. This may actually be a "eureka" moment. I've set the MB PCIe to run as Gen 3 and so far, the system has been stable 😊 🤞
So... my next question is about a Gen 4 riser. Currently running at Gen 3 and x8, GPU-Z indicates that my bus interface is at only 8-10% utilisation, with the GPU at 95% utilisation. Does this mean that only 10% of the available bandwidth is being used, and if so, is there any point in me buying a Gen 4 cable? Also, regarding the speed: the MB and GPU are capable of running Gen 4 at x16.
Will I see a worthwhile improvement in FPS or anything else with Gen 4? If not, I've got to ask what the point of it is?!
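As a back-of-the-envelope check on that question, here is a small Python sketch comparing theoretical link bandwidth at Gen 3 x8 and Gen 4 x16. It assumes the 10% GPU-Z reading can be treated as an average share of the theoretical one-direction bandwidth, which is a simplification: the sensor is averaged and short bursts may use far more. The function name is made up for illustration.

```python
# Per-lane transfer rates (GT/s) and line encoding from the PCIe specifications.
GT_PER_S = {3: 8.0, 4: 16.0}
ENCODING = 128 / 130  # 128b/130b encoding, used by both Gen 3 and Gen 4

def link_gb_per_s(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, ignoring protocol overhead."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes

current = link_gb_per_s(3, 8)     # Gen 3 x8, the link as currently configured
upgraded = link_gb_per_s(4, 16)   # Gen 4 x16, what the MB and GPU could negotiate
used = 0.10 * current             # assumption: 10% average bus load

print(f"Gen 3 x8:  {current:.2f} GB/s")
print(f"Gen 4 x16: {upgraded:.2f} GB/s")
print(f"Traffic at 10% load: {used:.2f} GB/s, "
      f"or {used / upgraded:.1%} of a Gen 4 x16 link")
```

Under those assumptions the average traffic sits far below either link's ceiling, so this arithmetic alone does not show the current link to be a bottleneck. It says nothing about how bursty transfers behave.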
03-26-2023 12:26 PM
You tell 'em John 😃
03-26-2023 04:41 PM
People do say that video cards don't fully utilise x16 anyway. I'm not certain of the answer, but I think you are most likely correct. I've seen comments that x8 is enough. I'm not sure how that relates to x8 at Gen 3, but based on the figures you just gave, I'd be thinking the same. Anyway, it does seem that your cable was the issue, so why not see how it goes? You are stable now, so that's great news. You can enjoy your PC, do some more research and maybe upgrade the cable later if you want to. Happy for you 😁