14th Gen on non-refresh Z790

I've been handling a problem on behalf of a third party. The problem HAS been solved, so I'm not looking for support. I just want to share the information somewhere in the hope that it helps others facing similar issues.

Sorry, this is a long post.

TL;DR - Instability at stock settings on multiple Z790 Hero boards (non-refresh) using different 14900Ks, RAM and storage devices. Solved simply by swapping the board for a Z790 Dark Hero (refresh).

As a disclaimer, I had very limited time to investigate this issue because the PC had to get to a customer within a reasonable time frame, so some information may be incorrect or incomplete. I'm open to being educated if this is the case.

Story: (Skippable but for context)
Customer had a PC running a Z790 Hero, 14900K, 6200MHz Dominator (Hynix ICs), 2x 2TB SN770, 4090 Strix OC and the latest Win 11. The issue was that games were crashing within minutes, easily replicated simply by loading Helldivers 2. Other crashes and instabilities were observed consistently in 3DMark, Blender, Karhu, RealBench and CB R23. All of this was happening entirely at stock settings. The first time I got my hands on this build, the motherboard needed replacing entirely because the 4090 had wrecked the PCIe slot (F) - shout out to internal packaging, which was sadly not present. Once the board was replaced, I couldn't replicate the issues using synthetic benchmarks. 14th Gen was fairly new at the time, so I put it down to "early adoption tax" or whatever you want to call it. The system went back to the customer and the same issues appeared there right away; I don't know how that second call was handled. Ultimately, the courier destroyed the system on its way back to the customer, so we opted to replace the entire system at no cost. The new system passed all synthetics as usual, but I got my hands on it before shipping and, thanks to access to new resources, was sadly able to replicate the issues.

Issue: (Main part)
The system passed all synthetics with flying colours. The CPU was beating the average 14900K in Time Spy, for example. However, the system was spitting out WHEA correctable errors, specifically on PCIe Root Port 2 (PCI bus 0, Device 28, Function 1). Disabling this port stopped the errors and didn't seem to impact the system. I don't actually know what specific device this is, beyond it being CPU related. I suspected it had something to do with the top M.2 slot, but I could be wrong, and the drive tested fine in another system. I have a "nuke" style test that forces errors if there is any instability: RealBench's 8GB stress test, Karhu RAM Test, Cinebench R23 and Blender GPU compute, all running simultaneously. Normally this combination, whilst heavily impeding the system, can still run. In this case, however, RealBench was throwing out hash errors and Karhu's error count was climbing dramatically. Whenever R23 managed to catch up, it would also crash as if the CPU were unstable. That test turned out to be rather pointless anyway, as I loaded Helldivers 2 and the game crashed within the first few minutes.
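
Side note for anyone trying to match that WHEA location against other tools: as far as I know, Windows shows the device and function numbers in decimal, whereas the usual bus:device.function notation (as printed by lspci and similar tools) is in hex. A minimal Python sketch of the conversion is below. The function names are made up for illustration, and the input is just the location string quoted above, not a real log format.

```python
import re


def parse_location(text):
    """Pull (bus, device, function) out of a string such as
    'PCI bus 0, Device 28, Function 1'. Numbers are read as decimal."""
    match = re.search(
        r"bus\s+(\d+),\s*device\s+(\d+),\s*function\s+(\d+)",
        text,
        re.IGNORECASE,
    )
    if match is None:
        raise ValueError(f"no PCI location found in: {text!r}")
    return tuple(int(group) for group in match.groups())


def to_bdf(bus, device, function):
    """Format a PCI location as hex bus:device.function.
    PCI allows buses 0-255, devices 0-31 and functions 0-7."""
    if not (0 <= bus <= 255 and 0 <= device <= 31 and 0 <= function <= 7):
        raise ValueError("PCI location out of range")
    return f"{bus:02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    location = "PCI bus 0, Device 28, Function 1"
    print(to_bdf(*parse_location(location)))
```

For the port above this prints 00:1c.1, which is the form to search for when comparing against hex-based tools.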

I tested multiple BIOS versions, different firmware and just about everything else I could think of. In the end I concluded that the board had been the culprit all along, so I retested all of the same hardware on a different board. I had access to a Z790 Dark Hero, installed everything, then loaded into Helldivers and it stayed stable long enough to finish 2 missions. Additionally, I tried my nuke test and it survived: no errors in Karhu, no crashing or anything, just 2 hours of it without a problem.

Something to note: in Windows, the Z790 Dark Hero didn't list PCI Express Root Port 2 (it showed Root Port 3 instead). I didn't look into that any further, though. Since the issues had gone away, I was keen on getting the PC shipped in good working order to an extremely patient chap.

During testing, I was also able to try this on a refreshed Z790 TUF board, although I don't recall which one. Again, the issues weren't replicated there, so the common denominator remained the standard Hero. There are plenty of gaps in my investigation, but it has me scratching my head as to whether this is in any way related to the 7zip issues we've been seeing on some setups, or to the game companies that have been advising players to downclock. I intend to investigate this further once time allows.

Nonetheless, the issue was resolved with the board change, but I would love to hear any other insights that could contribute to this.

Professional PC Builder, Terrible PC Gamer.