I'm unfortunately still getting some WHEA errors and reboots on the latest BIOS, but only while gaming (specifically FF14), and it takes anywhere from 30 to 90 minutes to happen. Every other benchmark, stress test, RAM test, etc., in and out of Windows, completes fine. Running a dungeon is a good way to trigger it quickly. I tried moving the RAM down to 3200 and IF down to 1600, since those are "supported" per AMD, but that didn't help. I cleared CMOS several times and entered settings as follows:
- Enabled DOCP Standard (to set all the basic timings and voltage for RAM)
- Set RAM and IF to desired frequencies
- Disabled PBO / Fmax
- Manually set tRC (for some reason auto sets it way off)
- Enabled Resizable BAR (not sure this has any impact; it was BSODing without this setting too)
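The "RAM 3200 / IF 1600" pairing (and "3800/1900" later in this thread) comes from keeping the Infinity Fabric clock coupled 1:1 with the memory clock, which runs at half the DDR data rate. A minimal sketch of that arithmetic; the function name is illustrative and not taken from any tool:

```python
def fclk_for_1to1(ddr_rate_mts: int) -> int:
    """Return the FCLK (MHz) that keeps the fabric coupled 1:1 with memory.

    DDR transfers twice per clock, so the real memory clock is half the
    advertised MT/s rating; in 1:1 mode FCLK matches that memory clock.
    """
    if ddr_rate_mts % 2:
        raise ValueError("DDR data rates are expected to be even")
    return ddr_rate_mts // 2


# Memory speeds mentioned in this thread (DDR rate in MT/s -> FCLK in MHz).
for rate in (3200, 3600, 3800):
    print(f"DDR4-{rate}: FCLK {fclk_for_1to1(rate)} MHz")
```

Going above the 1:1 point would require decoupling the fabric, which is a different (and usually slower) configuration.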
If I set everything to default and disable both core performance boost and PBO, I get no BSODs. I've tried some manual VSOC values with varying success, but I almost always end up with a BSOD; it just slightly extends or shortens the time until it crashes. The one time I didn't, VSOC was close to 1.2V, and I got a corrected WHEA error per the Windows logs. (Is 1.2V+ safe, if I wanted to explore this further?)
- CPU: 5950X
- Mobo: C8F
- RAM: G.Skill F4-3600C16-8GTRS (8GB x4)
- GPU: RTX 3090 FE
- PSU: Seasonic PRIME TX-1000 (actually bumped this up from a PRIME Ultra 850W because I thought I had power issues with the 3090 drawing too much... guess not)
First, congrats on getting a 3090. Second, from the symptoms I'd say it looks like a PCIe issue. I would first verify that you used two separate PCIe 8-pin cables to power the 3090, and not a single split cable, as a single split cable is known to cause exactly this kind of instability due to power spikes. I would also try setting all PCIe links to GEN3 in Advanced > Onboard Devices Configuration: at the bottom of the configuration list there are 4-5 settings that you can change to GEN3. Change all of them and see if that fixes your issue.
Thanks. I'm not sure power spikes are causing the issue, but since I don't have a meter to test directly and am relying on HWiNFO (which would obviously stop reporting at the time of the BSOD), I can't rule it out. So far I haven't seen the GPU draw creep any higher than 350W. The 3090 FE uses the new 12-pin connector, and Seasonic provided their own cable (bypassing the "cheap" NVIDIA-provided adapter), which uses 2 PCIe connectors on the PSU. I've tried running OCCT's "Power" test to max out power draw from both the CPU and GPU, and I never see a BSOD doing this, only during real-world gaming.
It might help to mention that all of these WHEA errors, when I can actually get a dump file or a Windows log out of them, are of the "Cache Hierarchy Error" and/or "Bus Interconnect Error" variety, and the memory dumps indicate an issue reading from L1 cache. The issue does not surface if I run the RAM and IF at "stock", i.e. JEDEC timings/speed on the RAM and the corresponding half speed on the IF, even with core performance boost on. That should give roughly the same power draw (SoC aside, which is marginally lower because of the reduced stress).
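When crashes only leave behind Windows log entries, tallying the reported error types across several incidents is a quick way to see whether they consistently point at the same subsystem. A minimal sketch, assuming each string is the exported message text of one WHEA event containing a line of the form "Error Type: ..."; the sample messages are hypothetical and abbreviated, and the helper name is illustrative:

```python
import re
from collections import Counter

# Assumption: the message text contains a line like "Error Type: Cache Hierarchy Error".
ERROR_TYPE = re.compile(r"^[ \t]*Error Type:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def tally_error_types(messages):
    """Count how often each "Error Type" value appears across event messages."""
    counts = Counter()
    for message in messages:
        for match in ERROR_TYPE.finditer(message):
            counts[match.group(1)] += 1
    return counts


# Hypothetical, abbreviated messages for illustration only.
sample = [
    "A corrected hardware error has occurred.\nError Type: Cache Hierarchy Error\n",
    "A fatal hardware error has occurred.\nError Type: Bus Interconnect Error\n",
    "A corrected hardware error has occurred.\nError Type: Cache Hierarchy Error\n",
]
print(tally_error_types(sample).most_common())
```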
I'll try the PCIe 3.0 settings and report back on how that goes.
Uhmmm... as long as you are using 2 different PCIe power cables, and not a single split cable, to power the 3090 FE, power spikes should be out of the question.
If it is something related to the SOC cache, you may want to try configuring VSOC in Offset Mode with a negative 0.05V offset (in offset mode the SOC voltage starts from 1.2V, so a -0.05V offset brings VSOC to 1.15V). Alternatively, you may want to try disabling PSS Support in Advanced > CPU Configuration.
These are just blind attempts at finding which subsystem is causing the reboots and WHEA errors, assuming you can reproduce the issue reliably. Once you pinpoint which one it is, we can probably find a more stable solution.
I have a 5900X + C8F on BIOS 3204 with RAM at 3800/1900 and a 2080, and no problems here.
Well, I just repeated the test with core performance boost on and stock RAM speeds, and actually did get a BSOD while gaming, so that negates my previous statement. That configuration shouldn't tax the SoC at all. I'm going to try the PCIe adjustments... if that still results in a BSOD, I might have a dud CPU (AGAIN!... I already RMA'd one before this, which would BSOD when coming down from any high load).
As an aside, are you also running 4 sticks of RAM, or just 2?
4 sticks of 8GB 4000MHz B-Die. Please also try the other two suggestions I gave: the -0.05V VSOC offset and disabling PSS Support. Also, be sure to flash BIOS 3204 using the BIOS Flashback tool, and re-enter your settings manually rather than loading a previous BIOS profile. I understand that a normal user should not expect to go through this kind of configuration process to get a stable system, but forums are full of people who thought they had bad CPUs and in the end solved their issues with a bit of BIOS tinkering.
PCIe Gen3 didn't help. I just tried a -0.05V SoC offset, and it doesn't start at 1.2V; it starts at 0.975V, so I'm currently sitting at 0.925V. That seems a bit on the low side.
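The discrepancy is just the Offset Mode arithmetic applied to two different base voltages: assuming the BIOS simply adds the signed offset to whatever base it starts from, the same -0.05V lands at 1.15V from the 1.2V base quoted above but at 0.925V from the 0.975V base observed here. A minimal sketch in integer millivolts (to sidestep floating-point rounding); the function name is illustrative:

```python
def offset_mode_voltage_mv(base_mv: int, offset_mv: int) -> int:
    """Resulting VSOC in millivolts, assuming Offset Mode adds a signed offset to a base voltage."""
    return base_mv + offset_mv


# The two base voltages reported in this thread, each with a -50 mV offset.
print(offset_mode_voltage_mv(1200, -50))  # base quoted for Offset Mode
print(offset_mode_voltage_mv(975, -50))   # base observed on this board
```

This is why the target VSOC is better expressed as an absolute value than as an offset whose base may differ between boards or BIOS versions.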
I'm going to re-flash the BIOS with Flashback just for the heck of it, start fresh from there, then run through all the items one by one. I previously updated using EZ Flash from the BIOS menu, but I did also update from 3202, which was a beta.
I verified it, and I can confirm that on my C8F, setting a -0.05V offset on VSOC gives 1.15V in HWiNFO64. YMMV, but the point is to get 1.15V on VSOC to rule out insufficient voltage to the SOC cache. You may also want to try disabling PSS Support just as a further check. The changelog of beta BIOS 3202 specifically mentions "This Beta BIOS can only be reversed by BIOS Flashback.", which is why I suggested it.
I managed to track down a USB stick that actually worked with Flashback, so all that is done, but the auto values didn't change. I set VSOC to 1.16875V in BIOS, which gives me a flat 1.15000V reading in HWiNFO and ZenTimings. VDDP 0.9002V, CCD 0.9474V, IOD 1.0477V (all left on auto).
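The gap between the BIOS setpoint (1.16875V) and the reported reading (1.15000V) is 18.75 mV. Assuming VSOC is adjusted in 6.25 mV steps (1.16875V is an exact multiple of that), the setpoint sits exactly three steps above the reading; the thread doesn't establish whether the gap is load-line droop or sensor placement. A small sketch in integer microvolts; the step size and function name are assumptions for illustration:

```python
VID_STEP_UV = 6_250  # assumed 6.25 mV adjustment granularity


def setpoint_gap(set_uv: int, read_uv: int):
    """Return (gap in microvolts, gap in adjustment steps) between setpoint and reading."""
    gap = set_uv - read_uv
    return gap, gap / VID_STEP_UV


gap_uv, steps = setpoint_gap(1_168_750, 1_150_000)
print(f"{gap_uv / 1000} mV below setpoint = {steps} steps")
```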
Time to test and see what happens. One variable at a time.
Okay. Setting VSOC to 1.16875V alone didn't help: I still had a BSOD with no memory dump. Keeping VSOC at that value, I then disabled PSS Support and made it through a dungeon and some miscellaneous running around in the game.
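The "one variable at a time" approach only lets you attribute a result to a setting if consecutive test runs differ in exactly one setting. A minimal sketch of that check; the run log below is a hypothetical reconstruction of the last few tests in this thread (the starting values are assumptions), not an export from any tool:

```python
def changed_settings(before: dict, after: dict) -> set:
    """Names of settings whose value differs between two test configurations."""
    keys = before.keys() | after.keys()
    return {k for k in keys if before.get(k) != after.get(k)}


# Hypothetical test log: (settings, outcome). Starting values are assumed.
runs = [
    ({"VSOC": "Auto", "PSS Support": "Enabled"}, "BSOD"),
    ({"VSOC": "1.16875V", "PSS Support": "Enabled"}, "BSOD"),
    ({"VSOC": "1.16875V", "PSS Support": "Disabled"}, "passed dungeon"),
]

for (prev, _), (cur, outcome) in zip(runs, runs[1:]):
    diff = changed_settings(prev, cur)
    # Only a single changed setting lets the outcome be attributed to it.
    if len(diff) != 1:
        print(f"inconclusive, changed: {sorted(diff)}")
        continue
    print(f"changed {diff.pop()} -> {outcome}")
```

A single passing run is still weak evidence for an intermittent 30-90 minute fault, so repeating the final configuration several times matters as much as the diff itself.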
SoC power draw at these settings is bouncing between 18.2 and 24.3W. Core + SoC topped out at 190.511W. CPU package temps didn't exceed 70°C. The only thing I'm slightly concerned about is the DIMMs approaching 48°C, but I haven't seen any RAM-related errors at any level, just CPU L1 cache errors.
I'm going to keep testing, with some reboots in between, to see if it was a fluke, of course, but... if you don't mind me asking... why might disabling PSS Support help in a situation like this?