C8F / BIOS 3204 - IF Stability Issues

chrismog2_2
Level 7
I'm unfortunately still getting some WHEA errors and reboots on the latest BIOS, but only while gaming (specifically FF14), and it takes anywhere between 30 and 90 minutes. Every other benchmark, stress test, RAM test, etc. in and out of Windows completes fine. Running a dungeon is a good way to trigger it quickly. I tried moving the RAM down to 3200 and IF down to 1600, since those are "supported" per AMD, but that didn't help. Cleared CMOS several times and entered settings as follows:

- DOCP Standard (to set all the basic timings and voltage for RAM)
- Set RAM and IF to desired frequencies
- Disabled PBO / Fmax
- Manually set tRC (for some reason auto sets it way off)
- Enable Resizable BAR (not sure this has any impact, it was BSODing w/o this setting too)
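
For reference, the arithmetic behind the "RAM and IF" step can be sketched as below. DDR memory transfers data twice per clock, so the memory clock is half the rated transfer rate, and a 1:1 setup runs the Infinity Fabric at that same clock (DDR4-3200 with IF at 1600, as tried above). The function name is made up for illustration, not something from the BIOS or any tool.

```python
def fclk_for_one_to_one(ddr_rate_mts: int) -> int:
    """Return the memory clock in MHz for a DDR transfer rate in MT/s.

    DDR transfers data twice per clock, so the clock is half the rate.
    Running the Infinity Fabric at this value gives a 1:1 ratio.
    """
    if ddr_rate_mts <= 0 or ddr_rate_mts % 2:
        raise ValueError("expected a positive, even DDR transfer rate")
    return ddr_rate_mts // 2


for rate in (3200, 3600):
    print(f"DDR4-{rate}: memory clock / 1:1 IF = {fclk_for_one_to_one(rate)} MHz")
```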

If I move everything to default and disable both core performance boost and PBO, no BSODs. I've tried some manual VSOC values with varying success, but almost always end up with a BSOD, it just slightly extends or shortens the time... the one time I didn't, VSOC was close to 1.2, and I got a corrected WHEA error per the Windows logs. (Is 1.2+ safe, if I wanted to explore this further?)

CPU: 5950X
Mobo: C8F
RAM: G.Skill F4-3600C16-8GTRS (8GB x4)
GPU: RTX 3090 FE
PSU: Seasonic PRIME TX-1000 (Actually bumped this up from a PRIME Ultra 850W because I thought I had power issues w/ the 3090 drawing too much... guess not)

Kelutrel
Level 11
PSS support is the new name for the Cool'n'Quiet technology on AMD cpus.

This technology makes the CPU cores go into the C6 sleep state when idle, and the SOC and CORE electronics are then stressed by the continuous fluctuations in current and clock frequencies due to the voltage changes when in this deep sleep state.

More than once, disabling it promptly solved random reboot issues or stabilized an overclock. The drawback is that the CPU cores will never reach the C6 sleep state and will only use the C1 sleep state, so they will use a bit more power when idle. It is also suggested to keep PSS support disabled when overclocking.

The C6 sleep state on AMD Zen 3 is quite aggressive. You can try re-enabling PSS support, and you may observe in HWiNFO64 that some cores lower their voltage to nearly 0.2 V when entering the C6 sleep state; the SOC interface for that core is also put to sleep for the duration of the C6 state. It is this constant fluctuation between 0.2 V and 1.4 V that may sometimes cause a core to fail when exiting the C6 state.

If you can confirm that this setting solves your issue, then I would say that the issue is caused by VRM/voltage fluctuations. You may want to try increasing your LLC, switching frequencies and power phase controls in the BIOS to see if that compensates (you can get stable settings using the Ryzen DRAM Calculator) and allows you to re-enable PSS support. You may also want to set the BIOS AI Overclock Tuner to Manual instead of DOCP to avoid some weird interference with your voltages.

Your issue may also be solved by a future BIOS update as the VRM calibration logic is in the BIOS.

Also, you can put your SOC voltage back to Auto. If it is the PSS support that is causing the issue, the increase in SOC voltage is unneeded.

Interesting, thanks for that. But I'm still seeing some fluctuating values for C6 Residency for every core except cores 1 and 2 in HWiNFO. Does this setting also correlate with the Global C-States setting, or do they both need to be disabled for this to be fully effective?

Running out of time to test further today, but I'll dig back into it tomorrow.

edit: after a reboot just to make sure I still had PSS support disabled (I did), now cores 2 and 14 are at 0% C6 residency, the others are bouncing up and down as needed.

edit #2: secondary effect, all of my CPPC preferred core numbers are now set to 1, instead of ordering 1-16 as they used to. So each core in the list says (perf #1/...)

chrismog2_2 wrote:
Interesting, thanks for that. But I'm still seeing some fluctuating values for C6 Residency for every core except cores 1 and 2 in HWiNFO. Does this setting also correlate with the Global C-States setting, or do they both need to be disabled for this to be fully effective?

Running out of time to test further today, but I'll dig back into it tomorrow.


Uhmmmm... you are right. I am pretty sure that with previous BIOS versions disabling PSS support also disabled the C6 state, but I have now verified that with BIOS 3204 there is still C6 residency when PSS support is disabled.

If you can reliably confirm that your issue is solved by disabling PSS support anyway, then there is no need to do anything else. Global C-States or VSOC would make no difference. But you should be sure that this setting solves the issue, so maybe put everything on Auto or as you prefer, keep only PSS support disabled, and use your PC normally for a couple of days to confirm it is now fully stable. If not, try disabling Global C-States too.

I wonder what changes with BIOS 3204 when disabling PSS support then. If you ever understand what it does now please let me know (it used to be described as "PSS Support = AMD Cool & Quiet C6 Mode").

I found this image on an old post explaining the PSS Support items:
https://imgur.com/a/nti5lLu

And now, strangely, after a cold boot this morning, none of my cores are hitting C6 at all. Only change I made was to set SoC voltage back to auto. May or may not have time to run more gaming tests today, to ensure the problem is resolved.

It's been a few days... I returned my VSOC to auto on Monday, but couldn't do any gaming testing until today. System's been stable otherwise, doing basic desktop work and all that... but that has always been the case, so it's not a big indicator of anything. Gaming tests tonight went fine though, roughly 90m of a grab-bag of tasks within FFXIV, including some tabbing out to open other apps + letting the game idle for a bit. I'd hesitate to call it "fixed" after just two data points but it is a big step in the "more stable" direction at least. I also installed the newest chipset drivers off the support page for the C8F today, so there's that too.

Assuming all goes well, I'll probably enjoy the fact it's working for a few days, and then subject myself to more frustrations "for science" by re-enabling PSS and/or rolling back the chipset drivers one by one to see if anything causes BSODs again. Appreciate the help.
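
A rough sketch of that "one by one" plan, in case it helps keep track of which change is under test. The setting names and values here are placeholders picked for illustration (they mirror the current state described above: PSS disabled, VSOC on auto, newest chipset drivers), not actual BIOS option strings.

```python
def one_at_a_time(baseline: dict, alternatives: dict) -> list:
    """Return configs that each differ from the baseline in exactly one setting."""
    plans = []
    for key, value in alternatives.items():
        if key not in baseline:
            raise KeyError(f"unknown setting: {key}")
        trial = dict(baseline)  # copy so the baseline stays untouched
        trial[key] = value
        plans.append(trial)
    return plans


# Hypothetical labels for the currently stable configuration.
baseline = {"pss_support": "disabled", "vsoc": "auto", "chipset_driver": "newest"}
# The two changes to try, each on its own.
alternatives = {"pss_support": "enabled", "chipset_driver": "previous"}

for trial in one_at_a_time(baseline, alternatives):
    changed = [k for k in trial if trial[k] != baseline[k]]
    print(changed, trial)
```

Changing a single setting per run means a returning BSOD can be pinned on that one change rather than on a combination.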

Silent_Scone
Super Moderator
Hello,

What are you using to test memory stability?
13900KS / 8000 CAS36 / ROG APEX Z790 / ROG TUF RTX 4090

Silent Scone@ROG wrote:
Hello,

What are you using to test memory stability?


This may sound silly, but just a game (Final Fantasy XIV) and a specific dungeon (one of the newest, Matoya's Relict). Nothing else was replicating the BSOD behavior, whereas the game would reliably BSOD within roughly half an hour of play. Normal desktop behavior (browsing/installing/uninstalling, zipping/unzipping large files, etc.) never had issues. OCCT, Prime95, Memtest86, Karhu RAM Test, Cinebench R23, 3DMark, etc. all passed with flying colors, for hours on end. On applicable tests I also tried lowering the thread count to 6 or 8 to simulate the max threads a game would be able to handle, and still got nothing out of it.

Well, after a few days of not having any issues, got another BSOD while playing the same game, roughly 30 minutes in. No dump file (main screen froze at 0% on the BSOD dump creation; sound locked in a stutter of whatever was currently playing; second monitor mostly went black except for a "negative" of some text). Completely stable all day otherwise, though it was mostly light load w/ web browsing and an RDP session.

chrismog2_2 wrote:
Well, after a few days of not having any issues, got another BSOD while playing the same game, roughly 30 minutes in. No dump file (main screen froze at 0% on the BSOD dump creation; sound locked in a stutter of whatever was currently playing; second monitor mostly went black except for a "negative" of some text). Completely stable all day otherwise, though it was mostly light load w/ web browsing and an RDP session.


But what is the stop code?
13900KS / 8000 CAS36 / ROG APEX Z790 / ROG TUF RTX 4090

Silent Scone@ROG wrote:
But what is the stop code?


Same as always, WHEA_UNCORRECTABLE_ERROR on the blue screen, but the Windows system log doesn't have a chance to record it. So it goes in as an unexpected shutdown (event id 6008), dump file creation failed (event id 161), and a crash/reboot (event id 41).
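
For anyone wanting to pull those three events out of the System log in one go, here is a minimal sketch that builds a `wevtutil` query for them. The helper names are made up for illustration; the command itself only works on Windows, so the function that actually runs it is defined but not called here.

```python
import subprocess

# Event IDs mentioned above: 41 (crash/reboot), 161 (dump file creation
# failed), 6008 (unexpected shutdown).
CRASH_EVENT_IDS = (41, 161, 6008)


def build_xpath(event_ids):
    """Build an XPath filter matching any of the given event IDs."""
    clause = " or ".join(f"EventID={i}" for i in event_ids)
    return f"*[System[({clause})]]"


def build_command(event_ids=CRASH_EVENT_IDS, count=20):
    """Return the wevtutil argument list: newest events first, text output."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_xpath(event_ids)}",
        f"/c:{count}", "/rd:true", "/f:text",
    ]


def print_crash_events():
    """Run the query and print the result (Windows only)."""
    result = subprocess.run(build_command(), capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

Calling `print_crash_events()` from a Windows prompt should list the most recent matching entries, which makes it easier to line up the 6008/161/41 trio with each gaming session.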

On the rare occasions where I actually get a memory dump file, the debugger shows me it's an L1 cache read error. I've seen it on cores 12, 13, and 14.