Thanks to Asus and the wizard Shamino for allowing me to test drive their Maximus 13 Extreme and RKL 11900k sample for this user guide.
Welcome to a strange time in PC land, where scalpers, miners, scammers and the pandemic have changed a landscape that was once standard fare and far too familiar. Intel keeps throwing on Moar Corez while AMD's Lisa Su just laughs, goes "IPC, baby", and gives us even moar corez! But Intel knew things needed to change and, with excellent cooperation from Asus, has finally moved on from an aging architecture that was already past its breaking point, something gamers are now beginning to see with gems like Minecraft! And Asus, with its Z590 Maximus lineup, has once again set the tone for everyone else to follow, with a massively improved VRM (International Rectifier has finally been switched out for Intersil, improving vdroop and transient response), more phases, world record memory overclocking with DJR and a new emphasis on something called stability.
But first let's discuss what was happening with Skylake.
While original Skylake had a rough start (microcode and firmware bugs galore and rather poor overclocking), Intel improved things with Kaby Lake, finally bringing 5 ghz overclocks back into the realm of the realistic, similar to what we got during the Sandy Bridge era. Soon after, Coffee Lake made more than 4 cores mainstream; before that, the only way to get more than 4 cores was the HEDT lineup, and before that the "Enthusiast" X58 platform with the 6 core Nehalem processors. The 8700k was well received, but there was a problem looming that no one seemed to notice or care about at the time: the ringbus/interconnect problem.
Just a few months after the release of the moar corez 9900k (the first 8 core consumer chip, and the start of an arms race of cores) came a game called Apex Legends. And unknown to almost everyone at the time, this game showed that stretching the ringbus across more and more threads was completely broken. The result: an error called the "Internal Parity Error", related to the cache/core subsystem and its internal buffers. This problem was massive, generating a gigantic "crash" megathread over on the EA forums, with people's Prime95/y-cruncher stable systems just randomly dropping back to the desktop, and a few people noticing that WHEA errors were being generated. The WHEA entry was the crash actually being prevented (a corrected hardware error).
To get more of a clue about this error, we have to dig deeper. While many people were just pumping more vcore into their systems to try to bypass the Parity Error and the random crashes, I actually tried *lowering* the voltage even more, just to see what would happen in Apex. Sure enough, that was enough to occasionally BSOD the system in-game, but more importantly, it generated a new error alongside the parity errors:
"Translation Lookaside Buffer Error" (TLB).
"What is a TLB? In a very basic definition, a Translation lookaside-buffer (TLB) is a cache that memory management hardware uses to improve virtual address translation speed. All current desktop, laptop, and server processors include one or more TLBs in the memory management hardware, and it is nearly always present in any hardware that utilizes paged or segmented virtual memory.
By default, a TLB miss whether caused by hardware and/or software complications is not fatal (if the virtual address is not stored in the TLB, it's simply computed and found manually from other source data), but we're crashing on a TLB failure, this implies that the CPU determined there was corruption or a hardware error in date, therefore notified Windows that an unrecoverable hardware error has occurred."
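To make that fast-path/slow-path idea concrete, here's a minimal, purely illustrative Python sketch of a TLB sitting in front of a page-table walk. The page size, dictionary layout and function name are my own toy constructions, not how the hardware actually does it:

```python
# Toy model of a TLB in front of a page-table walk (illustration only)
PAGE_SIZE = 4096  # 4 KiB pages

tlb = {}                                        # virtual page number -> physical frame
page_table = {0x400: 0x9A2, 0x401: 0x117}       # made-up translations

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                              # TLB hit: fast path
        return tlb[vpn] * PAGE_SIZE + offset
    if vpn not in page_table:                   # no mapping at all -> page fault
        raise MemoryError(f"page fault at {hex(vaddr)}")
    tlb[vpn] = page_table[vpn]                  # TLB miss: walk the table, then cache it
    return tlb[vpn] * PAGE_SIZE + offset

print(hex(translate(0x400123)))  # miss, then filled
print(hex(translate(0x400456)))  # hit
```

The point is that a miss is merely slow, not fatal. The WHEA TLB error above means the hardware flagged its own translation machinery as corrupted, which is a very different beast.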
What this shows is that Apex Legends was causing the CPUs to function in a very unstable way, and since some users with completely stock systems were getting Parity Errors and crashing in Apex, this was a MAJOR problem.
Respawn tried to look into this, and thanks to the programmer Oriostorm's efforts, it was determined that these errors exposed a *flaw* in the Skylake processors. Specifically, he noticed that, due to some combination of code on the King's Canyon map, under certain rare conditions (considering just how many instructions are processed every second!), the processor would attempt to read or write to memory it had "no permission to access". This caused an access violation exception (read, write or execute) and, naturally, a game crash. The parity error was simply this error being caught and corrected in time. The fact that reducing vcore even further would lead to a TLB error showed that there was a very low level issue with multithreading and the ringbus happening here.
Oriostorm decided to change the code path to avoid the circumstances that would cause this error (I believe the old code path was "old_gather_props" or something). The new code path massively improved stability, brought Apex in line with other normal games of the time, and stopped it from ruining people's daily overclocks.
The May 2019 patch notes gave me and some other users credit for helping Respawn improve the stability of their game.
https://www.ea.com/games/apex-legends/news/performance-update-may-2019

"[PC ONLY] CRASHES SPECIFIC TO INTEL CPUs
We investigated the crash reports from many people who were crashing frequently and found that Intel CPUs sometimes were not executing the instructions properly in one particular function. A common example was an instruction that only reads a register crashed on writing to invalid memory. With the help of many forum users, we found that lowering the clock speed always fixed the crashes, even if the CPU wasn't overclocked or overheating. Thanks everyone, with a big shout out to Falkentyne, TEZZ0FIN0, JorPorCorTTV, and MrDakk!
This has been by far the most commonly reported PC crash over the last month or so and we’ve notified Intel about the issue. In the meantime, we’ve put a workaround in this patch to avoid the crashing at your original clock speeds just by changing the instructions used by that one function."
But this was only the beginning. Until Apex Legends, no game had brought out the multi-core "Skylake threading" problem like this, and even though no one knew anything about it at the time, this "Internal Parity Error" crash, and Apex needing a new code path to dodge the threading conditions, was the result of the Skylake interconnect being stretched across more cores. It also helps explain the even bigger latency penalty on CML cores (relative to core position on the die).
Minecraft is a Java game that had been out for many years by this point, and at the time, Minecraft crashing usually just meant a corrupted Windows install or out-of-date drivers. Minecraft generating "Internal" errors was completely unheard of. Parity Errors simply didn't show up out of nowhere on 4C/8T processors, since the architecture hadn't yet been stretched to its breaking point; if you actually got a parity error on 4C/8T, you were truly unstable and were probably waiting for an L0 or a BSOD. Even the 6C/12T generation was mostly overlooked since it was so brief. When the 9900k hit, however, more users started noticing Parity Errors being generated by Minecraft, although no one had a clue what was going on.
What broke the camel's back was the 10C/20T CML.
First, OC'ers noticed that cache/RAM latency went up (as mentioned above). This was forced to happen, but users were rewarded with yeet RAM overclocks from a stronger IMC, and 5.3-5.4 all core overclocks on good chips, which kept CML competitive with AMD's offerings, as AMD simply couldn't touch Intel on memory overclocking. But as thread count went up, the problems causing Parity Errors became more obvious: players started encountering Minecraft errors in droves, some even on stock clocks, and some AAA games (like RDR2) were also generating Internal WHEA errors. The L0 error (errors on virtualized instruction registers in the L0 register store, which only happened on hyperthreading-enabled processors) was already well known, and was already the major issue with Skylake stability: you could push high overclocks and get random L0's that were very difficult to stabilize, depending on the instruction set used, but enough vcore would fix it. The parity error showing up on systems that passed stress tests, though, was the sign that Skylake, never meant to scale to this many cores, needed to die. And with newer RTX/DLSS games now starting to generate Parity Errors on daily stable systems, something needed to be done.
Enter Rocket Lake.
While Rocket Lake is prep for Intel's true next gen platform (Alder Lake and DDR5), Intel needed to bring this platform to maturity while moving on from Skylake and all its bugs. While ADL is rumored to have two IMCs, the backport of Cypress Cove to 14nm, with only one IMC and the Gear changes, hurt RKL considerably. But this was a necessary evil, because Skylake HAD to die. And the IPC increases (~19%) are real and will only keep getting better on future gens. But with people breaking NDA and releasing benchmarks on pre-beta BIOSes with broken memory overclocking, showing off terrible bandwidth results (NDAs exist for a REASON, people!), every single person overlooked something.
Stability.
The Death of Skylake also meant the death of Skylake bugs.
1) CPU Cache L0 errors are now a thing of the past. No more random L0's when you think you're stable but are only partially stable, with BSODs that look like RAM errors (SYSTEM_SERVICE_EXCEPTION, IRQL_NOT_LESS_OR_EQUAL, etc.). You just BSOD now, with the well known "Clock Watchdog Timeout", or in other words "I'm not stable, chump, try again". There's no more "middle road". You're either stable or you BSOD. (I'm referring to the CPU core itself, NOT to the IMC or RAM errors; those will still happily make your life interesting.)
2) Parity Errors are byebye. No longer will Minecraft generate parity errors due to garbage collection in the caches. Now it just runs. Or you BSOD.
3) The rules have changed for stress testing.
Prime95 "AVX stable" is no longer a valid test for gamers. You can push your 5.2 ghz 11900k overclocks, run small FFT AVX DISABLED Prime95 and get a quick ClocK Watchdog Timeout, because your CPU hates you for you letting it get too hot. You run Cinebench R20 and get a clock watchdog timeout.
Boot up Battlefield 5 (One of the gold standards for stability testing) and it just runs like nothing ever happened. No BSOD, no nothing.
If you want to test if you're stable in games, now you just game. If you're unstable you'll know very fast. The skylake rules are deader than a dead horse.
4) Average overclocks will be a bit lower than what most people expect. 5 ghz all core will be about the average; expect roughly 200 mhz lower than CML chips. However, due to the stability changes, you may be able to game at clocks significantly higher than you can run any stress test at, without the random L0 errors of old. Most 11900k's should be able to game at 5.1-5.2 ghz with HT enabled and 5.3 ghz with HT disabled.
I think most gamers will find this a very welcome change from CML's "almost stable, but I crashed in COD: Cold War / Minecraft / RDR2" (and thus not actually stable) overclocks.
Some changes to this new platform:
1) New VRM controller and VRMs. This allows lower v-latch deltas (transients) and a lower set vcore when a CML processor is installed, versus the previous Z490 motherboards of the same tier. Because Intersil.
Some of the LLC levels are slightly different. LLC6 is 0.495 mOhm on the Maximus 12 Extreme versus 0.49 mOhm on the Maximus 13 Extreme, but the M13E needed 20 mV less on my 10900k @ 5 ghz for the same stable load voltage (1.146v), and the amp draw was also lower!
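For a rough sense of why those mOhm numbers matter: load voltage is approximately the set voltage minus load current times the effective loadline resistance. Here's a hedged Python sketch; the set voltage and current below are made-up example numbers, not measurements from my board:

```python
# Rough static vdroop estimate: V_load ≈ V_set - I_load * R_loadline (illustration only)
def v_load(v_set, i_load_amps, llc_mohm):
    return v_set - i_load_amps * (llc_mohm / 1000.0)

# Example: 1.250 V set, 200 A load current (made-up numbers)
for name, mohm in [("LLC3 / Intel spec", 1.1), ("LLC5", 0.73), ("LLC6", 0.49)]:
    print(f"{name}: ~{v_load(1.250, 200, mohm):.3f} V under load")
```

Flatter LLC means less static droop for the same set voltage, but it says nothing about the transient spikes and dips; that's what V-latch (point 7 below) finally makes visible.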
2) Each core now has its own PLL, which allows an independent clock to be set per core. This also greatly improves TVB and adaptive clocking, as all cores can now clock up to the TVB frequency under light loads. Favored cores still exist and TVB still exists, with the all core turbo dropping down to 4.8 ghz under heavy loads.
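As a loose way to picture that behavior, here's a hedged Python sketch. The ratios, active-core rule and temperature threshold are placeholders I picked for illustration, not Intel's actual TVB tables:

```python
# Toy model of per-core boosting with TVB (placeholder numbers, illustration only)
ALL_CORE_TURBO = 48   # heavy all-core loads fall back to the all core turbo (x48 per the text)
TVB_RATIO      = 53   # light-load boost ceiling (placeholder)
TVB_TEMP_LIMIT = 70   # TVB only applies below some temperature (placeholder value)

def core_ratio(active_cores, temp_c):
    """Return the ratio a core may run at under this simplified model."""
    if active_cores <= 2 and temp_c < TVB_TEMP_LIMIT:
        return TVB_RATIO       # light load and cool enough: any core can hit the TVB bin
    return ALL_CORE_TURBO      # heavy load: everyone drops to the all-core turbo

print(core_ratio(active_cores=1, temp_c=55))   # 53
print(core_ratio(active_cores=8, temp_c=80))   # 48
```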
There are now configurable limits for maximum auto voltages, for users worried about auto rules pushing voltages too high. You can configure them under "Auto Voltage Caps."
3) AVX 512 support.
4) AVX guardbands can now be configured independently in the BIOS, and AVX-512, AVX2 and AVX can also be manually disabled. Not that I would recommend disabling them, but the option is there if you so choose. AVX guardbands allow voltage scaling during AVX loads (basically how much voltage is increased: 1.00x = no increase, 1.25x = a 1.25 scale factor). Most users won't need to adjust this, but if you're experiencing stability issues and your cooling permits, you may want to look at it.
AVX offsets work differently because each core has its own PLL (see #2). The AVX offset references each core's own ratio rather than the all core ratio, depending on how many cores are active. So if Core #0 has a multiplier of x52 and encounters an AVX workload with a negative AVX offset of -2, that core will drop to x50 while the other cores are unaffected! However, if an all core AVX load happens, the all core ratio limit kicks in and caps all the cores, with the negative per-core offset still being taken from each core's original ratio. If you want the ratio to go below the all core limit, you need to lower the offset even more. In other words, the AVX offset is referenced against each individual core's ratio limit, and the all core ratio limit only wins when the offset result would still sit above it.
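Here's that logic written out as a small Python sketch. This is just my reading of the behavior described above, with example numbers, not anything official from Intel or Asus:

```python
# How the per-core AVX offset appears to resolve on RKL (my reading, example numbers)
def avx_ratio(core_ratio, avx_offset, all_core_ratio, all_core_load):
    offset_ratio = core_ratio - avx_offset     # offset comes off the core's own ratio
    if all_core_load:
        # Under an all-core AVX load the all-core limit also applies, so the
        # offset only matters if it already puts the core below that limit
        return min(offset_ratio, all_core_ratio)
    return offset_ratio

print(avx_ratio(core_ratio=52, avx_offset=2, all_core_ratio=50, all_core_load=False))  # 50: only this core drops
print(avx_ratio(core_ratio=52, avx_offset=2, all_core_ratio=50, all_core_load=True))   # 50: capped at the all-core limit
print(avx_ratio(core_ratio=52, avx_offset=3, all_core_ratio=50, all_core_load=True))   # 49: offset pushes below the cap
```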
5) BCLK overclocking changes slightly due to the PLL. People pushing very high BCLK will be able to adjust a PVD threshold (a divider). I don't do BCLK overclocking, but LN2 and world record seekers, especially some of the guys who love RAM bandwidth, will be happy this function exists. This divider is /15 by default, based on a 100 mhz reference clock, and the threshold to switch to a x2 post divider is 15. The x4 and x8 dividers come from the PLL, from the x2 threshold, and this can cause problems when pushing BCLK extremely high; I'm talking about 200 mhz BCLK and higher here. If you encounter failures, reduce the PVD threshold. People wanting to push their RAM past normal limits or bypass normal multiplier limitations will enjoy this.
I expect some interesting results with DJR memory and this feature as more users start looking into BCLK + DJR Magic with Gear 2.
6) New Gearing mode (Prep for Alder Lake).
Memory can run in 1:1 (sync) mode up to 3733 mhz. This will usually require a hefty increase in VCCIO and VCCSA. 3866 mhz requires work and silicon lottery and sometimes a large increase in IO/SA. 4000 1:1 is possible on SOME chips with maybe 1.65v IO/SA, but do not bother. Listen to cstkl1: just use 1:2. There is no bandwidth penalty from using 1:2 versus 1:1. Ignore the old pre-beta BIOS leaks; they mean nothing. NDA exists for a reason, and people breaking NDA to sell chips early on beta BIOSes doesn't help anyone. Shamino did his magic for us and fixed the memory bandwidth issues quickly. If people are still having issues (e.g. with single rank, certain RAM kits, etc.), please post your specs, RAM kit and details of the issue in the Maximus 13 Megathread.
At Gear 1 (1:1) you can use the 100:133 reference clock only. At Gear 2, you can use both 100:100 and 100:133, but 100:100 is significantly worse than 100:133 and is not recommended.
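For anyone new to the gearing terminology, here's a hedged Python sketch of the arithmetic. The multiplier values are examples and the exact names vary by BIOS; the only real points are that the DDR4 data rate is twice the memory clock and that Gear 2 runs the memory controller at half the memory clock:

```python
# DRAM vs memory-controller clocks under Gear 1 / Gear 2 (illustrative numbers)
def mem_clocks(bclk_mhz, mult, gear, ref="100:133"):
    ref_clk = bclk_mhz * (133.33 / 100.0 if ref == "100:133" else 1.0)
    dram_clk = ref_clk * mult                 # actual DRAM clock in MHz
    return {
        "data_rate": round(dram_clk * 2),     # the familiar "DDR4-xxxx" number
        "imc_clock": round(dram_clk / gear),  # Gear 2 halves the controller clock
    }

print(mem_clocks(100, mult=14, gear=1))  # ~DDR4-3733, IMC at ~1867 MHz (1:1)
print(mem_clocks(100, mult=17, gear=2))  # ~DDR4-4533, IMC at ~1133 MHz (1:2)
```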
There are now two VCCIO rails on this platform: CPU VCCIO and MemOC VCCIO. MemOC VCCIO is the one that will be most familiar to those coming from Z490.
There is also System Agent voltage, as before.
RTL/IOL tweaking does not seem to work, or no one has figured it out yet.
You can set "Round trip latency" tuning in presets to enabled for now.
Skews are different. Do not expect your old WR, Park, Nom values to work.
Most normal RAM timings and rules will function similarly to before.
Unstable memory can sometimes cause a "00" POST code after failed training.
There are three memory presets for speed features, known as "SAGV". You can think of them as memory power states (P-states): Low, Medium and High. They are NOT the same as the "Gear" settings for the controller. It doesn't make much sense to use the "Low" state if you're not running XMP (i.e. you're already at JEDEC speeds), since switching from low to "Low" changes nothing.
Overclocked memory can try the "High" setting (this is the full speed gear), but certain memory configurations may fail to boot, because variables like the Controller/RAM ratio modes affect it. SAGV is disabled by default. The suggestion is to simply enable it, leave the "Gear" configuration on Auto, and test from there. Overall, I didn't see much benefit from enabling it, but try it if you are trying to get the most out of an overclock.
Prime95 30.5 build 2, Large FFT with all AVX disabled, is a valid test for memory/IMC stability. If you can run a minimum of 30 minutes without errors, you should be stable enough for other use.
Setting Command Rate 1T may train two additional settings: CMD Slew Rate and CMD Drive Strength. You may see POST codes 6C and 6A.
There is an integrated Memtest86 stress test in the BIOS. You can check whether your memory overclock can load Windows without a BSOD before corrupt memory cells ever get the chance to touch a system file.
Important: High cache ratios require a much higher vcore just to switch to that ratio in Windows without crashing than they need to actually run stress tests. This is different from CML behavior.
High cache ratios now require MUCH more vcore to run close to the core ratio than on CML; the VID for cache is higher (to much higher) than its core counterpart. On CML, you could run cache up to its max (fused) ratio. On RKL, you will often need to downbin cache further, because cache VID cannot exceed core VID. If you try to run a x51 all core OC with x48 cache and get "Check CPU", well, that's why.
7) V-LATCH!!!!
Not RKL specific, but new to Z590 and, as far as I know, unique to Asus.
Asus' new proprietary hardware circuit can record transient min/max values beyond what the regular controller can read as Vcore, reporting the true minimum and maximum voltage spikes and dips to the OS and the OLED display, just like an oscilloscope. No longer do you need to buy a $300+ 2-4 channel scope to get true vmins! The board can log V-latch max (highest spike), V-min (lowest dip) and V-delta (true min-to-max delta) via a dip switch (enabled = logging, disabled = logging cancelled), and logging can be activated just by running HWInfo64. HWInfo's EC reporting also includes the V-latch measurements. Now you can determine your ideal LLC for your workload and save it to a profile.
This is something users have wanted to know for a very long time: just how much the voltage spikes and dips when running different LLCs and vcore levels.
V-latch monitoring is available on the OLED display (no performance penalty) and in HWinfo EC monitoring (some polling penalty as on previous gens, as already known). Requires newest HWinfo version (7.00+).
On my 11900k, I determined LLC5 to have the best overall V-latch min-max delta in full-core non-AVX testing (thanks to Shamino).
LLC5 is 45% reduced vdroop (approx 0.73 mOhm). LLC3 is still the baseline of 1.1 mOhm (Intel spec).
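If you want to do the same comparison from your own logged values, the math is trivial. Here's a hedged Python sketch with placeholder readings, not my actual numbers:

```python
# Compare logged V-latch readings per LLC and pick the tightest transient window
# (placeholder readings, not real measurements)
vlatch_log = {
    "LLC4": {"vmax": 1.392, "vmin": 1.208},
    "LLC5": {"vmax": 1.376, "vmin": 1.236},
    "LLC6": {"vmax": 1.368, "vmin": 1.224},
}

for llc, v in vlatch_log.items():
    delta_mv = (v["vmax"] - v["vmin"]) * 1000
    print(f"{llc}: spike {v['vmax']:.3f} V, dip {v['vmin']:.3f} V, delta {delta_mv:.0f} mV")

best = min(vlatch_log, key=lambda k: vlatch_log[k]["vmax"] - vlatch_log[k]["vmin"])
print("Tightest V-delta:", best)
```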
A lot of people will like this, and for now you must choose Asus to get this type of feature. I'm excited about it; it saved me a lot of money over buying a Siglent scope I might only use once, and I put that cost into a nice Fluke DMM instead. Thank you Shamino! (This feature was purely from Shamino's efforts and is a first for any motherboard design.)