
MCE explanations and others

Shamino
Moderator
MCE

I’m seeing quite a lot of misunderstanding about how MCE works, so I’m partly writing this to address it.
It is not the taboo that it has been made out to be. There are 3 options for it, namely Auto, Enabled and Disabled.

Enabled merely maxes out power and current limits so that users don’t have to do this manually.

Disabled sets these limits to Intel’s defaults. Even when you customize ratios, these limits stay in place unless manually adjusted.

Auto means that the board has the liberty to determine what limits are reasonable, competitive, reliable and logical. Factors such as thermals, performance, segment, competitors’ out-of-box perf and stability are taken into account. Logical means that when you customize a ratio, all limits are raised to the max, on the logical assumption that you want to run that frequency and not clip from power.
Therefore it is totally redundant to disable MCE and then max out power and current limits, since enabling MCE does the exact same thing and no more. Really, just leave MCE at Auto if you plan on overclocking; it does you no harm.

TVB

Now for the current emphasis on totally stock perf of the i9s by the review sites: all the attention is on TDP, but that’s just a gnat compared to the camel swallowed. No site has actually talked about or examined the latest feature of the i9, Thermal Velocity Boost (TVB). By default Intel enables this, but I see that only Asus boards enable it at defaults. The other boards I tested have it disabled even at defaults.

What this does is reduce voltage guardbands depending on core temp. Traditionally, the voltage requested by the proc is always based on the worst-case scenario, TJMAX, meaning the voltage the proc thinks it needs for the frequency when the temp is 100c. It is well known that the cooler the chip runs, the less voltage it needs. Therefore TVB opportunistically reduces power and temps. The behavior is quite linear and I observed the following on several samples.

TVB takes effect from 40x to 50x on 99k, 40x to 49x on 97k and 40x to 47x on 96k; simply put, from 40x to the single core boost ratio. The V/temp curve runs from 0c to 100c. For example, with a 150mv delta between 100c and 0c for 50x, every 1c drop from 100c reduces the VID requested by 1.5mv. The reduction is smaller as you go down to 49x; the smaller the ratio, the smaller the reduction, and below 40x you get no reduction. This is good for most people running stock. You can try this yourself by noting the VID at idle, then unplugging your water pump and letting the core temp rise slowly, noting down the correlated temp/VID, and you will see what I'm talking about.
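As a rough illustration of the linear V/temp curve described above, here is a minimal Python sketch. The function name and the idea of passing the 0c-to-100c delta as a parameter are assumptions for illustration; only the 150mv figure for 50x comes from the observation above, and the delta for any other ratio would have to be measured.

```python
def tvb_vid_reduction_mv(core_temp_c, delta_0_to_100_mv=150.0):
    """Estimate how many mv TVB trims from the 100c VID at a given core temp.

    Assumes the linear behavior described in the post: no reduction at 100c
    and the full delta at 0c. Temperatures outside 0..100c are clamped.
    The 150mv default is the 50x example; other ratios use a smaller delta.
    """
    temp = min(max(float(core_temp_c), 0.0), 100.0)
    mv_per_degree = delta_0_to_100_mv / 100.0
    return (100.0 - temp) * mv_per_degree


if __name__ == "__main__":
    for temp in (100, 80, 50, 30):
        print(f"{temp}c: VID about {tvb_vid_reduction_mv(temp):.1f}mv below the 100c value")
```

Logging temp/VID pairs while the core warms up, as suggested above, and comparing them against this straight line is one way to check how linear a given sample really is.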

During OC, when you try to run adaptive mode voltage with this mechanism, you need to change your perspective on how you set the ‘target adaptive voltage’: assume that it is the voltage you get at 100c, apply the reduction down to your lowest (usually ambient) temp, and gauge from that what voltage needs to be set. So if you set 1.35v for example, when you idle at 30c you will get maybe 1.25v instead. This can be confusing for many people, therefore we disable TVB once you customize a ratio. This is not to say you cannot exploit this mechanism to work for you during OC, but you really need to find out your idle Vmin (lowest stable voltage). You can find this option under CPU internal power management in the bios and you can force it to enabled during OC.
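To make the change of perspective concrete, here is a small hedged sketch that works in both directions: what voltage a given target turns into at a cooler temp, and what target you would need so that the idle voltage does not fall below your idle Vmin. The function names are hypothetical, and the 150mv delta is just the 50x example figure from earlier, not a value that applies to every chip or ratio.

```python
def voltage_at_temp(target_adaptive_v, temp_c, delta_0_to_100_mv=150.0):
    """Voltage requested at temp_c when target_adaptive_v is treated as the 100c value."""
    reduction_v = (100.0 - temp_c) * (delta_0_to_100_mv / 100.0) / 1000.0
    return target_adaptive_v - reduction_v


def target_for_idle_vmin(idle_vmin_v, idle_temp_c, delta_0_to_100_mv=150.0):
    """Target adaptive voltage needed so the request at idle_temp_c equals idle_vmin_v."""
    reduction_v = (100.0 - idle_temp_c) * (delta_0_to_100_mv / 100.0) / 1000.0
    return idle_vmin_v + reduction_v


if __name__ == "__main__":
    # The example from the post: 1.35v target, idling at 30c.
    print(f"request at 30c: {voltage_at_temp(1.35, 30):.3f}v")
    # Hypothetical idle Vmin of 1.28v at 30c: what target keeps idle at or above it?
    print(f"target needed:  {target_for_idle_vmin(1.28, 30):.3f}v")
```

With the assumed 150mv delta, the 1.35v example works out to roughly 1.245v at 30c, which lines up with the "maybe 1.25v" mentioned above.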
For those who want to check or try this on other boards, simply download r/w everything http://rweverything.com/ and add CPU MSR 0x150

Access this register and set bit 63 to 1 and [39:32] to 18h:

https://ibb.co/gUyvUf

Bit 3 shows you whether TVB is enabled or disabled (0=disabled). If TVB is disabled, simply flip the bit and use command 19h to write.
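For anyone who prefers to work out the 64-bit values before typing them into the tool, here is a sketch of the bit arithmetic only; it does not touch any hardware. The constant and function names are made up for illustration, and two things are assumptions rather than facts from the post: that the data field sits in the low 32 bits of the register, and that the write command is 19h (hex, like the 18h read command).

```python
MSR_ADDRESS = 0x150        # the CPU MSR named in the post
BIT_63 = 1 << 63           # must be set to 1 when issuing a command
CMD_READ = 0x18            # goes into bits [39:32] to read
CMD_WRITE = 0x19           # assumed hex 19h, goes into bits [39:32] to write
TVB_BIT = 1 << 3           # bit 3: 1 = TVB enabled, 0 = disabled


def build_request(command, data=0):
    """Pack a command into bits [39:32] and data into the low 32 bits (assumed layout)."""
    return BIT_63 | ((command & 0xFF) << 32) | (data & 0xFFFFFFFF)


def tvb_enabled(readback):
    """Check bit 3 of the value read back after the read command."""
    return bool(readback & TVB_BIT)


def build_tvb_enable_write(readback):
    """Keep the other low-32 data bits as read, set bit 3, and use the write command."""
    data = (readback & 0xFFFFFFFF) | TVB_BIT
    return build_request(CMD_WRITE, data)


if __name__ == "__main__":
    print(f"read request : 0x{build_request(CMD_READ):016X}")
    print(f"write request: 0x{build_tvb_enable_write(0):016X}  (if readback data was 0)")
```

The read request comes out as 0x8000001800000000, which matches "bit 63 to 1 and [39:32] to 18h" above; compare whatever you build against what the tool actually shows before writing anything.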

https://ibb.co/jCEDFL

https://ibb.co/muKDFL


Then you can see what the default stock behavior is really like. This will truly affect temperature, power consumption, boost frequencies when TDP is default, etc., so those who want to dig deeper into ‘stock performance’ really need to get this correct.

The other thing that also affects ‘stock performance’ is the ACDC loadline programmed into the processor. Boards should let the CPU know the actual loadline the board is currently set to by writing the correct loadline. This doesn’t mean that the board has to be honest about it, and with the generous guardband Intel is used to providing (perhaps not as generous any more; they need to factor in stability after 10 years of heavy use, for example), it is not uncommon for boards to lie to the processor so as to get it to undervolt. You cannot really tell how much the board has lied to the proc, but at the same frequency/load, just by probing the inductor on different boards with a multimeter, you can see that more than one board is lying to the proc. Obviously the TVB setting should be the same during the test, or else you get very skewed results as explained above.

Finally, VRM temp should not be the only factor when evaluating a VRM, much less a whole board. For OC, my opinion is that transient response is very important. Contrary to popular belief, you do not need expensive equipment to test transient response. You can use Cache OC or AVX offset to test this.

If you have played with Cache OC, you will have seen that it is very intolerant of any undershoot. Straightaway you would hardlock or BSOD. You can even test it at default. Since it shares the same rail as core, set core ratio to something really low like 40x. Set min and max cache ratio to 43x and set a manual voltage like 1.15v. Run a heavy load like prime95 non AVX. Slowly reduce the voltage on the fly, 5mv at a time. You will find the VMIN this way. Once you find the VMIN under continuous load, stop prime95. If it doesn’t hang, run it again, going back and forth between running and stopping. Even try booting straight from bios with that VMIN. You will see that this VMIN requires a guardband for transient load changes, meaning you will need 5mv+++ more. You will observe bigger guardbands needed at higher cache. Obviously the better the transient response, the smaller the guardband requirement.

There is also AVX offset, or ratio change mechanisms in general, with which you can observe transient response. First, find the VMIN under continuous heavy load like prime95 non AVX 26.6 on, say, 47x cpu ratio with a manual mode voltage and AVX offset at 0.
Next set AVX offset to any value, such as 1 or 2. Run the same frequency/load at its VMIN. It will not last too long.

AVX offset and other ratio change mechanisms have always had this issue, whereby the voltage guardband needed is bigger.
Here's why: the ratio change takes place by getting the core plls to go to sleep and then waking up at the new pll frequency.
The transient is very bad and violent when u run high loads cos it will go from really high load to almost no load and back to high load very quickly.

Now you may think you did not even run AVX. For AVX offset, a lot of background stuff may run a few AVX instructions, such as dot net framework.
Sometimes u can see avx offset occur when u don’t deliberately run avx; it's usually very fast and you only see small pockets of it.
Therefore the ratio change occurs quickly and vmin is raised due to the guardband requirement increasing.

The way to mitigate this is to use a steep LLC and higher vid. The transient will be better.

You can expose this guardband requirement by doing other stuff that changes ratio, such as, while running prime95, continuously lowering the short duration power limit and raising it again with XTU.
The ratio will keep changing and it will finally hang if your guardband is only just enough for continuous load.
Or just keep changing ratio up and down.

Therefore use AVX offset bearing the extra guardband in mind. This is totally the behavior of Intel’s proc. Again, obviously you can gauge the ‘responsiveness’ of a board by measuring the GB needed. For example you can logically conclude that a board that requires 150mv GB is less ‘agile’ than a board that requires 80mv guardband.




Adaptive Voltage

Let's start from the basics: how the CPU's dynamic frequency/voltage scaling works.
#1 the mobo's bios tells the processor the current loadline characteristics via the AC DC loadline values.
#2 the cpu, based on its own native VF curve and the info in #1, requests a voltage from the controller.
#3 the voltage that eventually reaches the cpu is this voltage minus the droop from the loadline

easier to understand from an example:

10900k currently running at 4.9ghz and drawing 150A. bios programmed AC DC LL to 0.50mOhm.
the cpu's native vf point at 4.9ghz is 1.30v.
the cpu anticipates 75mv droop (V=I*R, 150*0.5), so the cpu requests 1.30v + 75mv = 1.375v from the controller.
the current VRM loadline the user sets is level 3, which is about 1.1mOhm.
the actual voltage that the cpu eventually gets after the vdroop from the mobo is 1.375v - (150*1.1)mv = 1.21v

##Note: The above is an illustration without TVB voltage optimization enabled. If it is enabled, it adds another variable into the equation at #2 (volt requested minus the volt optimization from temperature; we leave this as zero so that the above example is easier to follow).
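The three steps and the worked example above can be put into a few lines of Python. This is only a sketch of the arithmetic as described, with TVB optimization left at zero; the function names are hypothetical and loadline values are taken in milliohms.

```python
def requested_voltage(native_vid_v, current_a, ac_ll_mohm):
    """Step #2: native VF point plus the droop the cpu anticipates from the AC LL value."""
    return native_vid_v + current_a * ac_ll_mohm / 1000.0


def delivered_voltage(requested_v, current_a, vrm_ll_mohm):
    """Step #3: requested voltage minus the real droop across the VRM loadline."""
    return requested_v - current_a * vrm_ll_mohm / 1000.0


if __name__ == "__main__":
    # Numbers from the example: 1.30v native at 4.9ghz, 150A,
    # AC DC LL programmed to 0.50 mOhm, VRM loadline level 3 at about 1.1 mOhm.
    req = requested_voltage(1.30, 150, 0.50)
    got = delivered_voltage(req, 150, 1.1)
    print(f"requested from controller: {req:.3f}v")
    print(f"reaching the cpu:          {got:.3f}v")
```

Changing only `ac_ll_mohm` shows how a board that reports a lower loadline than it really has gets the cpu to request less and so undervolt.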

After understanding this, we can better explain Adaptive voltage, which is not too complicated but requires you to bear in mind the rules it follows.

1) When cpu frequency is smaller than or equal to the highest default boost freq, for eg 5.3 on 10900k (lets call this p0 freq):
whatever you set as an adaptive voltage is ignored by the cpu since it only references its own native vf curve at freq <=p0freq

2) And even if you are at a freq higher than p0 freq, if you set a value that is smaller than its native p0 freq vid, this gets ignored too.

example:
10900k with a native vid of 1.5v at 53x. you run synch all cores 52x and try to set 1.45v adaptive. this is futile because the cpu ignores it due to 1)
then you go up to 54x and try to set 1.475v. this is futile as well, as the cpu again ignores it due to 2).
then you set the voltage to 1.52v, and the cpu finally starts honoring the request because neither 1) nor 2) applies.

=> so in short, adaptive voltage ONLY takes effect if freq >p0freq & value > native p0vid. And even so the eventual voltage you get is the result after going thru #1,#2,#3
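The two rules boil down to a single condition, sketched here with the same 10900k example. The function name and parameters are hypothetical, and the result only says whether the adaptive value is honored at all; the voltage you finally get still goes through #1, #2 and #3.

```python
def adaptive_voltage_honored(ratio, adaptive_v, p0_ratio, p0_vid_v):
    """True only if rule 1) and rule 2) both do not apply.

    Rule 1): at or below the p0 ratio the cpu only uses its native vf curve.
    Rule 2): a value not above the native p0 vid is ignored even above p0.
    """
    return ratio > p0_ratio and adaptive_v > p0_vid_v


if __name__ == "__main__":
    p0_ratio, p0_vid = 53, 1.5  # example chip from the post
    for ratio, volts in ((52, 1.45), (54, 1.475), (54, 1.52)):
        ok = adaptive_voltage_honored(ratio, volts, p0_ratio, p0_vid)
        print(f"{ratio}x @ {volts}v adaptive -> {'honored' if ok else 'ignored'}")
```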

so what to do for freq <=p0 freq, how to get volt u want in this range? well, short of offsets / vf pt offsets, you can manipulate the variables in #1 and #3 to get the v you want, ie manipulate AC DC LL values and/OR VRM Loadline values. for my preference, i would stick to a good VRM loadline that is good for transient, example Level 4, fix it in this position and trim AC DC LL values.

The svid behavior option just contains static AC DC LL presets, apart from "Trained" which is part of the AI algo, that sets a predicted AC DC LL value taking into account freq, cpu/cooler characteristics/vrm ll value.



S/w VID readings

S/w vid readings may not always reflect the actual vid requested from the controller; in fact, unless DC Loadline is written to 0.01, they won't.
what they reflect is actually the voltage the cpu anticipates getting, calculated from the DC LL value.
so for example, when you see a VID reading of 1.35v and the DC Loadline value is 0.5mOhm, what is actually requested from the controller is:
(for simplicity im gonna leave out the fixed 200mv offset requested by cpu for >=8cores)
1.35v + 0.5 * current at the moment
for example:
1.35v + (0.5*180A)mv = 1.44v
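The same arithmetic as a short sketch, for converting a software VID reading back into what was requested from the controller. The function name is hypothetical; like the example above it leaves out the fixed 200mv offset, and it assumes AC LL and DC LL are set to the same value.

```python
def vid_requested_from_controller(sw_vid_v, current_a, dc_ll_mohm):
    """Add back the droop the cpu anticipated (current x DC LL, in milliohms)."""
    return sw_vid_v + current_a * dc_ll_mohm / 1000.0


if __name__ == "__main__":
    # Example from the post: 1.35v reading, 0.5 mOhm DC LL, 180A.
    print(f"{vid_requested_from_controller(1.35, 180, 0.5):.3f}v")
    # With DC LL written to 0.01 the reading is almost the same as the request.
    print(f"{vid_requested_from_controller(1.35, 180, 0.01):.4f}v")
```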

So why don't we set DC LL to 0.01 and AC LL to whatever we need (since the actual vdroop compensation the cpu requests boils down to the AC LL value)?
Well, you can, but when the AC and DC LL values differ, the current and power calculations done by the cpu get skewed.



New VF Pt offsets on Z490:


the vf curve refers to the stock vid of the proc at various freq, and the vf pt offset allows you to fine tune each point. All of this pertains to adaptive voltage mode rather than manual mode, since manual mode uses a fixed voltage setting across all freq. Bear in mind the nuance that the curve has to be monotonic: setting a higher freq point to a lower resultant volt will only get the volt as low as the point before it.
As an example, you see from the bios menu or s/w that vf pt 8 (53x) is 1.334v vid
vf point 7, the pt before that, is 1.314v
say you target 1.25v VID for 53x. setting a negative offset of -0.084 for vf point 8 (53x) will only result in an actual 1.314v, since vf point 7 is 1.314v and u cannot set pt 8 lower than pt 7. you then decide to set vf pt 7 to -0.069 as well, and this brings vf pt 7 down and also allows vf pt 8 to come down to 1.25v.
the software tool i posted forces you to adhere to this rule, so it's useful for runtime testing in the os and frees you from doing the math.

this is to illustrate the rule it adheres to, but it is an illogical approach, because i dont think one should target a voltage for a freq, but rather target a freq and get the necessary volt for it.
so in an actual use case, you would just be trimming each point bit by bit, double checking stability throughout the trimming process.
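The monotonic rule can be sketched as a running floor over the points. This is only an illustration of the rule as described above, not the logic of the actual tool or bios; the function name is hypothetical and the list holds just vf points 7 and 8 from the example.

```python
def effective_vf_points(stock_vids_v, offsets_v):
    """Apply per-point offsets, never letting a point fall below the one before it."""
    effective = []
    floor_v = 0.0  # lowest voltage the next point is allowed to reach
    for stock, offset in zip(stock_vids_v, offsets_v):
        actual = max(stock + offset, floor_v)
        effective.append(round(actual, 3))
        floor_v = actual
    return effective


if __name__ == "__main__":
    stock = [1.314, 1.334]  # vf point 7 and vf point 8 (53x)
    # Only point 8 offset: it gets stuck at point 7's voltage.
    print(effective_vf_points(stock, [0.0, -0.084]))
    # Point 7 lowered too: point 8 can now reach the 1.25v target.
    print(effective_vf_points(stock, [-0.069, -0.084]))
```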


Edit: 11/19

Update of new Feature: Overclocking TVB:

OverClocking TVB is an extension of the TVB feature allowing you to customize frequencies according to temperature.
This, in my opinion, is a useful feature that milks the last bit you have got at light loads without requiring additional voltage. In a nutshell, it takes that 5~8C extra margin you’ve got, and converts it into additional frequency.
It is only supported on 10900K/non K variants atm, and maybe 10850K. If unsupported, the information will display N/A.
Everything TVB related is now grouped into the Thermal Velocity Boost menu:

[Attached image 86774]

[Attached image 86775]

At the top, it reads back the current configuration of the OCTVB.
For this to work properly, CStates must be enabled for proc to be active core aware! If you synch all cores, make sure you manually enable Cstates.
Active Cores refers to the row of settings applicable when that number of cores is active. Ratio Setting refers to the associated core ratio for that active core count. Temp A refers to Temperature A for that active core count, above which the ratio drops by its associated ratio offset. This offset is the Negative Ratio Offset A pictured above. Temp B refers to Temperature B for that active core count, above which the ratio drops by a further 1x.
Let’s take a simple example below:

[Attached image 86777]

Right now the cpu runs at 50C and only 1 core is currently active. Ratio is therefore 55x.
User does something, the active core gets hotter and reaches 72C, with still only 1 core active. Ratio now becomes 55-1=54x because 72C is > Temp A of 68C and the negative offset is 1. If the negative offset were 2, for example, it would become 55-2=53x.
And then the user loads it further and now the temperature is 82C. Ratio is now 55-1 (from Temp A) -1 (from Temp B) = 53x, because the temp is > Temp B of 78C and a further 1x is deducted. The Temp B negative offset cannot be configured and is a fixed 1x.
Then the user does something different and now 3 cores are active. The applicable row becomes the third row in the picture above. The CPU runs at 60C right now, so neither Temp A nor Temp B has been exceeded, therefore the ratio is the original 53x. Then the proc gets to 77C, Temp A is breached, its associated offset is 3x, so the proc drops to 50x. It runs hotter still and gets to 87C. Temp B is breached, the proc drops a further 1x and the ratio is now 49x. And the story continues…

Hopefully, this example is enough to explain.
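For those who find code easier to follow than prose, the per-row decision can be sketched like this. The function name and parameters are hypothetical; the numbers in the demo are the 1-core row from the example (55x, Temp A 68C, Temp B 78C, negative offset A of 1), and the sketch assumes Temp B is set higher than Temp A.

```python
def octvb_ratio(base_ratio, temp_c, temp_a_c, temp_b_c, neg_offset_a):
    """Ratio for one active-core row at the given temperature."""
    ratio = base_ratio
    if temp_c > temp_a_c:
        ratio -= neg_offset_a  # configurable Negative Ratio Offset A
    if temp_c > temp_b_c:
        ratio -= 1             # Temp B offset is a fixed 1x
    return ratio


if __name__ == "__main__":
    for temp in (50, 72, 82):
        print(f"1 core active at {temp}C -> {octvb_ratio(55, temp, 68, 78, 1)}x")
```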

[Attached image 86776]

The control is under Overclocking TVB; customize it using “Enabled”.
When enabled, you get to customize the params for each row (each active core count). The ratio you configure in the main menu like you always do; whether you go with synch all cores (if you do, pls manually enable cstates so that the proc can tell the number of active cores) or by core usage doesn’t affect this.

[Attached image 86778]

It can be very time consuming to customize it yourself, so we have made 2 predicted presets for you: the +1boost profile and the +2boost profile.

[Attached image 86779]

Just use it ON TOP of your current/maximized oc setting.
It will do an additional 1x/2x on top of your current setting and set auto-calculated temperature boundaries based on the associated frequency. This does not add voltage, because it still uses the voltage from before adding the boost and merely tries to scrape some performance from moments when there is thermal headroom.
So for example, I would load Ai optimized, then enable +1boost. I find it stable, feel a bit adventurous, then I change it to +2boost and try.
Or my current OC is 54X @ 1.4v; I keep this and I just go into OCTVB to enable +1boost. (if you go with synch all cores pls manually enable cstates so that the proc can tell the number of active cores, or just use by core usage and set every core count to the same value)

OCTVB for RKL is slightly different:
Guide:
https://www.dropbox.com/scl/fi/hz7laveryk4bbwo645a54/rkl_octvb.docx?dl=0&rlkey=f55vy6z360xcmorii2xv3...

[Attached image 86780]

Flow Chart to visualize the decision-making process of the processor for Voltage and Frequency. Obviously the evaluation is continuously looping and the V and F flows are intertwined, ie you can imagine the Frequency flow continues into the voltage flow. Not exactly so, but close enough for comprehension.







Korth
Level 14
Less practical for most people than the Q-CODE and Q-LED indicators. I've never tried running MCE checks on overclocked processors, lol, unless they were hidden under the hood on other bench/stress softwares.
"All opinions are not equal. Some are a very great deal more robust, sophisticated and well supported in logic and argument than others." - Douglas Adams

[/Korth]

Feklar
Level 8
This should be sticky'd.

i9 12900k + Asus Maximus Z690 Apex + EVGA RTX 3090 Ti FTW3 ULTRA
G.SKILL Trident Z5 RGB Series 32GB (2 x 16GB) DDR5 6000l XMP 3.0 Desktop Memory Model F5-6000U4040E16GX2-TZ5RK+ Samsung 870 Pro SSD, EVO 1TB, EVO 2TB
EVGA SuperNOVA 1000 T2 Power Supply + Fractal Meshify 2 XL case
Ek Velocity 2 CPU block, Ek GPU block
Koolance Fittings and QDC's + Mo-Ra 3 Pro 4x180 Radiator
LG 38GL950G Monitor +
Windows 10 Pro

Arne_Saknussemm
Level 40

HiVizMan
Level 40
Thank you Shammy
To help us help you - please provide as much information about your system and the problem as possible.

Shamino
Moderator
This also raises the question, is 'stock performance' absolute in today's context? with TVB and XFR on AMD, frequency depending on power and power on temps, tests carried out on an air cooler will differ from tests carried out on custom water, without even bringing in the question of what ambient temp to test at. One can only test with a typical cooler, which seems to be an AIO on these platforms.

This is a great point. Reviewers and users need to be very aware of these technologies in the current CPUs and supporting boards. They add an almost infinite number of possible "stock" or "baseline" performance norms. These variables will need to be dealt with when making comparisons. Reviewers will need to be very detailed if they plan on presenting any results that are "apples to apples."

One thing I DO very much like about TVB implementation on the new Asus boards is the rewards built into power consumption and performance through smart component /cooling selection. Without even overclocking, performance and efficiency can be dynamically gained (or lost) based on cooling choices.

This is a great thread Shamino. It draws attention to the fact that new motherboards need to be reviewed in an entirely new light. Routine comparisons based on the old "tried and true" review methods need to be changed.... and in some cases fairly drastically.

I must say that after getting my Hero XI up and running, and having some time to really work with it, it has become clear there is a lot more going on than looking only at VRM design, VRM temps and Vdroop measurements at associated voltages and clock speeds. I've achieved some pretty stellar performance at surprisingly low voltages and it's interesting to see those parameters change dynamically by adjusting cooling effectiveness. Asus appears to have done a very nice implementation of MCE (their version of it) and TVB.



Luck100
Level 7
Shamino wrote:
MCE
If you played with Cache OC, you see that it is very intolerant of any undershoots. Straightaway you would hardlock or BSOD. You can even test it at default. Since it shares the same rail as core, set core ratio to something really low like 40x. Set min and max cache ratio to 43x and set a manual voltage like 1.15v. Run a heavy load like prime 95 non AVX. Dynamically slowly reduce the voltage 5mv at a time. You will find the VMIN this way. Once you find the VMIN under continuous load, stop prime95. If it doesn’t hang, run it again, back and forth between running and stopping. Even try booting straight from bios with that VMIN. You will see that this VMIN requires a guardband for transient load changes, meaning you will need 5mv+++ more. You will observe bigger guardbands needed at higher cache. Obviously the better the transient response, the guardband requirement is smaller.


How do you adjust voltages or other BIOS settings without rebooting? It sounds like you have a tool to click and change it in Windows.

Luck100 wrote:
How do you adjust voltages or other BIOS settings without rebooting? It sounds like you have a tool to click and change it in Windows.


The tool is called Intel XTU, "Intel Extreme Tuning Utility" and can be downloaded from Intel's website.
---

And @Shamino, thanks for this very informative post! The AVX-offset issue is exactly what I have stumbled across and what I couldn't understand. Now it all makes sense and it got me curious, so I did some testing myself. Here are the results.

HW: Z390 Strix-F (SiC639 powerstages, like Hero/Formula/Code), BIOS 0805, i7-9700K
Settings: bclk 101.3, core x51, cache x47, 500KHz VRM Switching Freq, Adaptive Voltage +0.001 offset, Best-Case setting (anything else relevant?)
Measurements: Multimeter across VRM output bulk caps. (Hope that's ok??)
Load: Prime 26.6 Small FFTs, 8 threads, CPU Package 78°C - 84°C max, depending on voltages tested.
Voltages are read under full Prime load, lowest setting where Prime wouldn't complain/crash/bsod/whatever is shown.

LLC5
constant multiplier
BIOS 1.260V, Measured 1.266V, Power 158W (HWInfo)

AVX-1 (XTU), trying to trigger crash
BIOS 1.300V, Measured 1.306V, Power 167W (HWInfo)

--> Guardband 40mV

LLC4
constant multiplier
BIOS 1.300V, Measured 1.268V, Power 163W (HWInfo)

AVX-1 (XTU), trying to trigger crash
BIOS 1.315V, Measured 1.281V, Power 169W (HWInfo)

--> Guardband 13mV (!)
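For anyone repeating this test on another board, the guardband figures above are simply the difference between the two measured Vmin values. A tiny sketch, with a hypothetical function name and the multimeter readings from this post as input:

```python
def guardband_mv(vmin_constant_ratio_v, vmin_with_ratio_changes_v):
    """Extra voltage (in mv) needed once ratio changes are being triggered."""
    return round((vmin_with_ratio_changes_v - vmin_constant_ratio_v) * 1000.0)


if __name__ == "__main__":
    measured = {
        "LLC5": (1.266, 1.306),  # constant multiplier vs AVX-1, measured at the caps
        "LLC4": (1.268, 1.281),
    }
    for llc, (steady, changing) in measured.items():
        print(f"{llc}: guardband {guardband_mv(steady, changing)}mv")
```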


Conclusions:
- The difference between LLC4 and LLC5 is quite significant. Much bigger than a briefly tested LLC5 vs. LLC6.
- LLC5 still overvolts slightly under load, which is not reflected in HWInfo-readings at all. LLC4 and LLC5 both appear to undervolt, but, according to my measurements, this only holds true for LLC4. (LLC6 is actually shown correctly as overvolting.)
- VRM switching frequency does indeed seem to help, at least in my case, and I generally don't understand why Asus keeps sticking to a default of 300KHz on these simpler boards. I've also measured VRM temps (at backside of the board) and the difference is really within margin of error for my measuring equipment, maybe 1-2°C. I see no reason not to just set 500 and forget about it. By the way, these SiC639 powerstages are actually high-speed ones (up to 1.5 MHz) and the datasheet doesn't even mention values as low as 300KHz, starting at 500KHz for their charts...
- I found the best way to trigger a guardband-violation is setting AVX offset 1, then, during Prime load, just repeatedly open Win 10 Startmenu and Settings dialog, just wildly clicking through the different subpages of Settings, especially the Apps page. It's a sure and quick crash/BSOD for the Prime load if voltage is too low. For me, switching multipliers or decreasing boost limits didn't trigger crashes at all.
- Most (all?) of these so-called overclocking tutorials/videos fail to mention this topic at all. They just blindly set an AVX offset, because, you know, everyone knows AVX isn't stable at the same freq/voltages, lol.
- Finally, for overclocks needing AVX-offset I'm going to use LLC4 from now on, without offset probably still LLC5.

Open Questions:
- How significantly does the guardband depend on the absolute voltage? I mean, would a test at x45 bclk 100 with much lower voltages yield a different result?
- It's a bit strange that my LLC4 AVX-1 setting measures less voltage, yet is using more power according to HWInfo ?
- I've tried testing 2 instances of Prime, 26.6 and 29.6 AVX in parallel, but as soon as AVX threads are present, the offset is suddenly applied to all cores. Why? Is that intended behaviour?? It's strange, because with the Startmenu/Settings test this doesn't happen. You can clearly spot single cores running at -1 in HWInfo, but the other cores keep running the higher multiplier for the Prime load just fine... (?)
- Another way of dealing with this whole affair might be to just forget about AVX-offset and employ appropriate max power limits. You know, to keep AVX loads in check while still maintaining max boost clocks for normal load. Something worth investigating further.
- I'm now very curious how other boards would perform in a test like this. Most interested in the Z390 Gene with its IR3555 powerstages, but also the Gigabyte boards with their doubler-equipped "bigger" VRM. Anyone got some data?
- Back to the Gene, IR3555. Is the body-braking feature of these powerstages actually implemented? I'd expect this to help in scenarios like these, no?

Thanks for reading, discussion welcome! 🙂

vvoid wrote:
The tool is called Intel XTU, "Intel Extreme Tuning Utility" and can be downloaded from Intel's website.

I have the Asus version of Intel XTU that comes with the Maximus XI Hero. It doesn't have the ability to change voltages. It can only change multipliers, power limits, and a few other things. But no voltages. I wonder did Asus gimp this version of XTU, or does my motherboard not support voltage control from windows?

vvoid wrote:
I've tried testing 2 instances of Prime, 26.6 and 29.6 AVX in parallel, but as soon as AVX threads are present, the offset is suddenly applied to all cores. Why?

Yes, this is how all the dynamic clocks work. At any given instant in time, all the cores must run at the same multiplier. It can sometimes look otherwise in HWinfo, but that's because it's not actually measuring all the core multipliers at the same instant of time.

vvoid wrote:
Another way of dealing with this whole affair might be to just forget about AVX-offset and employ appropriate max power limits. You know, to keep AVX loads in check while still maintaining max boost clocks for normal load. Something worth investigating further.

This is exactly what I've chosen to do. Keep AVX offset at 0 and set PL1/PL2 to ensure I don't exceed 80C under any sustained or transient load. That means I'll downclock only under extreme loads like P95 with AVX, while games that use AVX don't pull nearly enough power to hit the power limits. This works nicely in combination with offset or adaptive voltage, as you'll also get a VID and Vcore reduction when downclocking (which helps bring you back under the power limit).