
Storming the Power Gates with Z890 Apex and 285K: Prepare to Overclock like never before.

Falkentyne

Storming The Gates: Prepare to overclock like never before.

Thank you to Shamino and the Asus ROG team for allowing me to test this new hardware and write this guide for the OC community, as the rules have now truly changed (again!).

Hardware used: Asus Z890 White Apex and a 285K QS.

Arrow Lake is Intel's first desktop tile-based chip (basically an evolution of Meteor Lake, though it has more in common with Lunar Lake). It consists of three "tiles" (or "dielets"): compute (primary cores), graphics and SOC.

DLVR is now fully implemented, so overclocking works differently from what tweakers are used to. The P-cores are arranged among the E-cores in segments: the first two and the last two P-cores sit at the edges, bordered by E-cores, and four more P-cores sit in the middle.

MCE works similarly to the later BIOSes for Raptor Lake. OCers will probably want to enable the Asus Advanced OC Profile settings from the get-go.

There is currently a voltage limit on each DLVR rail, roughly 200mV above the processor's stock VID at max turbo. You can override this limit, but the override clips the processor to 400 MHz unless the temperature is 10C or *LOWER*, so it is designed for subzero and XOC.

Note that this override does not apply in "Power Gate" mode, but power gate mode is being removed starting with the first microcode after 110, so you had better get used to not having it.

Typical chips should be able to reach x54P / x48E / x40R in DLVR mode easily, without needing to do any other work.

In power gate mode, you should be able to reach x55 / x50 / x41-x42 on most chips except the very worst. 

Good chips can reach x57 / x52 / x42 in gate mode. Refer to Sugi's overclock, as he has an excellent chip sample. x58 is extremely difficult even if you delid.

The VCCIA rail (IA = Intel Architecture) is the SVID rail for the main cores and goes up to 1.72v (1.77v). VCCSA boot voltage shares this limit. Obviously this limit is designed for extreme overclocking.

The "VRM Voltage" (known as "Actual VRM Vcore Voltage" on Raptor Lake) is the source voltage from which the DLVR derives its rails. It is based on the highest VID requested at the current ratio and is on average about 250mV higher than the DLVR rail values.

Normally, you can expect 1.35v for light loads and 1.15v for heavy AVX loads on the P-cores at higher ratios (up to x54), and 1.35v for light and 1.20v for heavy E-core loads.

If you try to go beyond these voltages, you will very quickly either hit thermal throttling or crash. In regular DLVR mode, you will have a GREAT deal of difficulty exceeding x54 on the P-cores, because the DLVR simply runs too hot at the voltage the P-cores request from the external VRM (which GREATLY heats up the DLVR). This can be bypassed, as discussed below. The E-cores can easily reach x49 without much work.

You can now control each P-core individually, but E-cores can only be controlled in clusters of 4 (one cluster = one voltage rail, unlike the P-cores, which can be controlled separately).

Clock ratio steps are now about 0.167x (16.67 MHz at 100 MHz BCLK) rather than full 100 MHz steps, so you can fine-tune frequency without resorting to BCLK. If you enable ratio extension in Tweaker's Paradise, the step changes to 33.3 MHz. In the "By core usage" menu, the "Active" core count ratios still use the traditional full steps, but in the specific core menu you can enter the ratio as a decimal value.
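To make the step sizes concrete, here is a small Python sketch that lists the core frequencies reachable between two whole ratios. It assumes the fine step is exactly 1/6 of a ratio (16.67 MHz at 100 MHz BCLK) and 1/3 of a ratio with ratio extension enabled (33.3 MHz); the function and constant names are illustrative only, not BIOS settings.

```python
from fractions import Fraction

# Assumed step sizes, expressed as fractions of one whole ratio.
FINE_STEP = Fraction(1, 6)       # 16.67 MHz at 100 MHz BCLK
EXTENDED_STEP = Fraction(1, 3)   # 33.3 MHz with ratio extension enabled


def core_frequencies(bclk_mhz: int, lo_ratio: int, hi_ratio: int,
                     step: Fraction) -> list[Fraction]:
    """Every core frequency (MHz) from lo_ratio to hi_ratio, inclusive."""
    freqs = []
    ratio = Fraction(lo_ratio)
    while ratio <= hi_ratio:
        freqs.append(ratio * bclk_mhz)
        ratio += step
    return freqs


if __name__ == "__main__":
    for f in core_frequencies(100, 54, 55, FINE_STEP):
        print(f"{float(f):.2f} MHz")
```

Between x54 and x55 this gives seven settings (5400.00, 5416.67, ... 5500.00 MHz); with `EXTENDED_STEP` it gives four. Exact fractions are used so the top ratio is hit exactly instead of drifting through floating-point rounding.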

CCF AutoGV is for ring power saving. IA, SA and GT CEP still exist and will throttle the processor when current is abnormally high. These are, of course, disabled if you use the Asus Advanced OC profile.

Please disable fast Vmode when disabling CEP.  Fast Vmode did not function (as far as I know) on Raptor Lake.

You can extend the lower working temperature limit of the processor with the CPU Band Gap Reference setting. This normally uses an internally regulated fixed voltage for the BGref IP blocks, but in bypass mode it uses SA voltage instead.

FLL mode is unchanged.

Core minimum ratio limits how low the cores can go.  This may seem redundant, but when PLL voltages and BGref are overvolted, the “working frequencies” will shift upwards, which can stop low ratios from working at all.  You can also use this as a normie to set the low power frequencies of the processor while idle.

Ring voltage is now separate from P-core voltage, but the ring is usually limited to around x42 (similar to what we saw with Rocket Lake and Alder Lake). Pushing the ring may help a bit with the already very high latency.

The max ring limit has to do with the new segmentation of the P-core locations. Because there is a group of P-cores in the middle of the E-cores, the ring tends to max out around 4200 MHz. On SKUs where this middle group is not present (for example, the 245K), you can get up to 4.4 GHz. So enabling the middle group of P-cores costs about 200 MHz of ring clock.

VCCSA (System Agent) works differently than before. NGU Voltage and Memory Subsystem (MemSS) voltage now control the SVID for the System Agent. They share the same rail, so the higher of the two voltages is used. VCCSA powers many sections of the uncore, and you will need more of it to push D2D, NGU and memory frequencies (1.35v+).

VNNAON (always-on voltage) is a low-power rail set to 0.77v by default. It is muxed internally with the core and ring DLVRs to help save power. Increasing this rail helps greatly with D2D (Die-to-Die) and extreme OC at cold temperatures.

Memory Controller Voltage tracks VDD (DRAM) because they both need to communicate over the bus (remember this is not raptor anymore).  This voltage is also the source for three other rails.

VDDQ voltage is an internal rail running in "bypass mode", so it is linked to Memory Controller voltage, which is its source voltage (similar to how the CPU's three main rails are linked to "Actual VRM Vcore Voltage" in bypass mode, bypassing the DLVR).

VCCIO Gated and VCC Clock are 0.85v and should be left alone, as they are related to the memory controller voltage also.

Memory MC voltage calculation "base" and VDD voltage calculation "base" are numbers used for internal calculations. They represent the memory controller and DRAM VDD voltages, but they do not need to match those voltages exactly.

VCCIO defaults to 1.25v; it's nothing we need to care about (do NOT treat it like Skylake, you will fail).

The SOC 1.8v and CPU 1.8v rails are useful for XOC on cold.

 

As before, this is Rocket Lake 2.0 (a tile-based system), or you can call it Starship Lake. The NGU ratio sets the frequency of the interconnect to the memory controller and can go as high as 26x on regular settings. You can push it to x35 or higher with 1.25v-1.35v VCCSA.

The D2D ratio is the bus frequency that chains the tiles together, from the compute tile to the SOC tile. The SOC tile houses the GT and memory controller. The D2D ratio can NOT be lower than the minimum ring ratio, which is 15x.

You can get up to 35x on ambient cooling, but if you want more, you need to increase VNNAON and VCCSA. With 1.0v VNNAON and 1.35v VCCSA, you may be able to get x40 on D2D and x35 on NGU.

Increasing these ratios can help substantially with getting that latency down, and can nudge overall bandwidth a bit.  The ring ratio is also very relevant here.

(continued)


Falkentyne

 

All of these are based on the “SOC base clock”.

There are two modes: Sync and Async. In Sync mode, the CPU and SOC BCLK come from one clock source, so overclocking the master BCLK overclocks all of the blocks at the same time. In Async mode, the CPU BCLK only raises the clocks in the compute tile (cores and ring), while memory, NPU, GT, etc. stay at 100 MHz. Likewise, raising the SOC BCLK raises the memory, NPU and GT BCLKs without affecting the cores and ring.
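A minimal Python model of the two modes, assuming Sync simply feeds one BCLK to both the compute and SOC domains and Async keeps them independent (SOC BCLK defaulting to 100 MHz). The class and function names are hypothetical, and the x55 ratio is only an example.

```python
from dataclasses import dataclass


@dataclass
class DomainClocks:
    compute_mhz: float  # cores and ring (CPU BCLK)
    soc_mhz: float      # memory, NPU, GT (SOC BCLK)


def domain_clocks(mode: str, cpu_bclk: float, soc_bclk: float = 100.0) -> DomainClocks:
    """Base clock each domain sees in Sync vs Async mode (simplified model).

    In Sync mode there is a single clock source, so soc_bclk is ignored.
    """
    if mode == "sync":
        return DomainClocks(compute_mhz=cpu_bclk, soc_mhz=cpu_bclk)
    if mode == "async":
        return DomainClocks(compute_mhz=cpu_bclk, soc_mhz=soc_bclk)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("sync", "async"):
        clocks = domain_clocks(mode, cpu_bclk=102.0)
        core_mhz = 55 * clocks.compute_mhz  # x55 P-core ratio
        print(f"{mode}: cores {core_mhz:.0f} MHz, SOC BCLK {clocks.soc_mhz:.1f} MHz")
```

With a 102 MHz CPU BCLK, both modes put x55 at 5610 MHz, but only Sync also drags the SOC BCLK (memory, NPU, GT) up to 102 MHz.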

PCI BCLK affects PCIE and DMI.  Most people will probably not want to mess around with these.

Memory topology is different on Arrow Lake. There are two memory controllers: controller #0 is connected to the top half of the DRAM, while controller #1 is connected to the bottom half. The first channel (A) is connected to the two slots farthest from the processor, while the second channel (B) is connected to the two slots closest to the processor.

DIMM Flex works the same as on Z790; see the Z790 world record pac-man overclocking guide for that.

There is a new feature called "DIMM Fit". It attempts to fine-tune signaling automatically for your memory (not frequency or timings; the best way to think of it is skews), based on your currently in-use OC profile, so you must tune first! It takes a few hours, so it's best run overnight, at (or close to) your maximum stable memory overclock settings. Once it's done, you can save the "Fit" profile (known as the "Fit Store") and use it as a base for your own overclock. There are three stores available, and they *WILL* persist through a BIOS update!

For example, if you were running a 2x16 7600 MHz memory OC, after running Fit you might be able to reach 7733 MHz using the Fit Store. You can then "re-fit" that setting, and it "may" allow you to tune 7800.

These parameters affect overclocking margins only. They do not tune anything performance related, only low-level signaling.

DRAM ratios are in 33.3 MHz steps (at 100 MHz SOC BCLK), proportional to SOC BCLK.
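A short Python sketch that lists the selectable data rates in a range, assuming one DRAM ratio step is exactly SOC BCLK / 3 (33.33 MHz at 100 MHz). That assumption is consistent with the 7600 → 7733 → 7800 example above, but the function names are hypothetical.

```python
import math
from fractions import Fraction


def dram_step(soc_bclk_mhz: int) -> Fraction:
    # Assumption: one DRAM ratio step is SOC BCLK / 3 (33.33 MHz at 100 MHz).
    return Fraction(soc_bclk_mhz, 3)


def dram_rates(lo_mts: int, hi_mts: int, soc_bclk_mhz: int = 100) -> list[Fraction]:
    """All DRAM data rates (MT/s) selectable between lo_mts and hi_mts, inclusive."""
    step = dram_step(soc_bclk_mhz)
    first = math.ceil(Fraction(lo_mts) / step)
    last = math.floor(Fraction(hi_mts) / step)
    return [n * step for n in range(first, last + 1)]


if __name__ == "__main__":
    print([round(float(r), 1) for r in dram_rates(7600, 7800)])
```

This prints `[7600.0, 7633.3, 7666.7, 7700.0, 7733.3, 7766.7, 7800.0]`, so 7733 is four steps above 7600 and 7800 is two more.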

End users can expect existing Raptor Lake kits to overclock about 200 mhz better on ARL, but the latency will be significantly worse, due to the tile based arch.

CK RAM tuning is available (Sugi will know more about this, as I don't have this memory). CK DIMMs are DIMMs with an onboard clock buffer. You can tune this in DRAM Timing Control - CKD Configuration.

Once you exceed the PLL buffer capability, you will need to switch the PLL mode in order to boot.

 

Current builds of CPU-Z show the correct memory ratios; if yours shows only half, update your software. Likewise, if your HWiNFO64 doesn't show the DLVR voltage rails, update it.

A few people, like Sugi0lover, have gotten 9000 1:2 stable. No one tunes memory better than Sugi0lover. I think der8auer may have reached 9200, but good luck stabilizing it.

1:2 9000 is about the most you can expect to reach on regular dimms without putting in some extra work. 

There are now three individual voltage rails to control, all handled by the DLVR: P-core, E-core and Ring. In DLVR mode, these rails have no vdroop; however, they still must be fed externally through the VRMs, and the external regulator does, of course, have vdroop.

Power gate mode is more like the traditional way of overclocking: all the rails are fed directly from one source, the VRM. In PG mode you can usually expect about +100 MHz on the P-core clocks compared to DLVR mode, but testing has shown that this is actually because of the excessive heat generated in the DLVR, which hurts stability. Intel hates traditional PG mode, and it is going hasta la vista unless your core temps are 10C or lower (e.g. subzero OC). 10C is also the threshold below which the rails are no longer limited to a 200mV delta above their base voltage.

Power Gate mode sets all the rails to the same voltage, similar to what you saw with Rocket Lake, and they all take the VRM's vdroop. The DLVR is bypassed, so it won't generate heat; however, exceeding x55 / x50 will be quite difficult without a custom loop or a delid. Please note that in power gate mode (while it's still available), the VIDs are completely inaccurate, so ignore any hearsay about "abnormally high power usage": CPU Package Power is calculated as VID * SVID IOUT (amps). So please ignore those silly videocardz rumors about 375W processors.
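To show why an inaccurate VID distorts the reported number, here is a one-function Python sketch of package power as VID times SVID IOUT. The current and voltages below are hypothetical, picked only to show the effect, not measurements from a 285K.

```python
def reported_package_power(vid_v: float, iout_a: float) -> float:
    """CPU Package Power as described above: VID times SVID IOUT (amps)."""
    return vid_v * iout_a


if __name__ == "__main__":
    iout_a = 200.0        # hypothetical SVID IOUT reading
    vid_reported = 1.80   # hypothetical, inaccurate VID in power gate mode
    v_actual = 1.30       # hypothetical voltage actually delivered
    print(f"reported: {reported_package_power(vid_reported, iout_a):.0f} W")
    print(f"actual:   {reported_package_power(v_actual, iout_a):.0f} W")
```

With the same current, a VID that reads 0.5v too high turns a 260 W load into a reported 360 W, which is the kind of inflated figure the power rumors were built on.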

Update to the newest HWiNFO64 betas to see the DLVR rails. Note that only one master rail is shown, because showing them all would need a huge number of analog-to-digital converters. This voltage is not fully accurate, as it uses the ground plane from the vcore input, so think of it as a "socket sense" voltage. This is why using LLC8 will show a large vrise on the DLVR.

To stabilize DLVR mode without losing frequency compared to PG mode (which is going bye-bye on ambient and will be removed after microcode 110, if it hasn't been already), we can use a trick to starve the DLVR, which reduces heat.

At higher core ratios, the DLVR requests a lot of voltage from the VRM. It is precisely the heat this generates in the DLVR that limits overclocks (and is why power gate mode tends to give about 100 MHz more on the P-cores).

What you can do is “starve” the DLVR.  This works because the DLVR rails do not have vdroop on them.

However the DLVR rails are still sourced from the primary voltage rail (VRM Vcore source).  And this rail does have the standard droop everyone is accustomed to.

At higher ratios, even with something like 1.2v set on the P-core rail, Pcode may request 1.6v (or higher) from the VRM, which creates an astonishing amount of heat. So what we do is put a limit on this request.

We can do this by either using our old friend, Voltage Suspension—which now works differently than it did previously, or by using direct VRM Voltage and setting a maximum source voltage.

Obviously, a source voltage of 1.45v is going to generate much less heat than a source of 1.6v (remember that this voltage is then fed into the DLVR, but is affected by vdroop).
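The DLVR is a linear regulator, so to first order its heat is the voltage it drops times the current it passes. A minimal Python sketch of that relationship follows; the 1.20v rail and 150 A load are hypothetical, and the model ignores VRM vdroop and any other losses.

```python
def dlvr_dissipation_w(v_in: float, v_out: float, current_a: float) -> float:
    """Heat in a linear regulator: (input - output voltage) * load current."""
    if v_in < v_out:
        raise ValueError("input must be at least the output voltage")
    return (v_in - v_out) * current_a


if __name__ == "__main__":
    rail_v, load_a = 1.20, 150.0  # hypothetical P-core rail voltage and current
    for source_v in (1.60, 1.45):
        watts = dlvr_dissipation_w(source_v, rail_v, load_a)
        print(f"source {source_v:.2f}v: {watts:.1f} W in the DLVR")
```

Under these assumptions, dropping the source from 1.60v to 1.45v cuts DLVR heat from about 60 W to 37.5 W for the same rail voltage and current.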

So one way to do this is by setting something like 1.45v actual VRM voltage, and LLC6.

This will prevent the target CPU VID from exceeding 1.45v, and thus prevent the DLVR from requesting more than 1.45v.

The one thing you have to watch out for is vdroop.  Normally, the P/E/Ring rails in DLVR mode do not get affected by vdroop, however they *will* be affected if the source voltage is too low and the vdroop on the VRM side causes the output voltage to drop below the P/E/R rail requests.  It is up to the end user to find what settings work best and what loadlines are acceptable.  Using LLC8 is not recommended.

The second method is voltage suspension. On Raptor Lake, voltage suspension could cause transient response instabilities under AVX workloads (similar to using too aggressive a loadline calibration), but on Z890 it works completely differently. You can set a static maximum ceiling to limit what the DLVR can get from the VRM, starving the DLVR of voltage. It may request 1.64v and end up getting only 1.45v, which greatly reduces heat from the DLVR. You still have to be aware of VRM vdroop, so set an appropriate amount of LLC. LLC6 is a good target.
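Both starving methods come down to two checks: cap the request, then make sure the drooped voltage still clears what the DLVR rails need. Here is a Python sketch of that reasoning. The 0.5 mOhm effective loadline and the 50 mV regulator headroom are hypothetical placeholders, not the real LLC6 loadline or any Intel dropout spec, and the function names are made up for illustration.

```python
def dlvr_input_v(requested_v: float, ceiling_v: float,
                 load_a: float, loadline_mohm: float) -> float:
    """Voltage at the DLVR input: request capped by the ceiling, minus VRM vdroop."""
    capped_v = min(requested_v, ceiling_v)           # voltage suspension ceiling
    return capped_v - load_a * loadline_mohm / 1000  # droop V = I * R, R in milliohms


def rail_holds(input_v: float, rail_request_v: float, headroom_v: float) -> bool:
    """True if the drooped input still sits above the rail request plus headroom."""
    return input_v >= rail_request_v + headroom_v


if __name__ == "__main__":
    # Hypothetical numbers: 150 A load, 0.5 mOhm effective loadline, 50 mV headroom.
    v_in = dlvr_input_v(requested_v=1.64, ceiling_v=1.45, load_a=150, loadline_mohm=0.5)
    print(f"DLVR input under load: {v_in:.3f}v")
    for rail_v in (1.20, 1.35):
        print(f"rail {rail_v:.2f}v regulated: {rail_holds(v_in, rail_v, 0.05)}")
```

Here a 1.64v request capped at 1.45v droops to about 1.375v, which still feeds a 1.20v rail but not a 1.35v one. That second case is where the rails start inheriting VRM vdroop, so more LLC (or a higher ceiling) would be needed.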

By starving the DLVR, you should be able to get clocks in DLVR mode similar to what you would have gotten in power gate mode, possibly even at lower temps. Stability is not guaranteed, and you will have to do some boot-to-boot experimenting, especially if you are right at the edge.

x55 / x49 / x40 should be easy to achieve with most chips. Exceeding this will take cooling, work and good silicon.

Without starving the DLVR, the most you can normally reach is 100 mhz lower—the heat generated from the Pcode requesting from the VRM will cause instability.

There is a new PLL trim setting called PLL "ref" offset, which operates on top of the main PLL trim and can be applied to the core, E-core (Atom L2), ring, memory controller, SOC system agent (this is new) and CPU system agent. Both the trim and the offsets are limited by pcode to hex 0F (+15 decimal). The same was true for PLL trim on Raptor Lake: a core PLL higher than 1.125v will not take effect at all, and the last known previous PLL setting will be used. On ARL, instead of entering a voltage, you enter the offset, from 0 to 15.
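A tiny Python model of the pcode limit described above. It assumes the Raptor Lake behavior (an out-of-range value is ignored and the last accepted value stays in effect) also applies to the ARL trim and ref offset. `effective_pll_value` is a hypothetical name, not a BIOS or pcode interface.

```python
PCODE_MAX = 0x0F  # pcode limit for both PLL trim and PLL ref offset (+15 decimal)


def effective_pll_value(requested: int, previous: int) -> int:
    """Value pcode would actually use for a PLL trim or offset (simplified model).

    Out-of-range requests do not take effect; the last accepted value stays.
    """
    if 0 <= requested <= PCODE_MAX:
        return requested
    return previous


if __name__ == "__main__":
    current = 0x00
    for request in (0x05, 0x0F, 0x12):  # 0x12 exceeds the pcode limit
        current = effective_pll_value(request, current)
        print(f"requested 0x{request:02X} -> in effect 0x{current:02X}")
```

The last request (0x12) silently leaves 0x0F in effect, which is why pushing past the limit can look like "nothing changed" rather than an error.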

OCTVB has changed from Raptor Lake. You can now choose "Per CCP module" or "Per P-core group". Per P-core group is the legacy method used on Raptor Lake, which may interest those trying to use power gate (DLVR bypass) mode until that mode goes away. In Per P-core group mode, only the P-cores do OCTVB, downbinning based on the number of active P-cores.

Per CCP means per P-core/E-core cluster module. You can configure and customize each P-core and each E-core cluster (remember the E-cores are in cluster groups) for temperatures "A" and "B".

V/F point offsets now come in per P-core and E-core mode so you can configure each core separately. 

Note that as soon as you set a manual overclocking ratio, the V/F curve gains one additional point, so if you want to adjust a point after "P0", you need to reboot first.

So to summarize some basics:

Since Arrow Lake has implemented DLVR:

Each P-core has both its own PLL and its own VR.

Each E-core "cluster" has both its own PLL and its own VR.

The ring has its own PLL and its own VR.

The input voltage comes from the main IA external VRM vcore rail.

The E-cores do not have a separate rail for the L2 cache.  Instead this voltage is determined based on the E-core cluster’s VR or from VNNAON, which depends on the L2 SRAM’s VF point.  This is done dynamically.

The Ring’s L3 also derives its power from either its own VR, or from VNNAON.

An example of how this works, from Shamino (Asus R&D):

“Maximum vmin for all currently active P Cores is 1.20v. Firmware determines from current workload that the E Cores do not require maximum frequency. The E Core frequencies is then determined based on each’s vmin capped at 1.20v. Like-wise for Ring. Frequencies is then updated to reflect the determined set points and the appropriate VID is requested from the external VR. If frequencies are overridden then this will not be the case.

Therefore the external IA Core VID requested is roughly equal to max of each DLVR’s VID + (max of summation of each LVR’s Loadline*each current as a function of each ICCMAX (Dynamic and Leakage) + CEP Adjustments + any temperature compensation) * MB VR’s ACLL as a function of the external ICCMAX, capped to the external VR’s VID Limit + any additional Guardbands.“
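Shamino's description mixes several terms, so here is one simplified, dimensionally consistent reading of it in Python. This is an interpretation, not Intel's pcode. Each DLVR needs its VID plus its own loadline drop, the external request covers the worst of those, then adds CEP, temperature compensation, the motherboard VR's AC loadline drop for the total current, and a guardband, and the result is capped at the VR's VID limit. Placing the guardband before the cap is also an assumption, since the quote is ambiguous on that point. All loadlines, currents and offsets below are hypothetical; 1.72v is the VCCIA limit mentioned earlier.

```python
from dataclasses import dataclass


@dataclass
class DlvrRail:
    vid_v: float         # voltage this DLVR rail requests
    loadline_ohm: float  # hypothetical internal loadline of this DLVR
    current_a: float     # present rail current


def external_ia_vid(rails: list[DlvrRail], cep_v: float, temp_comp_v: float,
                    acll_ohm: float, guardband_v: float, vid_limit_v: float) -> float:
    """Simplified reading of the external VCCIA VID request (not Intel's pcode)."""
    # The source must cover the most demanding DLVR, including its loadline drop.
    worst_rail_v = max(r.vid_v + r.loadline_ohm * r.current_a for r in rails)
    # Motherboard VR AC loadline drop for the total current drawn from VCCIA.
    total_a = sum(r.current_a for r in rails)
    estimate_v = worst_rail_v + cep_v + temp_comp_v + acll_ohm * total_a + guardband_v
    return min(estimate_v, vid_limit_v)  # capped to the external VR's VID limit


if __name__ == "__main__":
    rails = [DlvrRail(1.20, 0.0002, 100.0),   # P-cores (hypothetical)
             DlvrRail(1.15, 0.0002, 40.0),    # E-core cluster (hypothetical)
             DlvrRail(1.05, 0.0002, 20.0)]    # ring (hypothetical)
    vid = external_ia_vid(rails, cep_v=0.0, temp_comp_v=0.01,
                          acll_ohm=0.0004, guardband_v=0.02, vid_limit_v=1.72)
    print(f"estimated external IA VID: {vid:.3f}v")
```

With these made-up numbers, the P-core rail dominates (1.22v after its loadline), and the external request lands around 1.314v. This shows why the VRM voltage sits well above the DLVR rail values.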

Rails that affect RAM OC:

VDD2 (MC Voltage), this powers the digital portion of the DDRIO Phy.

ALVR (analog linear VR VDDQ, sourced by VDD2)

VCCSA Voltage, this powers the analog portion of DDRIO Phy.

Internal LVR VCCIO Gated Voltage

Internal LVR VCC CLK Voltage.

DTS is powered by the 1.8v CPU rail and overvoltage of this rail will skew DTS temp readings.

The digital portion of DTS is powered by VNNAON.

For the SOC IP (SOC Tile), the main power rail is VCCSA.  The source of power for the SOC’s SRAM is going to either be VNNAON or VCCSA, depending on the vmin of the current V/F point of the SOC SRAM (this was discussed with Ecore L2 above).

E-cores, if present on the SOC, would be powered by VCCSA.  Currently this is not implemented on desktop.

VCCIO: Static 1.25v. 

VNNAON: Static 0.77v (powers low power IP logic)

VCCIA: Master Vcore source rail from VRM, used dynamically as SVID Input Voltage.

VCCSA: Dynamic SVID

VCCGT: Dynamic SVID

VDD2: Memory Controller Voltage (static: 1.10v).

1.8v: Static CPU, SOC, DDR, Quiet, PCH.

VCCIA is the SOURCE power rail for:

  1.  Pcore DLVR’s (compute die)
  2. Ring/LLC DLVR’s (compute die)
  3. LLC (DLVR dynamically muxed with VNNAON, compute die)
  4. E core / Ecore L2 DLVR’s (compute die)
  5. Ecore L2 DLVR dynamically muxed with VNNAON (compute die)
  6. CPU Die CEP throttler (compute die)

VNNAON is the SOURCE power rail for:

  1.  Various low frequency logic (compute+Soc die)
  2. Die management unit (compute die)
  3. Fuses (compute+soc die)
  4. D2D (die to die interconnect), compute+soc die
  5. Various PLL’s during low power states (compute die)
  6. DTS digital  (compute die)
  7. Bandgap digital reference (compute die)
  8. Video Processing Unit SRAM during low power states (SOC die)
  9. Media SRAM during low power states (SOC Die)
  10. IPU (image processor) SRAM during low power states (SOC die)
  11. Display engine SRAM during low power states (SOC Die)
  12. E-core L2 of SOC during low power states (SOC die)
  13. PCIE Gen 4 USB and IO (SOC die)
  14. High speed IP’s like PCIE Gen 5, DMA (SOC die)

1.8v CPU is the source power rail for:

  1.   various PLL’s during high power states (compute)
  2.   DTS analog (compute)
  3.   Fuses (compute)
  4.   Bandgap analog reference (compute).

VCC GT is the source power rail for:

  1.  GT CEP engine (Current excursion protection throttle): GT Die
  2. Graphics Engine (GT die).

VCCIO is the source power rail for:

  1.  GT PLL’s (GT die)
  2. GT DTS (GT die)
  3. Display PHY IO  (GT die)
  4. PCIE PHY IO (SOC Die)
  5. USB3 PHY IO (SOC die)
  6. PCIE (SOC Die)

VCCSA is the source power rail for:

  1.  SA CEP engine (SOC die)
  2. Memory (SOC die)
  3. NPU (Neural Processing Unit), SOC die
  4. VPU (Video Processing Unit), SOC die
  5. Media Engine (SOC die)
  6. IPU (Image Processing Unit), SOC die
  7. Display (SOC die)
  8. E-cores if present on SOC (not on desktop consumer parts; they may be on client parts such as laptops).
  9. Various PLL’s (SOC die)

VDD2 is the source power rail for:

  1.  DDR PHY IO (SOC die)
  2. DDR PHY TX VDDQ via ALVR (Analog LVR): SOC Die

1.8v DDR is the source power rail for: DDR PHY IO “High” voltage (SOC die).

1.8v Quiet is the source power rail for:

  1.  IPU PHY IO (SOC die)
  2. Analog PLL’s (SOC Die)
  3. SOC DTS analog (SOC die)
  4. CNVI PHY IO (SOC die).

1.8v SOC is the source power rail for:

  1.  USB2 PHY IO (SOC die)
  2. GPIO’s (SOC)

Shamino’s explanation of SA VID:

“Similar to IA Core, the external SA VID requested is roughly equal to max of (SOC E Core VID, SOC E Core VID, IPU (Image processing unit) VID, VPU (Visual processing unit) VID, Media VID, Memory Controller for (DDR and Neural Processor) VID, NPU VID, FCLK VID, Display VID, and rest of System Agents’ VID) + summation (each’s current as a function of each’s ICCMAX) + CEP Adjustments + max of each LVR’s Loadline*each current as a function of each ICCMAX (Dynamic and Leakage) + any temperature compensation * MBVR’s ACLL as a function of the external ICCMAX, capped to the external VR’s VID Limit + any additional Guardbands.”

Due to the muxing between VNNAON and IA/SA, SVID behavior is affected. When ramping down to the target, the transition has to be split: a ramp down to the VNNAON (0.77v) level, then the internal mux switch, then a ramp from there to the target. If the current VID is *below* the VNNAON level, it has to ramp up instead. So the processor must know the actual VNNAON supply voltage. The BIOS takes care of this when VNNAON is set in the BIOS, but *NOT* when you make runtime adjustments.
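A Python sketch of the split transition described above. It assumes every transition that switches the supply through the mux passes through the VNNAON level, and that the VNNAON value the processor uses is just a number it was told. If you change VNNAON at runtime without the BIOS updating that number, the mux point in this plan would be wrong. `plan_mux_transition` is a hypothetical helper, not an SVID command sequence.

```python
VNNAON_DEFAULT_V = 0.77


def plan_mux_transition(current_v: float, target_v: float,
                        vnnaon_v: float = VNNAON_DEFAULT_V) -> list[tuple]:
    """Split a VID change that goes through the VNNAON mux into steps (simplified model).

    vnnaon_v must match the real VNNAON supply; a stale value puts the
    mux switch at the wrong voltage.
    """
    steps: list[tuple] = []
    if current_v != vnnaon_v:
        steps.append(("ramp", current_v, vnnaon_v))  # down if above VNNAON, up if below
    steps.append(("mux",))
    if target_v != vnnaon_v:
        steps.append(("ramp", vnnaon_v, target_v))
    return steps


if __name__ == "__main__":
    for current, target in ((1.10, 0.70), (0.70, 0.65)):
        print(f"{current:.2f}v -> {target:.2f}v")
        for step in plan_mux_transition(current, target):
            if step[0] == "mux":
                print("  switch internal mux")
            else:
                _, start, end = step
                direction = "up" if end > start else "down"
                print(f"  ramp {direction} {start:.2f}v -> {end:.2f}v")
```

Going from 1.10v to 0.70v ramps down to 0.77v, switches the mux, then continues down. Starting at 0.70v, the first leg is a ramp *up* to 0.77v.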

D2D cannot be set below the minimum ring ratio, which is x15.

The maximum DLVR voltage that can be set is 1.65v. There is a "process VMAX" setting in the BIOS that overrides the 200mV limit over base VID, but if the CPU is above 10C, it will be throttled.

IA Min Ratio is designed for LN2 and will raise the minimum multiplier range, since low frequency does not work well with high voltage on this platform.

For BCLK: there are three sets: CPU, SOC and PEG/DMI BCLK.

------------------------------

Arrow Lake has about a 5% IPC improvement on the P-cores and about 25-30% on the E-cores. This results in very nice CPU-Z and Cinebench scores, but the new TSMC tile system carries a massive memory latency penalty. It is even larger than Rocket Lake's, enough to cause performance regressions worse than what we saw with Rocket Lake vs. Comet Lake. Having Hyper-Threading would have prevented some of this, but the heat HT would have added on this platform would have kept even the already pedestrian clock speeds from being stable. Expect about a 20ns latency penalty with the same RAM on ARL compared to RPL.

You also have to watch the values of some memory timings. The new values are similar to what Sapphire Rapids uses, and values migrated from Raptor Lake can cause a more than 50% reduction in write and copy bandwidth. In addition, tWR must be 48 or above, and it must be set through "tWRPRE", not the old tWR (Raptor Lake is also like this). You can try tWRPRE 48 and tRTP 18; tWRPRE 48 is the JEDEC minimum.

1T is actually possible now with the right setup.  But good luck stabilizing it.

As far as I know, tRDRD_dg and tWRWR_dg must be 8; otherwise you will lose massive copy/write bandwidth.

It's important to set tWRWR_sg to 32 and tCCD_L_WR to 32. They may be linked to each other.
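To avoid carrying Raptor Lake values over by accident, the rules above can be written down as a quick checklist. Here is a Python sketch that encodes only the values stated in this post (tWRPRE ≥ 48; tRDRD_dg and tWRWR_dg = 8; tWRWR_sg and tCCD_L_WR = 32). The dictionary keys mirror the names used here and may not match your BIOS labels exactly, and the "migrated" profile is hypothetical.

```python
# Rules taken from the post: ("min", n) means >= n, ("eq", n) means exactly n.
RULES: dict[str, tuple[str, int]] = {
    "tWRPRE": ("min", 48),
    "tRDRD_dg": ("eq", 8),
    "tWRWR_dg": ("eq", 8),
    "tWRWR_sg": ("eq", 32),
    "tCCD_L_WR": ("eq", 32),
}


def check_timings(timings: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means every rule is met."""
    problems = []
    for name, (kind, value) in RULES.items():
        actual = timings.get(name)
        if actual is None:
            problems.append(f"{name}: not set")
        elif kind == "min" and actual < value:
            problems.append(f"{name}={actual}: should be >= {value}")
        elif kind == "eq" and actual != value:
            problems.append(f"{name}={actual}: should be {value}")
    return problems


if __name__ == "__main__":
    migrated = {"tWRPRE": 40, "tRDRD_dg": 4, "tWRWR_dg": 4,  # hypothetical values
                "tWRWR_sg": 16, "tCCD_L_WR": 16}
    for problem in check_timings(migrated):
        print(problem)
```

It flags every one of the five timings in the hypothetical migrated profile. A profile that passes only clears these specific pitfalls; it says nothing about overall stability.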

Sugi managed gear 2 at 9000 C38 x57P, x52E, x43R, NGU 35, D2D 30, which scores extremely well in the Stockfish benchmark.  I believe 9200 is the absolute maximum possible on Gear 2 mode, sugi0lover will have more information about this.

I tried the preset Hynix 8600 MHz profile on the G.Skill 2x24 kits, but at x56 / x50 / x39 I was still slower (by about 3 million nodes per second, or roughly 2-3%) than a 5.4 / 4.4 / 4.5 GHz 14900K with tightly tuned 8000 memory. Yet Cinebench R23 scored 45,800 (a 5.7 / 4.6 / 4.5 GHz 14900K scores about 42,950).

Seby managed to beat a 5.8/4.6/4.5 14900K's Stockfish bench by about 15 million nodes/second with a 9000 MHz tuned 285K running very aggressive timings. I can't come close to that even with the prebuilt Hynix 8600 profile.

You can try the ucode switcher in tweaker’s paradise in the BIOS to see if older ucode gives you any extra performance!  Be sure to report your results.

Pijulin

Thank you for your introduction. What are the settings in the motherboard for the description of "The second method is by using voltage suspension." and "There is a new PLL trim setting, called PLL 'ref' offset, which operates on top of the main PLL trim" in DLVR overclocking? I am using Z890 Apex (BIOS 1001 Beta).

---------------------- Edit ----------------------

I found Core Voltage Suspension in DIGI+VRM.
I turned this option on:
Voltage Floor Mode stays Static and Voltage Floor stays Auto;
Voltage Ceiling Mode stays Static and Voltage Ceiling is set to 1.45;
Also set the LLC to Level 6.