Silent_Scone
Super Moderator

The reason for granular control

Temperature and electrical circuits have always been intrinsically related. PC components are certainly no exception to the rule; in fact, we could say that they are a prime example. As temperatures increase, the electrical resistance of materials tends to do the same, leading to variations in signalling that can become an overclocker's enemy number one. When it comes to DRAM, motherboard engineers already draw on their experience to mitigate such effects by design and through rigorous training and read/write operations conducted at POST, but it can be difficult to account for these fluctuations in real time. Occasionally, temperature can be the direct cause of system instability, resulting in a frustrating debugging exercise. If the behaviour isn't the direct result of an immediate change we've made to a particular timing or voltage, it can be difficult to know where to look first. After all, we haven't changed any settings since establishing our overclock, so what's different?

There are a large number of reasons why an overclock can suddenly become unstable, not all of which are within the scope of this particular overview. Electrical engineers we are not, so let's focus on how we can combat the impact of such things on the frontlines!

 

Retention Retention Retention

Since the power management circuit was integrated onto DDR5 modules themselves, elevated DIMM temperatures during overclocking have become more of a concern when searching for stability or higher frequencies, and thermal management requires careful consideration. To explain why temperature is important, we will use the DRAM refresh timings as a prime example. At its core, our tREFI timing ensures data integrity over time and controls the interval that maintains cell retention. It does this by scheduling DRAM “cells” to be refreshed periodically to counteract a phenomenon known as charge leakage. This cycle is critical in keeping the memory cells “charged”, as these are what contain our precious data. As DRAM is volatile storage, these cells require a constant power source to keep this information stored.

With tREFI, a higher value is better for performance. This timing dictates the length of time before the next refresh command. We may wish to set a higher value because the memory bank becomes inaccessible whilst the refresh process takes place, which incurs a performance penalty.

tRFC (refresh cycle time) specifies the time that must elapse after a refresh command (REF) before the next command, such as an activate (ACT), can be issued to the memory being refreshed. A lower value is better for performance, as the memory is unavailable for less time. However, if tRFC is not long enough to give the charge pumps sufficient recovery time, the cells may not be sufficiently refreshed. It's also worth noting that larger capacity DIMMs require a longer tRFC, as there are more cells to refresh. If you compare board-controlled or XMP timings between 16GB and 24GB modules, you'll notice some obvious changes to the refresh timings.
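As a rough illustration of how these two timings trade off, the share of time spent refreshing can be estimated as tRFC divided by tREFI, provided both are expressed in the same clock ticks. This is only a first-order sketch: it ignores command scheduling and finer-grained refresh modes, and the function name is made up for illustration. The example values (tRFC 450, tREFI 65535 and 32767) are ones that appear later in this guide.

```python
def refresh_overhead(trfc_ticks: int, trefi_ticks: int) -> float:
    """First-order estimate of the fraction of time lost to refresh.

    Assumes both timings are given in the same unit (memory clock ticks).
    """
    if trfc_ticks <= 0 or trefi_ticks <= 0:
        raise ValueError("timings must be positive")
    return trfc_ticks / trefi_ticks


# Same tRFC, two different refresh intervals
for trefi in (65535, 32767):
    share = refresh_overhead(450, trefi)
    print(f"tRFC 450, tREFI {trefi}: ~{share:.2%} of time spent refreshing")
```

Under these assumptions, halving tREFI roughly doubles the refresh overhead (from about 0.7% to about 1.4%), which is why a higher tREFI and a lower tRFC both help performance.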

 

Charge leakage

This is a natural phenomenon where the electrical charge stored in a cell's capacitor drains away over time. As DIMM temperature increases, so does the charge leakage rate of the cells containing the data. Problems arise when the accelerated charge dissipation is fast enough that the data stored within the cells is no longer valid by the time the next refresh arrives, because our refresh interval is too long.

 

What does this have to do with DIMM Flex?

Because both tREFI and tRFC are temperature-sensitive, they may require adjustment to counteract the increased rate of leakage, as the defined refresh interval may no longer be frequent enough to ensure that all cells are refreshed before data degradation occurs.

Catching instability in stress tests or conventional day-to-day workloads caused by our tREFI interval can be difficult. This is in part because the total refresh operation is done in several smaller operations, meaning not all cells are refreshed at the same time. To make matters even more complicated, each cell will have a slightly different leakage rate. Much like other memory timings, the interval we can run depends on a number of factors, including the VDIMM, density of the module, memory IC type and the applied memory frequency, to name but a few (you're probably starting to see why we chose refresh as our subject matter!). Of course, these timings are not the only ones impacted by temperature. Elevated temperatures impact signal propagation speed, affecting the validity of various other timing windows. Being late is usually a bad thing, and DRAM is no different!

What DIMM Flex enables us to do is customise DRAM timings and frequency based on onboard memory temperature monitoring, ensuring optimal performance while maintaining stability under varying thermal conditions.

 

Temperature-dependent customization

 DIMM Flex offers three customizable levels to adjust DRAM settings according to temperature ranges:

  • Level 1 (Low Temperature): Best performance.
  • Level 2 (Medium Temperature): Slightly compromised performance.
  • Level 3 (High Temperature): Best stability, lower performance.

Users can set the temperature thresholds that define each level and configure the associated timing and frequency values. DRAM refresh-related timings are a key example of why DIMM Flex is useful. By customizing tREFI, tRFC and other timings based on temperature ranges, users can ensure that memory settings are appropriately adjusted on the fly to counteract the impacts of increased temperatures on DRAM.
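The level selection described above can be pictured with a small sketch. This is not ASUS firmware logic: the function name, the exact boundary behaviour (whether a reading equal to a threshold counts as the higher level) and the absence of any hysteresis are all assumptions made for illustration. The 45°C and 55°C defaults are the suggested starting points given in this guide.

```python
def select_level(dimm_temp_c: float,
                 level2_at_c: float = 45.0,
                 level3_at_c: float = 55.0) -> int:
    """Return the DIMM Flex level (1, 2 or 3) for a DIMM temperature.

    Hypothetical helper: thresholds are user-defined, and treating a
    reading equal to a threshold as the higher level is an assumption.
    """
    if level2_at_c >= level3_at_c:
        raise ValueError("the Level 3 threshold must be above the Level 2 threshold")
    if dimm_temp_c < level2_at_c:
        return 1  # low temperature: best performance
    if dimm_temp_c < level3_at_c:
        return 2  # medium temperature: slightly compromised performance
    return 3      # high temperature: best stability, lower performance


for temp in (38.0, 50.0, 61.0):
    print(f"{temp:.0f}°C -> Level {select_level(temp)}")
```

Each level would then map to its own set of timings and frequency, as configured by the user.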

 

231120175109.jpg

 Example 

On the ROG APEX Encore, ASUS engineers have provided presets for SK Hynix based memory kits at 8000 MT/s and above. At Level 1, we can see our set intervals, which are based on either our board-derived XMP timings or ones we have input manually. Once the 45°C threshold is exceeded, the designated tREFI is 65535; above 55°C, it is 32767.

ASUS engineers and gurus Shamino and SafeDisk suggest 45°C as a starting point for the first threshold and 55°C for the Level 2 threshold. Users who are familiar with their worst-case DRAM scenarios from prior testing can further optimize their settings using a typical timing-temperature gradient.
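To put these tREFI values into perspective, they can be converted into time. The sketch below assumes that tREFI is entered in memory clock ticks and that the memory clock runs at half the data rate (4000 MHz for an 8000 MT/s kit); both are assumptions about how the setting is expressed rather than something stated in this guide, and the function name is invented for illustration.

```python
def trefi_ticks_to_us(trefi_ticks: int, data_rate_mts: float) -> float:
    """Convert a tREFI value in clock ticks to microseconds.

    Assumption: one tick is one memory clock cycle, and the memory
    clock is half the data rate (DDR transfers twice per clock).
    """
    if data_rate_mts <= 0:
        raise ValueError("data rate must be positive")
    clock_mhz = data_rate_mts / 2
    tick_ns = 1000.0 / clock_mhz        # length of one clock cycle in ns
    return trefi_ticks * tick_ns / 1000.0


for ticks in (65535, 32767):
    print(f"tREFI {ticks} at 8000 MT/s = {trefi_ticks_to_us(ticks, 8000):.2f} us")
```

Under these assumptions, 65535 works out to roughly 16.4 microseconds between refresh commands and 32767 to roughly 8.2 microseconds, so the hotter level refreshes about twice as often.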

231120180829.jpg

Level DRAM Timing Control

Underneath Level 2 & 3 DRAM Timing Control, we can see the timings defined by the profile which will become active after our temperature thresholds are breached.

231120175618.jpg

Silent_Scone_0-1711699003387.jpeg

 

Shamino's Guidelines

Example Gradient:

  • tRFC: 2.666 ticks per degree Celsius
  • tREFI: 5950 ticks per degree Celsius
  • tRP: 0.17647 ticks per degree Celsius
  • tRTP: 0.10526 ticks per degree Celsius
  • tFAW: 0.210526 ticks per degree Celsius
  • tRCD: 0.105263 ticks per degree Celsius

Runtime Adjustments: Users can scale their timing values based on the known worst-case scenarios. For instance, if tRFC holds at 450 at temperatures above 55°C, the user can set level 1 tRFC to 423 (<45°C), level 2 tRFC to 436 (<55°C), and level 3 tRFC to 450 for the worst-case scenario.
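The tRFC figures in this example can be reproduced from the gradient with a little arithmetic. The sketch below is one reading of how the numbers arise, not a documented formula: it assumes the worst-case value is anchored at 55°C, that Level 2 is scaled for a point 5°C cooler and Level 1 for a point 10°C cooler, and that results are rounded down. The function name is hypothetical.

```python
import math

TRFC_TICKS_PER_DEG_C = 2.666  # tRFC gradient from the example list


def scale_for_cooler_temp(worst_case_ticks: int,
                          degrees_cooler: float,
                          ticks_per_deg_c: float = TRFC_TICKS_PER_DEG_C) -> int:
    """Tighten a known worst-case timing for a cooler operating point.

    Assumption: the timing relaxes linearly with temperature, and the
    result is rounded down to match the figures quoted in the example.
    """
    return math.floor(worst_case_ticks - ticks_per_deg_c * degrees_cooler)


level3_trfc = 450                                      # holds above 55°C
level2_trfc = scale_for_cooler_temp(level3_trfc, 5)    # 436
level1_trfc = scale_for_cooler_temp(level3_trfc, 10)   # 423
print(level1_trfc, level2_trfc, level3_trfc)
```

Rounding down gives the tighter (faster) value; rounding up instead would leave a little more margin at the cost of a tick of performance. The same approach applies to timings where a higher value is faster, such as tREFI, except that the value would shrink as temperature rises.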

DIMM Flex provides a dynamic solution for optimizing DRAM performance under varying thermal conditions, offering users flexibility and control for an enhanced computing experience.

Supported Motherboards & Memory Kits

https://www.asus.com/microsite/motherboard/dimm-flex-qvl-list/

1 Comment
jackenpacken
Level 8

You should first look at your memory die's datasheet for the 85°C to 95°C range. In my case, tREFI can be 3.9 microseconds at memory temperatures between 85°C and 95°C. Let's say you run the memory at 3600 MHz; one cycle would then be 0.00028 microseconds, so 3.9 / 0.00028 is 13928.57, which we round down to 13928. The same datasheet for my die says that you can skip refresh a maximum of 8 times, so 13928.57 x 8 = 111428.57, but since we do not want errors I would divide by 3, giving 37142.666. In the same datasheet, the minimum tRFC1 for 16Gb RAM is 550, but you should add some extra so that you can make sure the RAM fully refreshes. I am not sure, but I think that if you have tRFC2, it is how much more time the second DIMM needs to also refresh. Usually both DIMMs refresh at the same time, but DIMM slot 2 usually has longer traces on the motherboard, so it needs more time. That's why tRFC is usually higher than the RAM die's minimum value.