DRAM Stability - Methodology & Tools

Silent_Scone
Super Moderator

Introduction

Many users enter the world of DRAM overclocking with high expectations, especially after purchasing pre-binned kits that promise extreme speeds and tight timings. The logic seems simple: if a manufacturer rates a kit for 8000 MT/s, it should work, right? Unfortunately, reality can occasionally throw a curveball. Variability in BIOS versions, CPU memory controller quality, and overall system compatibility mean that even a supposedly "guaranteed" kit might not be fully stable in every system.

It's understandable why some users fall for the misconception that overclocking, especially with XMP and EXPO, comes with ironclad guarantees. These profiles are designed to make higher memory speeds more accessible, but their success still depends on various system factors. Enabling these profiles is still overclocking—running not just the DRAM bus but also various subsystems outside of specifications—which introduces an element of unpredictability due to part-to-part variance. 

 

Understanding DRAM Overclocking

Overclocking DRAM, including the use of XMP (Extreme Memory Profile) and EXPO (Extended Profiles for Overclocking), involves running memory modules beyond their officially rated specifications and outside the JEDEC standard. This means that performance, stability, and reliability can vary from system to system due to differences in silicon quality, motherboard, and CPU memory controller capability.

Since memory manufacturers bin their chips for specific speeds and timings, even modules rated for the same speed may not behave identically. Furthermore, overclocking is never guaranteed to be completely stable, as pushing components beyond their stock settings increases the likelihood of errors. It's important to understand that no overclock is 100% stable.

Even a stock system that adheres to JEDEC standards is capable of "flipping a bit". This is why ECC (Error-Correcting Code) memory exists in the commercial space. Once one acknowledges this, testing memory stability for 24 hours or more on a gaming system seems somewhat fruitless. For instance, what if the system throws a violation at 25 hours or 32 hours? In reality, we can't account for all permutations, so we test long enough to establish a sufficient stability guardband for our use case.

You can read more regarding EXPO/XMP here: Memory Overclocking - What you may or may not know

 

 

Leave other Domains at Default

It's crucial not to overclock other clock domains while we focus on establishing memory overclocking stability. This approach ensures that, in the event of a stress test failure, we can quickly pinpoint the source of instability. For instance, overclocking cache/uncore domains increases the data passing through the bus. When we overclock the memory, it further amplifies the data transfer, which can lead to errors. By prioritizing memory stability first, we reduce the risk of introducing additional variables and can more easily isolate any issues that arise during testing.

 
 

Stability Testing for DRAM Overclocking

Unstable DRAM can manifest in various ways, ranging from immediate crashes and failed boots to more subtle issues like application errors, data corruption, or random system freezes. In extreme cases, an unstable memory overclock may lead to blue screen errors (BSODs) in Windows or unexpected reboots under heavy loads. Stability testing helps identify these potential problems before they cause disruptions in everyday use.

 

I've enabled XMP/EXPO and now my system won't boot!

It's also important to distinguish between instability that prevents a system from successfully completing POST (Power-On Self Test) and instability that only emerges once the operating system is loaded. Failure to POST when overclocking is often the result of memory training failure. Training is more difficult to pass than operating system-based tests due to its strict pass/fail criteria. During training, the electrical signals between the memory modules and the memory controller are calibrated to stay within a predefined, programmable margin by conducting a comprehensive number of read-and-write tests. If any signal encroaches upon this margin, the process fails outright.

Operating system-based tests, on the other hand, determine stability by verifying whether data can be written and read correctly over time. These tests do allow for some degree of waveform misalignment, as data may still remain valid for the duration of testing. However, this does not necessarily indicate long-term reliability, as slight timing inconsistencies could manifest under different workloads or system conditions, leading to instability over time. This is why it can be important to incorporate more than one stress test in your routine, as broadening the different data patterns used within different suites will help confirm stability.
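To illustrate the write-then-verify idea described above, here is a toy sketch in Python. It is not how Karhu, GSAT, or HCI Memtest are implemented, and it is no substitute for them: Python has no control over CPU caches or physical memory placement, so this only shows the principle of cycling different data patterns and counting mismatches. The function names and the pattern set are assumptions chosen for illustration.

```python
# Toy illustration only: this does NOT meaningfully stress DRAM.
# It shows the write/read-back/compare loop that real tools build upon.


def pattern_pass(buffer: bytearray, pattern: int) -> int:
    """Fill the buffer with one byte pattern, read it back, count mismatches."""
    size = len(buffer)
    expected = bytes([pattern]) * size
    buffer[:] = expected  # write phase
    # read phase: compare every byte against what was written
    return sum(1 for got, want in zip(buffer, expected) if got != want)


def run_patterns(size_bytes: int = 1 << 20, loops: int = 2) -> int:
    """Cycle several patterns over one buffer and return total mismatches."""
    patterns = (0x00, 0xFF, 0xAA, 0x55)  # all zeros, all ones, alternating bits
    buffer = bytearray(size_bytes)
    errors = 0
    for _ in range(loops):
        for pattern in patterns:
            errors += pattern_pass(buffer, pattern)
    return errors


if __name__ == "__main__":
    print("mismatches:", run_patterns())
```

Real suites differ mainly in which patterns and access orders they use, which is why running more than one tool broadens coverage.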

 

I've enabled XMP/EXPO and now my applications/system is crashing!

Stability issues within the operating system, such as crashes under load or corrupted files, indicate that while the memory settings were sufficient to pass training, they lacked the stability required for sustained operation. This is why stress testing is necessary to ensure reliability beyond simply booting into the OS.

Several testing tools are available to determine the stability of an overclocked memory configuration. Each tool has strengths and limitations, and no single test can provide absolute certainty of stability.

 

Active Cooling

When pushing frequency and stability margins, a fan can be beneficial in keeping DIMM temperatures in check. It's important to remember that, in a gaming system, the DRAM bus won't be loaded nearly as heavily as it is when running stress tests. Stress tests use synthetic data patterns with the key focus of trying to expose violations over time, whilst real-time applications will only send the read-and-write operations that they need to.

A 120mm fan placed over the modules can reduce the temperature reported by the SPD sensor by 10 °C. There are some misconceptions about what temperatures are acceptable, including the generalisation that staying under 50 °C is a golden rule. Whilst keeping below a certain temperature has benefits, the temperature at which the DRAM is stable largely depends on the overclock and how conditional the stability is. You can read more here regarding temperature dependencies and how granular control can help: DIMM Flex - Granular Control

Use HWiNFO to monitor DRAM SPD temperature.

[Image: temperature taken at idle]

 

 

Karhu RAM Test

Karhu RAM Test is a paid tool with a simple-to-use interface. It automatically assigns the necessary amount of memory and runs from within the operating system.

Karhu can find some violations within 5 minutes that may go undetected for several complete passes of Memtest86+ (45 minutes to over an hour), making it an invaluable tool for stress testing your memory.

[Image: coverage information]

Karhu's FAQ states the following coverage time for error detection rates:

Below are the error detection rates by test duration based on over 100,000 test runs and roughly 8 years worth of non-stop RAM Test*:

 

  • ≤ 1 min: 47.42 %
  • ≤ 5 min: 65.05 %
  • ≤ 10 min: 74.89 %
  • ≤ 30 min: 87.62 %
  • ≤ 1 h: 92.97 %
  • ≤ 3 h: 97.84 %
  • ≤ 6 h: 99.30 %
  • ≤ 12 h: 99.91 %
  • ≤ 24 h: 99.99 %

I've personally been using Karhu for over 5 years, and in that time I have never once tested a system for more than 12 hours. For a gaming system, 3 to 6 hours should be ample, but it's entirely down to personal preference. There's no fail-safe amount of coverage when overclocking.
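The diminishing returns of longer runs can be read straight from Karhu's published figures. The sketch below simply subtracts each cumulative rate from the next; the numbers are the ones quoted from the FAQ, while the function name and output format are my own. Going from 6 hours to 12 hours adds 0.61 percentage points of detection, and going from 12 to 24 hours adds only 0.08.

```python
# Cumulative error detection rates (%) as quoted from Karhu's FAQ.
DETECTION_RATES = [
    ("1 min", 47.42), ("5 min", 65.05), ("10 min", 74.89),
    ("30 min", 87.62), ("1 h", 92.97), ("3 h", 97.84),
    ("6 h", 99.30), ("12 h", 99.91), ("24 h", 99.99),
]


def marginal_gains(rates):
    """Return (duration, cumulative %, gain over the previous step) tuples."""
    rows = []
    previous = 0.0
    for duration, cumulative in rates:
        rows.append((duration, cumulative, round(cumulative - previous, 2)))
        previous = cumulative
    return rows


if __name__ == "__main__":
    for duration, cumulative, gain in marginal_gains(DETECTION_RATES):
        print(f"<= {duration:>6}: {cumulative:6.2f} %  (+{gain:.2f} points)")
```

These are aggregate figures across many systems, so treat them as a guide to where extra hours stop paying off rather than a guarantee for any one configuration.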

https://www.karhusoftware.com/ramtest/

 

Google Stress App Test (Windows)

Google Stress App Test (GSAT), run via Linux Mint (or another compatible Linux distro) or via Windows, is one of the best memory stress tests available. Google used this stress test to evaluate the memory stability of their servers; nothing more needs to be said about how valid that makes it as a stress test tool.

  • Install Bash Terminal: https://msdn.microsoft.com/en-gb/commandline/wsl/install_guide
  • Install the Google Stress App test by typing: sudo apt-get install stressapptest
  • Once installed, open “Terminal” and type the following: stressapptest -W -s 3600
  • This will run stressapptest for one hour. The test will log any errors as it runs.
  • You can add the argument "-M" followed by the amount of memory, in megabytes, that you wish to assign to the test (90% of available memory)
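If you'd rather not work out the "-M" value by hand, a small helper can do it. The sketch below assumes a Linux or WSL environment where /proc/meminfo is readable; the helper names and the 90% default are illustrative, and the script only prints the command rather than running it.

```python
import shlex


def mem_available_mib(meminfo_text: str) -> int:
    """Extract MemAvailable from /proc/meminfo text and convert kB to MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # second field is the value in kB
            return kib // 1024
    raise ValueError("MemAvailable not found in meminfo text")


def build_gsat_command(available_mib: int, seconds: int = 3600,
                       percent: int = 90) -> list[str]:
    """Build a stressapptest command line using a share of available memory."""
    test_mib = available_mib * percent // 100
    # -W: more CPU-stressful memory copy, -s: run time in seconds,
    # -M: megabytes of RAM to test
    return ["stressapptest", "-W", "-s", str(seconds), "-M", str(test_mib)]


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        available = mem_available_mib(handle.read())
    print(shlex.join(build_gsat_command(available)))
```

Close other applications first, since MemAvailable reflects what is free at the moment the script runs.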

 

Google Stress App Test (Linux Mint) (GSAT)

To bring up memory module info within the Mint Terminal, type: sudo dmidecode --type 17 and scroll to the relevant info.
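To save scrolling, the output can be piped through a short script that picks out the fields of interest. This is a minimal sketch that assumes the usual dmidecode layout of "Memory Device" records made of "Key: Value" lines separated by blank lines. Field names such as "Configured Memory Speed" can differ between dmidecode versions, and the script name dimm_info.py is hypothetical.

```python
import sys


def parse_memory_devices(dmidecode_text: str) -> list[dict[str, str]]:
    """Split `dmidecode --type 17` output into one dict per Memory Device."""
    devices = []
    current = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        if line == "Memory Device":
            current = {}
            devices.append(current)
        elif not line:
            current = None  # a blank line ends the current record
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return devices


def populated(devices: list[dict[str, str]]) -> list[dict[str, str]]:
    """Drop empty slots, which report 'No Module Installed' as their size."""
    return [d for d in devices if d.get("Size") != "No Module Installed"]


if __name__ == "__main__":
    # Usage: sudo dmidecode --type 17 | python3 dimm_info.py
    for dimm in populated(parse_memory_devices(sys.stdin.read())):
        print(dimm.get("Locator", "?"),
              dimm.get("Size", "?"),
              dimm.get("Speed", "?"),
              dimm.get("Configured Memory Speed", "?"),
              sep=" | ")
```

Comparing "Speed" with "Configured Memory Speed" is a quick way to confirm the profile you enabled is actually the one being applied.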

 

HCI Memtest Pro

HCI Memtest Pro is widely adopted as an industry standard by motherboard and memory vendors alike and is a paid, easy-to-use tool. There is also a Deluxe version, which includes a bootable function for testing outside of the operating system.

Memtest Pro is also quite good at catching certain cache violations on some platforms, making it an invaluable tool for testing overclocks where multiple subdomains are overclocked.

https://hcidesign.com/memtest/

 

Post your experiences or ask for assistance with any of the tools posted here 👍

 

Related Articles & Links

Memory Kits - Overclocking and What You May Not Know

DIMM Flex: Realtime DRAM Optimisation

DIMM Fit - Final Fine Tuning

CDK, CUDIMM & Memory Gears - What You Need to Know

Software Links

Karhu Ram Test

Google Stress App Test (GSAT)

HCI Memtest Pro

HWiNFO

 

9800X3D / 6400 CAS 28 / ROG X870 Crosshair / TUF RTX 4090

Antraxtacide
Level 9

INPUT CLOCK FREQUENCY CHANGE

👽 Once the DDR4 SDRAM is initialized, the DDR4 SDRAM requires the clock to be “stable” during almost all states of normal operation. This means that, once the clock frequency has been set and is to be in the “stable state”, the clock period is not allowed to deviate except for what is allowed for by the clock jitter and SSC (spread spectrum clocking) specifications.
     The input clock frequency can be changed from one stable clock rate to another stable clock rate under two conditions:

  1.   Self-Refresh Mode
  2.   Precharge Power-down mode
         Outside of these two modes, it is illegal to change the clock frequency.

  For the first condition, once the DDR4 SDRAM has been successfully placed into Self-Refresh mode and tCKSRE has been satisfied, the state of the clock becomes a don’t care. Once a don’t care, changing the clock frequency is permissible, provided the new clock frequency is stable prior to tCKSRX. When entering and exiting Self-Refresh mode for the sole purpose of changing the clock frequency, the Self-Refresh entry and exit specifications must still be met as outlined in “Self-Refresh Operation”. However, because DDR4 DLL lock time ranges from 597nCK at 1333MT/s to 1024nCK at 3200MT/s, additional MRS commands need to be issued for the new clock frequency.

  If DLL is enabled, tDLLK must be programmed according to the value defined in AC parameter tables, and the DLL must be RESET by an explicit MRS command (MR0 bit A8=’1’b) when the input clock frequency is different before and after self refresh. The DDR4 SDRAM input clock frequency is allowed to change only within the minimum and maximum operating frequency specified for the particular speed grade. Any frequency change below the minimum operating frequency would require the use of the DLL_on-mode -> DLL_off-mode transition sequence; refer to Section 4.4, DLL on/off switching procedure.

The second condition is when the DDR4 SDRAM is in Precharge Power-down mode. If the RTT_NOM feature was enabled in the mode register prior to entering Precharge power down mode, the ODT signal must continuously be registered LOW during this sequence until DLL re-lock is complete.
   
      If the RTT_NOM feature was disabled in the mode register prior to entering Precharge power down mode, ODT signal is allowed to be floating and DRAM does not provide RTT_NOM termination. A minimum of tCKSRE must occur after CKE goes LOW before the clock frequency may change. The DDR4 SDRAM input clock frequency is allowed to change only within the minimum and maximum operating frequency specified for the particular speed grade. During the input clock frequency change, CKE must be held at stable LOW levels.

    Once the input clock frequency is changed, stable new clocks must be provided to the DRAM tCKSRX before Precharge Power-down may be exited; after Precharge Power-down is exited and tXP has expired, tDLLK MRS command followed by DLL reset must be issued. Depending on the new clock frequency, additional MRS commands may need to be issued to appropriately set the WR/RTP, CL, and CWL with CKE continuously registered high. During DLL re-lock period, CKE must remain HIGH. After the DLL lock time, the DRAM is ready to operate with new clock frequency.


 

-->2026 |NOW BUILDING X299 SYSTEMS 2026--
) ROG R6EE w/10920X, ROG APEX w/10900X, PRIME DLX-II w/10980XE (
) ASCOCK X299 FaYTaYLaTeey i9 Gaming Professional Edition 7820X (
G1GABAD X299 AUROZ Master Sad w/ 7800X
--> Current X99 Systems- ROG V Ed10, Deluxe II, Sabretooth, Phoenix SLI, X99 WS 10G, Huanhzhi TF & Dell Builds

Hi,

This is JEDEC text about when DDR4 is allowed to change clock frequency mid-operation. The stability testing here is about error-free operation of the trained config, not runtime frequency switching.

9800X3D / 6400 CAS 28 / ROG X870 Crosshair / TUF RTX 4090