Recent posts have discussed the point of overclocking DRAM: what can be gained, what it costs to push too far, and where a reasonable sweet spot lies. In my competitive overclocking I hadn’t considered those questions in enough detail to have an opinion, so I ran some tests with several DRAM profiles.
Since the point is to compare DRAM settings, all else is held as consistent as possible. Hardware list and CPU settings are:
Maximus X Apex motherboard
Core i5 8600K CPU (because that’s what I had in the socket)
5.2 GHz core clock, 4.9 GHz cache clock at 1.37 Vcore, LLC 7, -1x AVX setback
Chilled water cooling at 14-15C (to stay above the dew point)
73C peak core temperature
GT 1030 video card
One kit 2x8GB G.Skill Trident Z DRAM F4-4400C19D-16GTXSW
The tests use six manually tuned profiles, three each at 4400 and 4133. Note that these are tuned for the best performance I can get with these DRAM sticks, this CPU, this motherboard, at this temperature, etc. YMMV.
The main difference between these profiles is the stability criterion. First, for assured 24/7 stability, MemtestHCI is the best choice. I stopped at 100% coverage because all that tuning and testing was taking long enough already. Profiles 1 and 4 are good enough that you would want them in your 24/7 gaming or productivity rig. I don’t want them on my testbench.
For competitive benchmarking, I want a DRAM profile just stable enough for the benchmark at hand. The toughest of all HWBOT benchmarks on DRAM is y-Cruncher. Its Pi 1B test (one billion digits) exercises a lot of the DRAM and catches any errors, yet it takes less than a minute on a six-thread CPU. I call a profile that passes it semi-stable. In my experience, if a DRAM profile completes the Pi 1B test, it’s good enough for any other benchmark on HWBOT. Giving up 24/7 perfection buys gains in bandwidth and latency. You probably don’t want profiles 2 or 5 on your 24/7 rig, but I use them extensively: for most 3D benchmarks and for the toughest 2D tests like y-Cruncher.
The third level takes the ‘just stable enough’ approach one step further. Profiles 3 and 6 are not stable, but they complete enough benchmarks to be useful, at least to me.
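The three stability levels boil down to a simple decision rule: a profile earns the strictest tier whose test it passes. Here is a minimal Python sketch of that rule. It assumes each test result can be reduced to a reliable pass/fail (a profile that passes only sometimes, like profile 3 on y-Cruncher 1B, counts as a fail). The `Stability` and `classify` names are hypothetical, and the SuperPi 32M bar for the third tier comes from the benchmark notes further down.

```python
from enum import Enum


class Stability(Enum):
    """Stability tiers for a DRAM profile (hypothetical labels)."""
    STABLE_24_7 = "24/7 stable"        # passes MemtestHCI at 100% coverage
    SEMI_STABLE = "semi-stable"        # reliably completes y-Cruncher Pi 1B
    BENCH_ONLY = "not stable, useful"  # completes SuperPi 32M but not y-Cruncher 1B
    UNUSABLE = "unusable"              # fails all of the above


def classify(memtest_100pct: bool, ycruncher_pi_1b: bool, superpi_32m: bool) -> Stability:
    """Return the strictest tier whose test the profile passed."""
    if memtest_100pct:
        return Stability.STABLE_24_7
    if ycruncher_pi_1b:
        return Stability.SEMI_STABLE
    if superpi_32m:
        return Stability.BENCH_ONLY
    return Stability.UNUSABLE


# A profile like 2 or 5: not MemtestHCI-clean, but completes y-Cruncher Pi 1B.
print(classify(memtest_100pct=False, ycruncher_pi_1b=True, superpi_32m=True).value)
```

Checking the strictest test first mirrors how the profiles are described: a 24/7 profile is expected to pass the benchmark tests as well, so only the top criterion matters for it.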
AIDA64 benchmarks measure write bandwidth, read bandwidth and latency. Bandwidth depends on how many read or write operations per second the combined DRAM and CPU can complete. Latency here is not to be confused with the primary CAS Latency setting. CAS Latency is only part of the measured delay of a data access as seen from the CPU. Measured latency also includes the round trip over the motherboard traces and the I/O time in the DRAM and in the memory controller. Core and cache clocks also affect this measurement, so they were held constant in these tests. How these measured numbers translate into benchmark performance is another issue.
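For context on the bandwidth figures, the theoretical ceiling is easy to compute. Here is a minimal Python sketch. It assumes DDR4’s 64-bit (8-byte) data bus per channel and dual-channel operation with two sticks, one per channel. `peak_bandwidth_gbs` is a hypothetical helper, and AIDA64’s measured numbers will always land below this ceiling.

```python
def peak_bandwidth_gbs(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    Assumes DDR4 with a 64-bit (8-byte) data bus per channel; each
    transfer moves bus_bytes per channel, mt_per_s million times per second.
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


for rate in (4400, 4133):
    print(f"DDR4-{rate}: {peak_bandwidth_gbs(rate):.1f} GB/s")
```

This gives about 70.4 GB/s at 4400 and 66.1 GB/s at 4133, a theoretical gap of roughly 6%.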
I tested with y-Cruncher because it is so tough on DRAM, is nearly memory-bound at core clocks above about 5.0 GHz, and its scores vary with memory bandwidth and, to a lesser extent, with latency. It uses AVX instructions, burns a lot of power and gets your CPU hot. The CPU overclock must be very stable to pass y-Cruncher; otherwise CPU errors creep in and confuse the DRAM tuning.
Geekbench 3 emphasizes ‘everyday’ tasks in an extensive array of workload tests. It gives single-core and multicore scores for CPU performance and gives an explicit memory performance score. The multicore score responds to memory bandwidth and the memory score responds to both bandwidth and latency.
SuperPi 32M is an old, honored benchmark that uses a single thread to compute a lot of digits of Pi. This takes nearly 6 minutes per run at these clock rates. After y-Cruncher, SuperPi is the most stringent DRAM test, and it set the stability bar for profiles 3 and 6. SuperPi scores respond more to DRAM latency and less to bandwidth.
So, some results:
Profile 1 – 24/7 stable at DDR4-4400
I got 4400 CL17, but couldn’t get CL16: it wouldn’t POST even with elevated voltage. CL17 beats the CL19 of the XMP profile for these DRAMs, so I didn’t try XMP at all. DRAM voltage proved to be a controlling factor. Above 1.45 V, the Samsung B-die on these G.Skill sticks threw bit errors that MemtestHCI caught. Below 1.45 V, some of the timings had to be relaxed.
Profile 2 – y-Cruncher semi-stable at 4400
This profile was sitting around in the M10A BIOS from previous benchmarking, and I revisited its tuning only a little. The only change was improved stability at a lower DRAM voltage than before. Compared to profile 1, this profile shows a relatively big step up in bandwidth and some improvement in latency as measured by AIDA64. The benchmark scores improved proportionally. You can see why I’d rather have this profile on the testbench than profile 1. Note the much higher DRAM voltage compared to profile 1. That’s the trick when tuning the bandwidth/stability tradeoff toward more bandwidth.
Profile 3 – not very stable, but it works sometimes at 4400
This was also sitting around in the BIOS and didn’t take much work, again mostly adjusting DRAM voltage. The improvements are rather small since most of the secondary and tertiary settings are nearly the same as in profile 2. Nonetheless, benchmark scores are improved. This profile has the DRAM voltage set at the lowest point at which all the benchmarks except y-Cruncher 1B work. At that voltage, much to my surprise, y-Cruncher 1B completes about one time in four. That score gets an asterisk because it only happens sometimes.
Lesson learned: increase DRAM voltage to get tighter settings, but stability may require backing the voltage off again.
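Profile 3’s voltage was settled by finding the lowest DRAM voltage at which its timings still work. That downward sweep can be sketched in Python. `lowest_passing_voltage` and its `passes` callback are hypothetical. In practice each “call” is a BIOS change, a reboot and a full benchmark run. The sweep also assumes failures don’t reappear at still-lower voltages.

```python
from typing import Callable, Optional


def lowest_passing_voltage(
    passes: Callable[[float], bool],
    start_v: float,
    floor_v: float,
    step_v: float = 0.01,
) -> Optional[float]:
    """Step DRAM voltage down from start_v; return the lowest voltage that still passes.

    `passes` is a hypothetical stand-in for "set this voltage, reboot, run the
    benchmark" and returns True if the run completes without errors. The sweep
    stops at the first failure. Returns None if start_v itself fails.
    """
    if not passes(start_v):
        return None
    # Work in integer millivolts so repeated subtraction doesn't drift.
    mv = round(start_v * 1000)
    floor_mv = round(floor_v * 1000)
    step_mv = round(step_v * 1000)
    best_mv = mv
    while mv - step_mv >= floor_mv:
        mv -= step_mv
        if not passes(mv / 1000):
            break
        best_mv = mv
    return best_mv / 1000


# Toy predicate standing in for real test runs: everything at or above 1.42 V passes.
print(lowest_passing_voltage(lambda v: v >= 1.42, start_v=1.50, floor_v=1.35))
```

The sweep only looks downward. An upper limit like the one seen in profile 1, where the B-die threw errors above 1.45 V, has to be found separately.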
Profile 4 – 24/7 stable at DDR4-4133
I’ll compare across the DDR rates at the same stability level. This profile has CL16. That’s faster than 17 in profile 1, right? Not so much. 4400 CL17 and 4133 CL16 are virtually identical when expressed in nanoseconds. AIDA64 latency measurements are also identical. AIDA64 does measure quite a difference in bandwidth, due to the higher transfer rate at 4400. Oh, wait. The y-Cruncher and Geekbench 3 scores are better at 4133 in spite of 3-5% less bandwidth. All those tighter secondary and tertiary settings in profile 4 count for something. Perhaps AIDA64 doesn’t exercise them when measuring bandwidth. The difference in SuperPi time does show some effect of bandwidth.
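Converting CL to nanoseconds is simple arithmetic. DDR makes two transfers per clock, so one clock at DDR4-4400 lasts 2000/4400 ns. The Python sketch below checks the 4400 CL17 versus 4133 CL16 comparison, plus the CL15 versus 4400 CL16 comparison made for profile 5. The `cas_ns` name is hypothetical. This is only the CAS component of the latency AIDA64 measures.

```python
def cas_ns(cl: int, mt_per_s: float) -> float:
    """CAS latency in nanoseconds.

    DDR transfers twice per clock, so the I/O clock runs at mt_per_s / 2 MHz
    and one clock cycle lasts 2000 / mt_per_s ns.
    """
    return cl * 2000 / mt_per_s


for cl, rate in [(17, 4400), (16, 4133), (16, 4400), (15, 4133)]:
    print(f"DDR4-{rate} CL{cl}: {cas_ns(cl, rate):.2f} ns")
```

4400 CL17 and 4133 CL16 come out at about 7.73 ns and 7.74 ns. 4400 CL16 and 4133 CL15 come out at about 7.27 ns and 7.26 ns.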
Profile 5 – y-Cruncher semi-stable at 4133
Giving up 24/7 stability allows some serious tightening of timings, more than at 4400. This one works at CL15, which is the same number of nanoseconds as 4400 CL16. As for CL, I think the CPU’s memory controller simply works better at 4133. RCD/RP at 16 would POST, but would not reach the stated (semi-stable) level of stability.
We’re now closer to answering the basic questions: profile 5 has benchmark scores equal to or better than profile 2’s.
Profile 6 – not stable, but useful for many benchmarks
There isn’t any DRAM voltage at which profile 6 can run y-Cruncher 1B, so it can’t be called semi-stable. Read bandwidth improves a lot over profile 5 thanks to the two-step decrease in RCD/RP. Profile 6’s claim to fame is its latency, the best of the bunch, which shows up in latency-sensitive scores like SuperPi.
The conclusion: Raja is right. Comparing 4400 and 4133 DRAM speeds at equal levels of stability, 4133 is the sweet spot.
The clock rate, CAS latency, measured bandwidth, measured latency, etc. are nice to know, but by themselves they don’t determine or reveal DRAM performance. On modern multithreaded benchmarks, including the ‘everyday’ workloads of Geekbench, the 24/7 stable 4133 profile beats the 4400 one. That isn’t really a surprise to me; I had thought the secondary and tertiary settings needed for stability at 4400 might be even worse than they turned out to be.
A major difference is the ability of the CPU-DRAM combination to reach a lower CL, both in clock cycles and in nanoseconds.
The surprise comes with the less stable benchmarking profiles. The 4133 profiles can be tightened so much more that they have a considerable performance advantage in many benchmarks. That advantage doesn’t hold for all benchmarks, though. I’ll have to try profiles 2 and 5, or 3 and 6, on each benchmark before making a run for submission to HWBOT.
The 4133 profiles have another advantage: they are easier to tune thanks to Raja’s pre-packaged profile in the BIOS of recent ROG motherboards. That 4133, 1.40 V profile ran MemtestHCI stable and made a great starting point for tightening. I recommend it to those who want to run a well-tuned DRAM profile but aren’t ready to try manual tuning.