Are there Linux folks out there? Seek advice on tools for benchmarking and mon-1

metadist
Level 9
Greetings all,

I run Linux on my recently built PC (8700K + Asus Prime Z370-A, Alphacool Eisbaer LT240 AiO cooler, Super Flower Leadex II 750W), and I find it a bit hard to get info about tools and tricks for overclocking. Here is what I have found and used so far, followed by the points where I'm seeking advice.

1. Benchmarking tools

a. prime95.
I used three versions of it:
- the most recent one,
- 26.6 (which is supposed to be AVX-free, but I see it does contain some opcodes from the AVX set),
- and 25.11, which really is AVX-free.

I wanted to use the last one as a stress test at the max frequency, but I observe that it uses 5 out of 5 cores initially, and then regresses to just 4 cores.

b. Intel LINPACK benchmark
This puts the heaviest load on the CPU, much heavier than prime95. Since it is a matrix-multiplication type of load, I can't dismiss its results when judging my OC, because I do want to run this kind of calculation on my PC (I am a data scientist).

c. I have written some tests of my own in Python using the NumPy and scikit-learn packages, as well as LightGBM, since this is the kind of load I'll be putting my PC under on a daily basis. I observe that LightGBM in particular can induce a long AVX-free load on all cores and run at the max turbo frequency.

2. Monitoring tools

- i7z can print some stats on cores, like frequency and temperatures, and some more,
- turbostat can do the same,
- pcm (pcm.x): I ended up using this one, which I got from GitHub, since I found it somewhat more convenient to run alongside benchmarks, even though it has some peculiarities: it reports frequency as a multiple of the 3.7 GHz base (i.e. it will report 1.34 for roughly 5 GHz), and reports temperature as 100-t.

3. Tricks

To run monitoring, I have to do this on my Ubuntu 18.04:

echo 0 > /proc/sys/kernel/nmi_watchdog
modprobe msr



4. My problems

I wanted to get rid of the Linux power management tools and rely completely on the BIOS settings (to begin with).
I have turned off thermald, and got rid of the acpi_pm kernel module with the grub option thermal.off=1.
I don't think, however, that this helps, as I still see that the intel_pstate driver governs power. My understanding is that if I disable it, something else will take its place, i.e. cpufreq. Also, I don't see how to manage intel_pstate effectively, i.e. how it works in real time and how to set its preferences.

So far I see that the system starts to throttle very aggressively under load: with an OC profile (settings from the @de8bauer or @The Sentinel videos) it drops below 3 GHz over time, even without reaching 80 degrees. I don't really understand whether this is due to BIOS scaling or Linux thermal management.

I also can't figure out how to display vcore.

Finally, how do I run mprime in benchmark mode, i.e. get some numbers for performance evaluation? So far I have only been running ./mprime -t for torture tests.

Thanks in advance, all Linux enthusiasts!

metadist
Level 9
Answering my own thread: I've now got some more info on the tools I used under Linux. Enjoy!

Benchmarking Tools
The tools I used not only apply high load to the CPU and RAM, but also produce some measurable output and, more importantly, have checks in place to make sure there are no errors in the calculations. If your system does not freeze or crash but produces errors during a test, it cannot be considered stable.

Prime95

This is probably the only benchmarking tool common to Windows and Linux users. However, the Linux version still has some limitations. It does have a command line "torture mode" switch, but there are no switches to run a particular test (i.e. to change the size of the FFT vectors). Instead, prime95 cycles through different tests, which results in uneven load on the CPU and thus does not really apply the highest sustained load. When launching it like ./mprime -t with no additional configuration, one has to wait some time for it to start running the small FFT size tests. Another limitation is that it does not output a single figure you can use as a measure of your setup's performance. When I halt continuous torture mode, it just says how many tests it completed in what time. It does report the number of errors and warnings though, so pay attention to that. I'd be interested to see how Linux folks use this tool for benchmarking, not just torture load.
There are several versions worth mentioning. The current version 28.x uses AVX and AVX2 instructions and can apply the maximum load to the CPU.
There is a 26.6 version that is said not to use AVX, and overclockers sometimes "cheat" by using this version. This is OK when you want to test your system at a high but more typical load, since the AVX/AVX2 sets are used far less in common software and games than in specialized software such as rendering or numeric calculations. I found out, however, that 26.6 for Linux does use instructions from the AVX set. This can be detected in two ways: one can use a simple utility, elfx86exts, to find which instruction sets a given ELF binary or library uses (read more on this utility below), or just observe the core frequencies drop by the AVX offset specified in the BIOS. This leaves me wondering whether the Windows version is really built without AVX instructions or whether it translates them to something else.
Finally, there is version 25.11, which is truly AVX-free, but it's rather old and doesn't run well on a 6-core CPU, i.e. it doesn't load all cores.
Thus I settled on the current prime95.

LINPACK

Intel's LINPACK benchmark imposes a lot more load on the CPU and RAM, which is noticeable in increased temperatures and power draw. The LINPACK test solves large systems of linear equations and thus benefits from the vectorized AVX/AVX2/AVX512 instruction sets. Besides, one can choose test parameters that impose more or less CPU load and use more or less RAM. The latter is especially handy when one wants to test the RAM and has a greater than typical amount of it, i.e. 64GB.
The minimal configuration consists of 5 numbers, each on its own line. The config file does require header lines for whatever reason. Take an example from the distro.
I have used the following configurations:
Heavier CPU, less RAM:

1
40000
40000
1
1

This test runs for about a minute and uses 12 GB of RAM.
Lighter CPU, more RAM:

1
90000
90000
1
1

This test runs for 3 minutes and uses 64GB.

I must note that the "lighter" test was still some 10-30% heavier on the CPU than prime95 (depending on which FFT size mprime uses).
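
The RAM figures quoted for the two configurations can be sanity-checked from the problem size alone. This is only a rough sketch: it assumes the benchmark stores one dense double-precision matrix of problem size x leading dimension and ignores any extra workspace, and the function name is made up for illustration.

```python
BYTES_PER_DOUBLE = 8  # assumption: the benchmark works in double precision


def linpack_matrix_bytes(problem_size, leading_dimension=None):
    """Estimate the bytes needed for the dense matrix of one LINPACK run.

    leading_dimension defaults to problem_size, as in the configs above.
    """
    lda = problem_size if leading_dimension is None else leading_dimension
    if lda < problem_size:
        raise ValueError("leading dimension must be >= problem size")
    return problem_size * lda * BYTES_PER_DOUBLE


if __name__ == "__main__":
    for n in (40000, 90000):
        print(f"N={n}: about {linpack_matrix_bytes(n) / 1e9:.1f} GB")
```

Working backwards, the same estimate helps pick the largest problem size that still fits into the installed RAM with some room left for the OS.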

Monitoring Tools

Under Windows people use HWInfo64 and CPU-Z. There are no such tools under Linux. Live with it.
Most people turn to the lm-sensors package (and its sensors binary, as well as some graphical frontends, like psensor) for various readings, and this does make sense for some models and some parameters. But beware that some HW is not supported by lm-sensors, or the modules are maintained by a single enthusiastic person, e.g. look at it87.
Also, it turned out my Asus MB only supported core temperatures via a generic coretemp module, and that's it.
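
For scripting, the coretemp readings can also be taken straight from sysfs instead of parsing the output of sensors. A minimal sketch, assuming the usual hwmon layout (a name file plus temp*_input files in millidegrees Celsius, with optional temp*_label files); the function name and the hwmon_root parameter are mine and exist only to make the code testable.

```python
from pathlib import Path


def read_coretemp(hwmon_root="/sys/class/hwmon"):
    """Return {label: degrees Celsius} for all sensors of the coretemp driver."""
    readings = {}
    root = Path(hwmon_root)
    if not root.is_dir():
        return readings
    for chip in sorted(root.iterdir()):
        name_file = chip / "name"
        # Skip hwmon chips that belong to other drivers.
        if not name_file.is_file() or name_file.read_text().strip() != "coretemp":
            continue
        for input_file in sorted(chip.glob("temp*_input")):
            label_file = input_file.with_name(
                input_file.name.replace("_input", "_label"))
            if label_file.is_file():
                label = label_file.read_text().strip()
            else:
                label = input_file.name
            # The kernel reports millidegrees Celsius.
            readings[label] = int(input_file.read_text()) / 1000.0
    return readings


if __name__ == "__main__":
    for label, celsius in read_coretemp().items():
        print(f"{label}: {celsius:.1f} C")
```

This needs no root access and no msr module, which makes it a handy fallback for logging temperatures during a benchmark run.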
Apparently, MSR is a less universal way of getting the readings and is processor-specific, which is why we turn to Intel-only tools at this point. Note that using MSR has some prerequisites. On Ubuntu systems:
modprobe msr
echo 0 > /proc/sys/kernel/nmi_watchdog

Also note that the latter might not work when simply prefixed with sudo, because the redirection is performed by your unprivileged shell; you may need to become root first with sudo su.
To make the sysctl change permanent, you can add kernel.nmi_watchdog = 0 to your /etc/sysctl.conf

i7z

This is an open source utility developed by a student a while ago, but it is still pretty useful for basic overclocking needs. It reports the following instantaneous values per core: actual frequency and BCLK multiplier, % of time the core spends in the C0 (running) or C1 (halt) state as well as a couple of deeper states, core temperature, and (drums!) core voltage. I have not seen another utility that could give me core voltage, even though it is one of the key parameters to watch.
The utility is packaged for most distros and needs to be run as superuser. It is dated and lacks explicit support for the newest generations of CPUs, but hey, it works nonetheless!

pcm.x from PCM

PCM is an Intel-specific package that gives a lot more runtime detail and can dump it to a file in csv format, which is convenient for further analysis. For overclocking needs it gives the familiar frequency (although as a factor of the base frequency, i.e. 1.30 means 3.7GHz*1.30=4.8GHz) and core temperature (although as degrees left of thermal headroom, i.e. 100 minus the actual temp in my case), and also CPU energy and RAM throughput. It does not show voltages. The CPU power part is a bit controversial. How is it calculated? Can it be trusted, even as a relative measure? I'd prefer to double check with an inline power meter at the 8-pin CPU power socket on the MB, but I lack such a tool. Instead I relied on a wall socket power meter, which gave me the total power drawn by the PC.
pcm and i7z will happily work together at the same time (in different terminals, obviously).
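
When post-processing a pcm csv dump, the two relative readings are easy to turn back into familiar units. A small sketch under the assumptions stated above for my setup (3.7 GHz base frequency, TjMax of 100 degrees); the function and parameter names are illustrative and not part of PCM.

```python
def pcm_to_absolute(freq_factor, thermal_headroom, base_ghz=3.7, tjmax_c=100):
    """Convert pcm's relative readings into GHz and degrees Celsius.

    freq_factor      -- frequency as a multiple of the base frequency
    thermal_headroom -- degrees left before TjMax is reached
    """
    frequency_ghz = freq_factor * base_ghz
    temperature_c = tjmax_c - thermal_headroom
    return frequency_ghz, temperature_c


if __name__ == "__main__":
    ghz, temp = pcm_to_absolute(1.30, 35)
    print(f"{ghz:.2f} GHz at {temp} C")
```

On another CPU, replace the two defaults with that chip's base frequency and TjMax.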

turbostat

Written by an Intel employee. It can collect more information than the two utilities above and is extensible: on the command line one can ask it to add specific MSR or sysfs entries. Unfortunately, it does not show vcore out of the box.

dstat

A modern all-in-one alternative to iostat, vmstat etc. It has modules for many different monitoring needs, including thermal. Install via your OS package manager.

CoreFreq

Very interesting terminal-graphics monitor (with beautiful soft colors!). Download from the GitHub repo and build. Follow the README to load the kernel module, start the daemon and run the cli:
$ ./corefreq-cli -h
CoreFreq. Copyright © 2015–2018 CYRIL INGENIERIE
usage: corefreq-cli [-option ]
  -t Show Top (default)
  -d Show Dashboard
  -V Monitor Power and Voltage
  -g Monitor Package
  -c Monitor Counters
  -i Monitor Instructions
  -s Print System Information
  -M Print Memory Controller
  -R Print System Registers
  -m Print Topology
  -u Print CPUID
  -k Print Kernel
  -h Print out this message

Without any options it runs a terminal GUI in "top" mode. One can then use the same option letters as mode switches. Unfortunately it won't show me vcore values (only zeros).

Intel's powergadget

Check it out from the Intel Developer Zone. Not maintained for Linux any longer.

OS Kernel Tools (Do you need them?)
First, see the excellent kernel.org write-up on CPU Performance Scaling.

acpi-cpufreq kernel driver

This is the driver at the forefront of OS power management, as it's CPU-brand agnostic. Nothing to see here for Intel owners, as the intel_pstate driver takes its place.

intel_pstate kernel driver

This is the driver at the forefront of OS power management on Intel CPUs, unless it's disabled by the kernel boot parameter intel_pstate=off
See again kernel.org's documentation on the intel_pstate CPU Performance Scaling Driver.
It seems to me that Intel believes the CPU and BIOS already do a good job of limiting CPU power consumption while maximizing performance, so the driver does not need a lot of hints from the OS.
This driver provides two policy governors, powersave and performance, where powersave is more like cpufreq's ondemand governor. I have tried both (changed via the cpupower utility, described below), and saw no difference in power consumption or performance during the test. I made the following interesting observation: while the powersave policy is on, the CPU downclocks to 800 MHz at idle, and idle power consumption is ~40 Watts (measured at the wall socket, so an objective measure). When I change the policy governor to performance, the CPU starts to run at the max frequency, but the power draw stays the same! Just remember that it's neither frequency nor vcore that affects power consumption directly; it's the actual computing load on your CPU.
energy_performance_available_preferences can be further turned off with the kernel boot parameter thermal.off=1. Or the preference can be changed from the default balance_performance to performance by writing to energy_performance_preference.
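
To see what the driver is actually doing, the governor and the energy/performance preference can be read from the per-CPU cpufreq directory in sysfs. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpuN/cpufreq layout; the function name and the sysfs_root parameter are hypothetical, and files that are missing on a given system are reported as None.

```python
from pathlib import Path

# sysfs attributes mentioned above, plus the driver name and current frequency
POLICY_FILES = (
    "scaling_driver",
    "scaling_governor",
    "scaling_cur_freq",
    "energy_performance_preference",
    "energy_performance_available_preferences",
)


def read_cpufreq_policy(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the cpufreq policy attributes of one CPU as a dict of strings."""
    policy_dir = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    settings = {}
    for name in POLICY_FILES:
        try:
            settings[name] = (policy_dir / name).read_text().strip()
        except OSError:
            # Attribute not exposed (e.g. intel_pstate not active) or unreadable.
            settings[name] = None
    return settings


if __name__ == "__main__":
    for key, value in read_cpufreq_policy().items():
        print(f"{key}: {value}")
```

Polling this in a loop while a benchmark runs shows whether the governor or the preference changes behind your back.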

Thermal daemon

It is already on by default in Ubuntu 18.04. Honestly, given the extensive BIOS tweaking tools and the intel_pstate policies, which are thermald's default choice, I'd rather not use it, so I just disabled it.
Maybe it could be useful for laptops, where one wants to further restrict power consumption, but for any overclocking needs it can be safely disabled.

Other utilities

elfx86exts

Disassembles a binary and prints which instruction sets it uses. I use it to find out whether a binary uses the AVX family of sets. Clone the Rust source from the GitHub repo and build it (some extra packages are required for the build). Use as:
$ elfx86exts/target/debug/elfx86exts --help
elfx86exts 0.1.0
Analyze a x86 binary to understand which instruction set extensions it uses.
USAGE:
  elfx86exts
FLAGS:
  -h, --help Prints help information
  -V, --version Prints version information
ARGS:
  The path of the file to analyze


cpupower

This utility from linux-tools-common can show and set some power management parameters at the OS level. E.g. to set the governor:
sudo cpupower frequency-set -g performance

For the most initiated

Read registers directly from MSR. In this example, we read TjMax, the maximum allowed package temperature. It is often the reference point in temperature reporting. This should output 100 on Intel 8th Gen. Your mileage may vary.

sudo rdmsr --decimal --bitfield 23:16 0x1A2
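
The same read can be done from a script without rdmsr, since the msr module exposes each CPU's registers as a file that is read 8 bytes at a time at the offset of the register number. A hedged sketch: the helper names are mine, the constant name is how I refer to register 0x1A2, and it needs root plus a loaded msr module to work on real hardware.

```python
import os
import struct

MSR_TEMPERATURE_TARGET = 0x1A2  # register read by the rdmsr command above


def read_msr(register, cpu=0, device_template="/dev/cpu/{cpu}/msr"):
    """Read one 64-bit MSR of the given CPU through the msr device file."""
    fd = os.open(device_template.format(cpu=cpu), os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, register)  # file offset == register number
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


def bitfield(value, high, low):
    """Extract bits high..low (inclusive), like rdmsr --bitfield high:low."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


if __name__ == "__main__":
    try:
        tjmax = bitfield(read_msr(MSR_TEMPERATURE_TARGET), 23, 16)
        print(f"TjMax: {tjmax} C")
    except OSError as exc:
        print(f"Could not read MSR (root and 'modprobe msr' needed?): {exc}")
```

The device_template parameter exists only so the reader can be pointed at an ordinary file for testing.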


Use "Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 4: Model-Specific Registers" for reference.
For example, one can clear the Automatic Thermal Control Circuit enable bit in IA32_MISC_ENABLE and not have the voltage/freq drop due to a thermal event. Or, perhaps, configure such a drop using MSR_THERM2_CTL.
MSR_PERF_STATUS[47:32] can be used to read Core Voltage. P-state core voltage can be computed by MSR_PERF_STATUS[37:32] * (float) 1/(2¹³).
(there is an error in the doc: it's either 47 or 37)
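
The arithmetic of that formula can be tried on a raw register value. This sketch only decodes a number you have already read (for example with rdmsr); the function name is made up, and which bit range is correct is exactly the open question above, so the upper bit is a parameter that defaults to 47.

```python
def core_voltage_from_perf_status(raw_value, high_bit=47, low_bit=32):
    """Decode core voltage in volts from a raw MSR_PERF_STATUS value.

    Applies the formula quoted above: field * 1/(2**13).
    high_bit is an assumption; pass 37 to try the other reading of the doc.
    """
    width = high_bit - low_bit + 1
    field = (raw_value >> low_bit) & ((1 << width) - 1)
    return field / 2**13


if __name__ == "__main__":
    sample = 0x2800 << 32  # made-up raw value, for illustration only
    print(core_voltage_from_perf_status(sample))
    print(core_voltage_from_perf_status(sample, high_bit=37))
```

Note that a 6-bit field (37:32) cannot exceed 63/8192, which is about 0.008 V, so the 47:32 reading looks like the plausible one.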

Links

lm-sensors github
PCM github
CPU frequency scaling by ArchWiki.
Thermald github
Intel Thermal Management (8th Gen platform datasheet vol1, see chapter 5)
Overview of CPU performance counters (in Ukrainian)

Arne_Saknussemm
Level 40
I guess it's just you on Linux then, metadist 😮

Just kidding...nice one for writing that up...might help someone who accidentally tries it one day 😄