
Since 0x129 microcode BIOS update: not stable with XMPII RAM profiles (Prime95)

Gurgeh
Level 9

 

This is more for general community awareness than for fault finding, but:

Since applying the 0x129 microcode BIOS update (2503, 08/08/2024) for the ROG Z790E Gaming WiFi (1st edition) with an Intel 13900K, with optimised defaults loaded plus the XMPII profile (Corsair CMH32GX5M2B6400C36, 2x16GB sticks), Prime95 blend tests fail very easily/quickly. NB: It has been about a year since I last ran these, but I was thorough back then.

With all defaults (no XMP), it seems stable so far, running for about 20 minutes.

Comments welcome. Speculation welcome. Fix suggestions welcome also. (I've not got a clue how to approach RAM config/timing issues; however, I may not be doing any testing any time soon.)

I'm happy to provide more info and answer questions as and when I can, though.


It should pass. It needs to pass.

(Re isolating the memory bus) You are quite right, there are other tests that are needed, which are more targeted. But that was not my goal.

It's not going to significantly shorten your CPU's life as long as you stay under TDP (I have it limited to 12 degrees below Intel's Tmax, for cores and package) and you do not exceed Intel's power, current and voltage recommendations for the various die components. (Caveat: this assumes Intel's recommendations are competent, but at the end of the day you still have to get out of bed.)

If you use your PC 'only' for gaming, then I suppose doing just what you do is good enough, and that is a valid engineering approach (as long as you don't mind the odd corrupt game or OS file). Worst case, you have to reinstall something painful. No real loss.
If you use your PC for anything else, it has to pass 'all' demands, no matter how 'nearly' inconceivable.
You need to throw the most demanding things at it. (Although limit how long you test for if it's a real thrasher.)
You could test equivalently for longer periods of time with lower stress (but only sort of, and up to a point), but I don't have that kind of time; I need to use it.
(The sort of test I avoid is long FurMark runs, simply because graphics isn't critical for me, so what if I'm marginally unstable? It's probably just a few pixels or one frame that fails.)
I could use y-cruncher, but that's really just another thrasher.
But your PC needs to pass any and all tests, even if you have to apply a margin that is never used and limits ~100% of what you do.

The only thing running Prime numbers does is indicate that your CPU can run Prime numbers. I haven't used it in my suite of tests for about 10 years and never ran into difficulties. All Prime does better than anything else is behave like a power virus. If you're at default, you shouldn't need to run it at all.

I'm replying because this forum needs to serve almost as an engineering resource for others and not just a social utility.
The problem is the logic used as the premise for the arguments, so further discussion will not serve any purpose if that persists.

The only thing running Prime numbers does is indicate that your CPU can run Prime numbers.
FALSE
The x86 architectures do not have dedicated die logic for calculating Primes that is not used elsewhere. And I don't imagine for a second that Alpine believes this either. So I can't understand the argument logic behind the statement.
Even where die logic might be somewhat specialized, the same underlying die fabrication process that serves the logic fulfilling P95 is used for the logic serving other specializations. Furthermore, some of the logic used by P95 will be entailed in any and 'all' other computations, especially caches/buses/rings.
From the above, it follows that by proving stability under added stress margins, you do provide a meaningful indication of the system's performance overall for other types of load. They are all estimations and models, and as such there will of course be limits to the certainty of their usefulness.

All Prime does better than anything else is behave like a power virus.
FALSE.
Non sequitur. It's all instruction execution, which is what we are interested in.

If you're at default, you shouldn't need to run it at all.

FALSE.
Non sequitur. This statement is equivalent to "if you use defaults, it's guaranteed to work, therefore you don't need to test", i.e. nothing is ever broken, and everything always functions as designed.

@Alpine_Alex 
It would be helpful if you provided your 'full' list of 10 tests.
I do mean this genuinely. My suite consists of about four tests.
(However, if none of them is 'as' stressful as Prime95, then at least serious consideration needs to be given to adding P95, as testing under less stress for adequately longer is not necessarily equivalent or feasible. If you put dough in an oven for 30 minutes it bakes, but you can't expect to leave it out at room temperature in a sterile environment and come back to a loaf of bread in 2 years.)


----

There is an approach to testing games machines that only works for games machines, as games provide output that the human brain is very good at correcting when wrong (sounds and images), and, let's face it, it is totally non-critical. But that approach seems to have become somewhat too pervasive now. Testing of games machines is a bit 'loosey-goosey'.

1. @Alpine_Alex's comment regarding Prime only being an indicator that the CPU can pass Prime is valid. This is because different workloads produce different current patterns and swings in current. If you can pass Prime, this is only an indicator that the system is able to pass Prime; to suggest otherwise, you would have to share other scenarios that use Prime numbers. It has little to do with the architecture (outside of overclocking and exposing certain instability) and more to do with the data pattern and its impact on the CPU in question, more so when the system is overclocked, as to which subsystems are likely to encounter instability when using it. You're implying users should limit their OC potential on the basis that they may not be able to pass Prime, which isn't something I'd agree with.

2. If you don't intend to use the system to run Prime, there's no inherent reason to use it - it is a synthetic workload. There are other tools out there which include AVX routines that are more indicative of real-world workloads. Given the fact Intel has resorted to capping the VID in an effort to preserve longevity, you can understand why some users would question why someone would expose their CPU to high levels of current they otherwise wouldn't see. https://community.hwbot.org/topic/141945-hwbot-x265-benchmark/ is one reasonable benchmark which incorporates encode and AVX routines that one might encounter.

Given you're implying you're trying to enrich or enlighten people, I would recommend you focus on what you consider your stability regimen to be rather than trying to imply one is strictly better than the other.

 

9800X3D / 6400 CAS32 / ROG X870 Crosshair / TUF RTX 4090

Supermod, it would have been nice if you had contributed when people were being advised to indiscriminately 'slap more voltage' on our problems.

If Intel or AMD released a chip and it was claimed that "just occasionally these chips will say 1&1 = 0", it would make the global news.
This has happened, even for implausible edge cases, and they have made the global news and had to issue workarounds for them.

You're implying users should limit their OC potential on the basis that they may not be able to pass Prime, which isn't something I'd agree with.
More than that: unless they are OCing to set criteria for competition, or unless they do nothing more critical than gaming (those are important caveats), then absolutely, yes! If there is ANY test a build fails where it produces (e.g.) 1&1=0 instead of 1, then absolutely they need to limit their OC, because it's a failed OC by some standards.
I want my CPU to perform flawlessly. That is the intention for home and business markets.
If Intel advertised that "we have fuse-tuned these chips so perfectly that if you push them at all they WILL CERTAINLY be unstable for some tasks", then I would personally never buy a K processor again. (The truth may be that that is where we are already.)
I cannot do the testing Intel does, so we approximate with margin. (I undervolt until unstable, then back off until thoroughly stable, then add some voltage margin back in yet again because my tests are not perfect. I do not actually OC; I do not add voltage over default.)
There may well be better tests than P95, I have no doubt. But if those tests pass and P95 fails, then for many people's purposes P95 is proven to be the best test to run.
I want my PC to perform better than a Walmart HP for grandma, not worse.
The discussion I am having now is not about P95; it is about any test that fails. That becomes the best test to run until the system is stable again. There is no difference between stress testing and torture testing, save that torture tests have more rapidly diminishing returns and greater escalating risks the longer you run them.
I'm certain there is a better test to run, but it would at least have to fail when P95 fails to qualify as a replacement for P95. It may still qualify as an addition regardless.
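The tuning procedure described above (undervolt until unstable, back off until thoroughly stable, then add voltage margin back, with a failure in any single test counting as a failure of the whole build) can be sketched as a simple search loop. This is only a hypothetical model, not a tool: the stress tests are stand-in Python callables, and the names find_undervolt and passes_suite, plus the step size, margin and floor values, are illustrative assumptions rather than recommendations. In practice each "test" is a manual P95 or y-cruncher run and the offset is set by hand in the BIOS.

```python
from typing import Callable, Sequence

# Hypothetical model: a "stress test" is any callable that takes a voltage
# offset in millivolts (negative = undervolt) and returns True if it passed.
StressTest = Callable[[int], bool]


def passes_suite(offset_mv: int, suite: Sequence[StressTest]) -> bool:
    """The build only counts as stable if it passes EVERY test in the suite.

    One failing test is enough to reject a setting, which is why whichever
    test fails becomes the one that matters.
    """
    return all(test(offset_mv) for test in suite)


def find_undervolt(
    quick_suite: Sequence[StressTest],
    thorough_suite: Sequence[StressTest],
    step_mv: int = 5,      # illustrative step size, an assumption
    margin_mv: int = 10,   # illustrative safety margin, an assumption
    floor_mv: int = -150,  # illustrative lower limit, an assumption
) -> int:
    """Sketch of: undervolt until unstable, back off, then add margin."""
    offset = 0  # 0 mV = stock voltage
    # 1. Undervolt step by step until the quick suite fails (or floor is hit).
    while offset - step_mv >= floor_mv and passes_suite(offset - step_mv, quick_suite):
        offset -= step_mv
    # 2. Back off until the longer, thorough suite also passes.
    while offset < 0 and not passes_suite(offset, thorough_suite):
        offset += step_mv
    # 3. Add margin back because the tests are not perfect,
    #    but never go above stock (no voltage over default).
    return min(offset + margin_mv, 0)


if __name__ == "__main__":
    # Toy chip, purely illustrative: quick runs pass down to -60 mV,
    # longer runs only down to -50 mV.
    quick = [lambda mv: mv >= -60]
    thorough = [lambda mv: mv >= -60, lambda mv: mv >= -50]
    print(find_undervolt(quick, thorough))  # prints -40
```

With the toy chip, the quick suite passes down to -60 mV, the thorough suite only down to -50 mV, and 10 mV of margin brings the result back to -40 mV. The cap at 0 reflects the "no voltage over default" rule.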

If you don't intend to use the system to run Prime, there's no inherent reason to use it - it is a synthetic workload.

If you weren't a SUPERMOD, I'd think you were trolling me. Yes, it's a 'synthetic' workload. It's a 'test', part of a model to approximate something I can't know for certain: every bit-permutation change, and its ambient conditions, that I am ever going to run over the system's lifetime. Yes, it's a synthetic workload. It provides 'margin'.

I accept that if I had to pick just one test, and it passed, P95 is probably not the best single test to pick. But it is wrong to say there is no point in running it, especially if it reveals a failure, and we don't have to pick just one test.

 

 


@Gurgeh wrote:

Supermod, it would have been nice if you had contributed when people were being advised to indiscriminately 'slap more voltage' on our problems.


Not sure what this comment means; it doesn't quite make sense. Are you implying a moderator should have known what the underlying cause of the Vmin shift problem was? Interesting take.


@Gurgeh wrote:


You're implying users should limit their OC potential on the basis that they may not be able to pass Prime, which isn't something I'd agree with.
More than that: unless they are OCing to set criteria for competition, or unless they do nothing more critical than gaming (those are important caveats), then absolutely, yes! If there is ANY test a build fails where it produces (e.g.) 1&1=0 instead of 1, then absolutely they need to limit their OC, because it's a failed OC by some standards.
I want my CPU to perform flawlessly. That is the intention for home and business markets.
If Intel advertised that "we have fuse-tuned these chips so perfectly that if you push them at all they WILL CERTAINLY be unstable for some tasks", then I would personally never buy a K processor again. (The truth may be that that is where we are already.)
I cannot do the testing Intel does, so we approximate with margin. (I undervolt until unstable, then back off until thoroughly stable, then add some voltage margin back in yet again because my tests are not perfect. I do not actually OC; I do not add voltage over default.)
There may well be better tests than P95, I have no doubt. But if those tests pass and P95 fails, then for many people's purposes P95 is proven to be the best test to run.
I want my PC to perform better than a Walmart HP for grandma, not worse.
The discussion I am having now is not about P95; it is about any test that fails. That becomes the best test to run until the system is stable again. There is no difference between stress testing and torture testing, save that torture tests have more rapidly diminishing returns and greater escalating risks the longer you run them.
I'm certain there is a better test to run, but it would at least have to fail when P95 fails to qualify as a replacement for P95. It may still qualify as an addition regardless.


Not sure what performance has to do with passing Prime; perhaps you've lost sight of what you were talking about. I thought this was about stability.

Whilst it's true it should pass P95 at System Defaults, are you suggesting people should run P95 even if their system is stable? And if it's already unstable performing certain tasks, explain why the immediate response is that it needs to be able to run Prime numbers. Perhaps @Alpine_Alex was confused as you seemed to be complaining about XMP overclocking stability and then spoke about running Prime, which makes no sense. Perhaps you should explain this thought process so we can understand why you'd choose this. 

Given everything you've said above so far, we can deduce the only objective preference would be to establish if the CPU has already suffered a shift in minimum voltage or if it's faulty.

Many times people have clung to the same ideology regarding Prime: their ethos is that they need to run everything or else the system isn't stable. On X99, people would overclock/overvolt the ring bus, run small FFTs and wonder why their CPU had started degrading so quickly. Was the answer to keep running it until they could find stability, or simply to run something that reflected their daily workload more accurately? Nothing uses AVX routines quite like Prime does, so why does one need to be able to run it, outside of some nonsensical "need" to be able to pass it? Remember, no system is 100% stable; a stock system is perfectly capable of flipping a bit. This is why ECC memory exists.

Here are some questions you may want to ask yourself.

  • Why would I want to test for permutations or workloads the CPU won't ever see outside of fault finding?
  • If I'm unstable in my daily workloads, is passing Prime necessarily going to help me pass anything else? After all, you were talking about running it to fault-find memory-related instability.
  • Given Intel has capped the VID to prevent the chance of accelerated electromigration, why would I want to risk running small FFTs in Prime, exposing the CPU to high levels of current just to ensure it can pass this one test?

@Gurgeh wrote:


If you don't intend to use the system to run Prime, there's no inherent reason to use it - it is a synthetic workload.

If you weren't a SUPERMOD, I'd think you were trolling me. Yes, it's a 'synthetic' workload. It's a 'test', part of a model to approximate something I can't know for certain: every bit-permutation change, and its ambient conditions, that I am ever going to run over the system's lifetime. Yes, it's a synthetic workload. It provides 'margin'.

I accept that if I had to pick just one test, and it passed, P95 is probably not the best single test to pick. But it is wrong to say there is no point in running it, especially if it reveals a failure, and we don't have to pick just one test.


Prime95 is focused on mathematical calculations, specifically using FFTs, and its load characteristics are designed to generate extreme heat and stress CPU cores and caches. However, not all applications stress the CPU or other components in the same way and nothing stresses the CPU like Prime does. It was for this reason Intel introduced the AVX offset function. I fully encourage people not to run it unless they have an objective reason to, especially in light of recent vulnerabilities exposed on recent CPUs.

I've never run P95 on any 12th, 13th or 14th gen CPUs and I don't intend to.

Applications like AIDA64 or OCCT combine a variety of stress tests that not only stress the CPU but also incorporate memory and GPU, offering a more well-rounded system stability test. These tools allow you to choose between synthetic and more application-like workloads, giving you a more representative test of your overall system.

Nobody is trolling you simply because they don't agree with your ethos, regardless of whether you think it's "loosey-goosey" 🙂

Regarding your memory stability problems, there's no need to even look at Prime at all. This is not a DRAM stress test. Use Karhu Ramtest or TM5. Test Karhu for approximately 3 to 6 hours minimum.

XMP is classified as overclocking as we're running the system out of spec. As such, stability is susceptible to changes in CPU ucode or auto rules from system to system. Quite often manual tuning may be required which is why the CPU voltage rails are exposed to the user for tuning. Key voltages are listed in this guide which may help you.


Given you're implying you're trying to enrich or enlighten people, I would recommend you focus on what you consider your stability regimen to be rather than trying to imply one is strictly better than the other.

As you are a SUPERMOD, again I'm going to assume you are not trolling me.

Assuming you are moderating and not just contributing, I appeal to you to reread my posts. Nowhere have I tried to imply P95 is better than Alpine's tests. I have only had to defend mine when Alex did just that and, on an engineering forum, proceeded with a line of reasoning that I am (not at all humbly) certain is flawed and will lead to problems for others.
I'm not having a good ASUS experience, yet again. As I say, I think enthusiast PC building no longer makes the sense it used to.

Gurgeh
Level 9

I'm the Original Poster, and I have an update.

Now that 2503 is no longer listed as a BETA firmware I thought I'd take a fresh look at this.

I did not re-download the firmware just because it's no longer listed as BETA. I take it that if there had been a change, they would have given it a new version.

I reapplied the optimised defaults (which I thought I had done before) and reapplied just XMPII for my memory, tested again, and this time it has been thoroughly stable. P95 [torture, with threading, all cores, all AVX left enabled]: 30 minutes large FFTs, 15 minutes small, 15 minutes medium, 30 minutes blend. Cinebench R23: 30 minutes. All stable, no errors, no Windows hardware error flags (such as those monitored by HW monitor).

Now, it is possible that the difference is that I'm no longer stipulating an AVX guardband scale factor of 100, but I doubt I reapplied that when I tried to go back to defaults + XMPII. The other change I made was to switch the VRM Loadline from Level 4 to Auto (not the default) and back to Level 4 (which is now the ASUS default).

The other possibility is that this was some exotic Windows bug (it would have to be pretty exotic, assuming you don't count the whole downgrade of user experience that is W11 as one gigantic bug). The other thing is that, as I gather, microcode updates can sometimes be delivered by Windows? Has something like this happened?

 

I have even been able to reapply my Global adaptive undervolt of -0.052 (tuned to my chip) during the above tests.

 

So basically my QVL memory is once again operating stable with its supplied XMPII profile with defaults.

 

To anyone out there still having problems, I feel for you. This whole microcode change has thrown things in the air.

In an ideal world (while I appreciate that Intel has created a huge burden/cost for the mobo makers), I would like more detail from ASUS on whether they have rerun their QVL tests in light of the 0x129 change. But I sympathize if they have not (unless Intel is going to foot their bill), and I sympathize that they have not jumped at the chance to highlight this, as people will blame them when they should be blaming Intel only.

 

(SALTY OPINION: To be honest, I'm thoroughly disappointed with my experience of enthusiast PC building in many respects, not just this Intel business: the cost vs. features of the motherboards, the poor overall experience I regret I've had with this ASUS Strix Z790E board given the price I paid, and the cost of GPUs now.

Concerning mobos, I just want a Z790 board with premium-price quality/longevity of components and a diagnostic display. I don't need 3/4 of the 'features' this board has; I would have 'preferred' that it did not have such 'convolutions'. That said, I was content until I realised that the ASUS Crate software is horrendous junk, and I now can't even use half of that remaining 1/4 of features. It does not work, and it causes no end of headaches beyond the software's own features. Then there is the whole business of the 'custom', ergo 'in'secure, boot 'un'protection configuration, and the dismal rootkit approach to forcing ASUS Crate from the UEFI BIOS. That's a horrific practice, especially when you look at the state of ASUS Crate. It seems to be built on Node.js: software with elevated privileges, rootkitting itself from firmware, with access to the kinds of BIOS settings it has... the mind boggles. Node.js has its place, but it is not for something like this.

As far as I am concerned, enthusiast PC building is now just a badge of shame. It's a 'mug's game'. Unless you've got no choice/no life for some reason, like me, buy a console [the best-value GPU on the market] and get an everyday-driver laptop with an i5 or thereabouts. $500 for a console. How much does a PC cost? By not buying one, what's the earliest you can now update your console and be ahead on savings? Answer: as soon as the next one comes out. You could replace a PS5 with a Pro and not give it a thought.

Enthusiast PCs are now a foolish game. My old 2nd-gen i7 with the ASUS P9Z68 board was absolutely knockout back then. I was proud of that machine for 11 years, the mobo in particular. What in goodness gracious happened?)

My BIOS version as beta and the same BIOS version as non-beta are equally broken. I'd say your assumption was right 😎

WoolCladWolf
Level 7

Found a used Encore on Amazon, hoping the board's MSRP value yields decent support.

So far I've been on DDR4 and disabling 3 least

P:P cores nets better/higher frequency; mostly POST-able, less bootable (Windows). 0 validation. GL

Yeah, in general the microcode is going to lower voltages, which may cause instability. Undervolt Protection settings may or may not help, and may or may not be compatible with all software, e.g. XTU?

Hearsay via a BZoid rant: the socket has outgrown the traditional VRM design of current consumer motherboard architecture overall, (mostly) regardless of the quality or number of phases etc. employed by the archetype...