Maximus Hero VII - suddenly losing power every day for 11 months.

cheruYT
Level 7
Hi guys,

As of September last year, I've experienced over 500 Kernel 41 errors due to my PC losing power suddenly. It happens when idle and while under load, no more than 2 times in any 24 hour period (all at different times, though).

I've troubleshot basically everything I can think of which now leaves me at the motherboard (Maximus Hero VII).
I've already purchased a new Gigabyte mobo and cpu, as well as new DDR4 ram, but don't want to install them until I've exhausted every possible option first. You can imagine how many things I've tried over the past 11 months or so that I've been dealing with this issue. It's not the PSU, it's not the ram, it's not the temps, it's not HDD/SSD related, it's not a driver issue. It's either the cpu, the mobo, or a problem between the two of them.

Two days ago I tried increasing the CPU voltage by 0.15 V in AI Suite to see if "perhaps" it was due to an insufficient amount of power being sent to the cpu, and interestingly enough, the issue didn't occur for almost 40 hours, but it happened again yesterday. Also, between December of last year and April this year, the issue seemed to have completely gone away. Nothing was changed during this time regarding hardware or OS. I've currently forced the voltage to a manual setting of 1.3 V in the BIOS because anything below that was where the shutdowns were occurring. Whether this reduces the frequency, your guess is as good as mine, but I'm trying whatever I can in a last-ditch attempt to fix it before I toss it in the trash.

There is absolutely no information in event viewer under the crash log. Just kernel 41, power lost, the end.
I also invested in a $400 UPS to see if it is an outlet issue - sadly, it isn't.
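
For anyone wanting to check a pattern like "no more than 2 in any 24 hour period" against their own log, here is a minimal Python sketch. It assumes the Kernel 41 timestamps have already been copied out of Event Viewer into a list; the sample timestamps and the function names are made up for illustration and are not from my actual log.

```python
from collections import Counter
from datetime import datetime, timedelta


def max_in_window(times, window=timedelta(hours=24)):
    """Largest number of events that fall inside any rolling window."""
    times = sorted(times)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Move the left edge forward until the span is shorter than the window.
        while t - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def hour_histogram(times):
    """Count events per hour of day (0-23) to expose time-of-day patterns."""
    return Counter(t.hour for t in times)


# Hypothetical sample timestamps (assumption, not real log data).
sample = [datetime.fromisoformat(s) for s in (
    "2017-08-01 09:15:00",
    "2017-08-01 22:40:00",
    "2017-08-02 14:05:00",
    "2017-08-04 03:30:00",
)]

print(max_in_window(sample))             # most shutdowns in any 24 hours
print(sorted(hour_histogram(sample).items()))
```

If the hour histogram clusters around particular times, that would hint at something scheduled or environmental; an even spread fits a random hardware fault better.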

Cables are all seated correctly, components are plugged in correctly, everything is as it should be.
I've tried every single BIOS revision to check for stability, and turned off Anti Surge protection in the BIOS as well - no dice.

Any help would be appreciated.

Perhaps irrelevant - I bought a 1080 Ti 2 weeks ago and upon installing it into my motherboard, I discovered that the bottom PCI-E slot that I was using for my 980 Ti would not work with the 1080 Ti. Whenever the 1080 Ti was plugged in there, the PC would simply not turn on. If I put the 980 Ti back in, it would work. If I put the 1080 Ti in the top slot, it would also work. I thought this was very strange, though the issues I'm experiencing predate the new GPU by over 10 months. Just another unusual occurrence that points to mobo issues.

Thank you.

CPU - 4790k
Mobo - Maximus Hero VII
GPU - 1080 Ti
PSU - CoolerMaster Vanguard 1200w
Ram - 2 x 8gb G.Skill DDR3
Cooling - H100i

Korth
Level 14
Welcome to ROG, may the Force be with you, yada yada all the usual preamble :cool:

The software errors/logs are basically just reporting that the software isn't happy with sudden power failures, lol.

It's reasonably safe to assume your new UPS works perfectly, and it apparently hasn't fixed your power failure issue or had any effect at all. So I agree it's safe to assume your AC power source (wall-receptacle) is reliable enough. Just in case it's relevant, which UPS model?

Did you change any hardware, firmware, or software around the time the problem started appearing (Sep/2016)?

Are you overclocking anything, have you changed any other voltages? Have you tried Clear CMOS to default BIOS settings?

Have you tried a clean install of new OS and drivers? Have you scanned your system for malware/etc which might be forcing the system to power off?

You say cables and components all have good electrical connection. I assume you've specifically confirmed the mobo power inputs (24-pin EATX and 8-pin EATX12V) are securely plugged in, and the PSU wires/pins on these connectors are intact (not frayed or loose, etc), and that both DDR3 DIMMs are firmly seated.

Visually inspect the M7H mobo, both sides if you can. Especially the VRM areas, electrolytic caps, and traces around the power connectors. Has it ever been flexed (and perhaps cracked) with too much force pushed down onto the EATX/EATX12V connectors?

Which specific brand/model 980Ti and 1080Ti cards?

Try removing one of your DIMMs. If the sudden power failures persist then swap DIMMs. If the problem only occurs with one particular DIMM then you've got bad memory.

Try replacing ye olde CR2032 battery on the mobo. Ideally, you can check that the replacement battery has good voltage (3.0V+) before installing, these things have a self-discharge shelf-life and there's no telling how long they've been sitting on store shelves.

Monitor your voltages. The PSU is probably de-rated from age, regulation might be falling outside of allowable tolerances, and this is especially disastrous if it's straining mobo VRMs which already have to run part overclocks and PWMs, lol. Try using different 6-/8-pin PCI12V power outputs for your GPU card, if your PSU has them.
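
As a rough illustration of what "outside of allowable tolerances" means, here's a minimal Python sketch that checks readings against the ±5% limits the ATX12V design guide gives for the main positive rails. The readings are hypothetical, and software readouts from motherboard sensors are only approximate, so treat a flagged rail as a reason to measure with a multimeter rather than as proof.

```python
# Nominal voltage and tolerance for the main positive ATX rails (+/-5%).
ATX_RAILS = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
}


def rail_limits(rail):
    """Return the (low, high) acceptable voltage for a rail."""
    nominal, tol = ATX_RAILS[rail]
    return nominal * (1 - tol), nominal * (1 + tol)


def out_of_spec(readings):
    """Return only the rails whose reading falls outside tolerance."""
    bad = {}
    for rail, volts in readings.items():
        low, high = rail_limits(rail)
        if not (low <= volts <= high):
            bad[rail] = volts
    return bad


# Hypothetical readings for illustration (assumption, not measured data).
readings = {"+12V": 11.30, "+5V": 5.04, "+3.3V": 3.31}
print(out_of_spec(readings))  # the sagging +12V rail is the one flagged
```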

In fact, try a different PSU if you can. Preferably one which doesn't reconfigure your system (by installing Corsair Link software, etc). If you can swap PSUs with another computer then you'll be able to determine whether it's your computer's PSU or it's your computer's mobo if/when one of the two computers fails (and if they both never fail then problem fixed, lol yay).

It might be time to repaste your CPU. The old TIM may have cooked off by now. How are your temps when your "idle and under no load" system crashes, how are your fan rpms and pump rpms (are they starting to fail)? (You may have to monitor temps and rpms realtime to witness if they suddenly spike just before everything crashes, they may never get a chance to get written in logs before crash, lol.)
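
One way around the "never gets written before the crash" problem is a logger that forces every sample to disk as soon as it's taken. Below is a minimal Python sketch of that idea. The `read_sensors` callable is a placeholder you would have to supply yourself (for example by wrapping whatever monitoring tool you use); `fake_sensors` and the CSV layout are assumptions for illustration only.

```python
import os
import time


def log_samples(path, read_sensors, samples, interval_s=1.0):
    """Append one CSV line per sample and force it to disk immediately."""
    with open(path, "a", encoding="utf-8") as f:
        for _ in range(samples):
            values = read_sensors()
            fields = [f"{time.time():.0f}"] + [f"{v:.1f}" for v in values]
            f.write(",".join(fields) + "\n")
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to write it to the drive
            time.sleep(interval_s)


def fake_sensors():
    # Placeholder values (assumption): CPU temp in C, fan rpm, pump rpm.
    return (41.0, 1180.0, 2250.0)


# Example call (writes 60 samples, one per second):
# log_samples("sensor_log.csv", fake_sensors, samples=60)
```

After a sudden power loss, the last line in the file should then be at most one interval older than the crash, which is enough to see whether temps or rpms were spiking.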

This mid-2014 mobo could already be three years old. Probably has (had) 3-year warranty on the mobo, on the CPU, maybe also the PSU. It's high-quality stuff, probably still good and capable of running for years to come ... but no guarantees, lol. I'm tentatively pointing the accusatory finger of blame squarely at the PSU, but maybe that's just because I don't like Cooler Master, lol, and other things (starting with malware, software, firmware, electrical, temps, rpms, voltages, bad RAM, bad TIM, and mobo battery) should all be ruled out before replacing it.
"All opinions are not equal. Some are a very great deal more robust, sophisticated and well supported in logic and argument than others." - Douglas Adams

[/Korth]

cheruYT
Level 7
Thank you for the extra elaborate response!

The PSU is only 15 months old and the fact that the issue disappeared entirely for approximately 4 months, makes me believe the issue to be something else - though I can't be certain as I have no other PSU to test it with and don't know of anybody who has. I would need to buy another in order to test this out. The UPS is a CyberPower.

I have checked that all cables are seated properly and secure, and that the issue isn't due to a pin being out of place or bent. Ram is also securely in place and the issue persists with one or both sticks. I've also tried the other PCI-E plugs from the PSU into the GPU, as it has support for two. Same issue with both. Many of the more technical things you mentioned are beyond my scope of understanding and capability, unfortunately.

I haven't reinstalled my OS as my understanding is that the OS may cause BSODs or other issues, but it wouldn't force the PC to suddenly shut down in the blink of an eye. There would be some reportable error.

I reset the CMOS when I installed the new GPU a few weeks ago. Issue persists. 980 Ti was a Gigabyte G1 and 1080 Ti is a Gigabyte Aorus Xtreme.

The motherboard should still be covered by the 3 year warranty, but I simply don't have time to stuff around while ASUS processes an RMA and leaves me without a computer. I make my income off of YouTube and can't afford to be without my PC. The spontaneous shutdowns during recording, though, are becoming a massive problem. I'm constantly losing footage and it's become an immense pain in the ass.

Should I get a new modular PSU and pull out the CM and throw another in there??

Korth
Level 14
There are compatibility issues with Maximus VIII mobos running Gigabyte Aorus 1080Ti. Many reports from people who couldn't get it working, few reports from people who could. Maybe this compatibility issue also exists with Maximus VII mobos. A GPU hardware incompatibility can cause sudden shutdown, but your 980Ti shouldn't have had this issue. And the iGPU built into your i7-4790K shouldn't have this issue - see if the system runs stable on iGPU, but disable iGPU in BIOS (and uninstall iGPU drivers from Windows) when they're not used and not needed.

A WinOS actually can cause sudden shutdowns, but very rarely since the BSoD error reporting does catch all but the most unreportable hardware faults.

Are your motherboard and PSU both electrically grounded to chassis? Metalized screwholes on motherboard should be mounted on metal chassis with metallic screws and standoffs. Metal case on PSU should be screwed onto metal chassis with metal screws.
Nonmetal (plastic, glass, etc) screws/standoffs or chassis materials aren't good electrical ground. Rubbery anti-vibration material between PSU and chassis prevents proper electrical ground. Noise-damping material lining interior chassis panels can block electrical ground. If you have any of these things then you'll need to attach conductive wires and lug nuts to screws on PSU and/or motherboard. All of the ATX (and other) power connectors provide their own ground wires (and they work as designed), but I think it's just unwise to rely on these sorts of things when a dependable ground solution can be wired in place for a few pennies, especially when weighed against the downtime and annoyance/frustration/stress this easily correctable design oversight can cause on a system which needs to be kept running. (My philosophy is that once you're in for an overclock then you're in for a maximally-engineered overclock, no sense in doing it at all if it's done wrong.)

Bad chassis wiring/pins (on the Power and Reset buttons, anyhow) might cause the system to shut down, but I doubt in the sudden way you describe. I've seen a few Corsair chassis with yucky cheap Power/Reset wiring, lol, annoying when you get a system reset every time you thump your desk.
Noncompliant devices attached to the computer - audio, printer, camera, network, USB, RS232, whatever - can also cause electrical faults. Less and less common these days but it can still happen, especially when plugged into aging (and let's face it, less stable, not always aging gracefully) motherboard interfaces. Unplug anything and everything you don't actually need plugged in.

Does your PSU run really hot? Has its internal fan seized? If so, it's possible (and not even hard) to repair, although I never advise opening a PSU without appropriate technical knowledge. A new PSU isn't a bad choice, and it doesn't have to be modular. Without pulling out detailed PSU calculators, I think a 600W would suffice for your system (and 750W would give more reliable oomph). I'd focus on better build quality (ATX12V 2.31 and EPS12V 2.92 or better, active PFC, 80Plus Silver or Gold or better) rather than on raw wattage. Your 1200W Platinum is massive overkill, but then again you have a problem you can't troubleshoot and haven't eliminated your PSU as the cause, lol.
This EVGA SuperNOVA G2 750W Gold Modular PSU is somewhat overkill, has 10 year warranty, and runs startup POST diagnostic (a thing which saves much time when troubleshooting because you know when the PSU does or does not work properly, lol).
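
For a sense of where a figure like 600W comes from, here's a back-of-envelope Python sketch, not a substitute for a proper PSU calculator. The CPU and GPU numbers are the manufacturers' rated figures for stock parts (88W TDP for the i7-4790K, 250W board power for a reference 1080 Ti); the "everything else" budget and the 1.4x headroom factor are my own assumptions, and a factory-overclocked card like the Aorus Xtreme can draw more than reference.

```python
def recommended_psu_watts(component_watts, headroom=1.4):
    """Return (estimated load, load scaled by a headroom factor)."""
    load = sum(component_watts.values())
    return load, load * headroom


parts = {
    "cpu_i7_4790k": 88,     # Intel's rated TDP at stock
    "gpu_1080_ti": 250,     # NVIDIA reference board power
    "everything_else": 75,  # assumed: mobo, RAM, drives, pump, fans
}

load, target = recommended_psu_watts(parts)
print(f"estimated load {load} W, suggested PSU rating about {target:.0f} W")
```

Overclocking pushes the CPU and GPU figures up, which is where the extra margin of a 750W unit earns its keep.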

But again, I wouldn't buy a new PSU until/unless it's been identified as the problem - the best PSU in the world won't help if the motherboard or CPU or RAM is dying.

I'm assuming you've installed latest-greatest Maximus VII HERO firmware (BIOS 3201) and Z97 Chipset/IME drivers (available directly from Intel if you don't want the ones at the ASUS M7H site)?

You should disable all overclocks on everything. Run the i7-4790K at unexciting stock 4.0GHz Base and 4.4GHz Turbo clocks. Run the memory at unexciting JEDEC DDR3-1600 SPD. Run your GPU (and its VRAM) at reference frequencies. If the system continues to suffer from sudden intermittent power fails then you know it's a (motherboard) hardware issue. If the system runs stable then you know it was a misconfigured or unstable overclock (somewhere in CPU, RAM, and GPU).

You can try cleaning the RAM sticks/slots.
Korth wrote:
Electrical contacts on the DIMMs can be scrubbed with a Q-tip and isopropyl. "Electronics grade" isopropyl, anhydrous (>99.9%), not low-percentage drugstore stuff filled with perfumes and additives that'll leave (conductive) residues. Terpenes will work as well, d-Limonene is an aggressive solvent which smells pretty (like oranges, mmmm!) and is safe on many *but not all* PCB plastics. A polymer-based electrical contact cleaner like DeoxIT or Stabilant-22 is even better and utterly guaranteed to be safe on all plastics. The trusty old "pink eraser" trick will scrub away grime very well. It can produce ESD harmful to sensitive devices (which has no real effect on unpowered RAM, though), but beware that it also erases thin layers of (relatively soft) gold plating off the contacts, so it should be used only sparingly.

The DIMM sockets can be blasted with compressed air. Or scrubbed with a soft toothbrush. Or both. Isopropyl/etc might be used as well, although it (or even fumes from it) might harm the slot or motherboard component plastics (typically mere surface discolouration rather than serious bulk deformation, but still a risk better avoided). You could meticulously wipe each internal contact with a cleaner-coated wooden toothpick, but it's a tedious exercise in patience which can easily result in bent/damaged pins. You could try dislodging debris with paper edges, although toothbrush bristles tend to work better.

Chamois swabs are preferred over Q-tips and toothbrushes. Lint-free "electronics grade" not-Q-tips are also preferred, as are wooden "electronics grade" not-toothpicks. Even laboratory Kimwipes are preferred over common paper. But the good stuff tends to cost too much for just one little use and we all live in an imperfect world, lol, so just use the best stuff you have on hand which won't make more of a mess than you're trying to clean.


You can try repasting your CPU (removing the H100i cooler, cleaning off the old TIM, applying new TIM, remounting the cooler). I'd recommend using ArctiClean and Arctic Silver 5 - small containers of each (good for a few applications) will cost about $10 at any local computer shop - and (without digressing into a technical discussion about TIMs) they're both excellent products. This repaste wouldn't resolve a bad-mobo or bad-PSU fault, but it would at least correct an overheating-CPU fault (and will improve CPU cooling for the next few years, lol). You can remove and remount the CPU while it's exposed, although it shouldn't be necessary and is as likely to cause problems as it is to fix problems.