08-29-2024 06:35 PM - edited 08-29-2024 06:38 PM
I have a ROG Strix Z690-I mini-ITX board with an Intel 13900K and a discrete GPU (3080 Ti). I have always had the onboard GPU explicitly enabled in addition to the discrete one (the "iGPU Multi-Monitor" feature), with both Nvidia and Intel graphics drivers installed under Windows. Today I noticed that there is no Intel graphics device in Device Manager anymore, and a reboot didn't fix it. In the BIOS, the iGPU options (like iGPU Multi-Monitor and CPU graphics as a primary display option) have also disappeared, as if my CPU suddenly changed its gender from K to KF.
I have been running the latest BIOS with the microcode update (3802) since it came out and never had CPU or iGPU problems until now. The CPU performance seems fine, with no sign of the dreaded Intel degradation, but the iGPU is suddenly not detected anymore. Is this a known issue with the CPU or board, and is there anything I can do to fix it? The iGPU is useful for encoding and as a fallback when the Nvidia GPU acts up.
09-01-2024 08:15 AM
Hi @Novgorod, have you tried a CMOS reset? A power surge might have corrupted something (just a guess).
09-01-2024 12:15 PM
The iGPU is enumerated at each boot like any other hardware, and there is no CMOS flag (to my knowledge) that completely removes it from the device list. There is only the multi-monitor option to disable it if a discrete GPU is present, and that option is only available if the iGPU is detected in the first place. I suppose I could reflash the BIOS ROM in case it somehow got corrupted.
But regardless, the iGPU reappeared a few days later, after several reboots! Right when the (white) VGA QLED on the mainboard came on, the board restarted the boot process with RAM training (stuck on the RAM LED for a long time), which happens when new hardware is detected, and the iGPU was back all by itself. So now it looks like a physical problem with the mainboard, some bad CPU trace or some other subtle thing that's impossible to hunt down... ☹️
09-01-2024 05:35 PM
Hi @Novgorod, I am glad to hear that the issue is resolved for the moment. Did you install a custom CPU frame, like a Thermal Grizzly contact frame, to hold the CPU, or are you using the stock CPU holder? What type of CPU cooler do you have?
Besides a hardware issue with the board, it is also possible that some extra pressure on the CPU pins caused this, and that as the machine went through its heating and cooling cycles (as it gets used or not), the pins that were not responding started to make proper contact again. This is pure guesswork. I would not open the CPU socket and look for anything as long as everything is working properly.
On the other hand, if the iGPU keeps coming and going, then I recommend that you open the socket and visually inspect whether the CPU pins all look good and nothing looks damaged.
It is also possible that the CPU has degraded and lost its iGPU, in which case an RMA with Intel will be your course of action, provided you can take it to some other machine or a repair shop to get a proper test/confirmation. Or try your luck with Intel directly by explaining your issue. They may approve your RMA.
09-01-2024 05:50 PM
@achugh Indeed I use a Thermal Grizzly contact frame (installed to spec), and the cooler is an EK "full-cover" waterblock made specifically for the Z690-I. The setup is almost 2 years old and I never did anything to the CPU or waterblock mount after the initial assembly, so it's a bit strange that the problem would only appear after that much time, but I suppose it's possible to get fatigue on stressed pins or traces through thermal cycling. Is this a known "long-term" issue especially with a 13th-gen contact frame? I thought if you screwed that up, it would immediately break or be unstable, but maybe I'm just especially lucky...
I think I have to test it with a replacement board and if it doesn't solve it, I can still return it (but if it does, then I have an expensive brick with an intermittent error)...
09-01-2024 09:15 PM
Hi @Novgorod, thank you for sharing this additional information. This type of issue or stress has not been reported or confirmed anywhere that I know of. Technically, anything can happen, as you said above. I am also only speculating and guessing, to brainstorm reasons and ideas with you, since we sometimes get better ideas by talking to someone else.
Like I said above, we should follow the simple principle of "Don't try to fix something that is not broken", i.e. if things are back in working order, I would not touch anything at all. Let your system live the life it has. Once it goes bad, then you have to do something about it. Right now we don't have any solid indication that anything is really wrong or that this problem will ever appear again. So enjoy your system while it lasts.
09-02-2024 04:09 AM
Hi @Novgorod
1. Does the CPU exhibit issues with the iGPU function with the discrete GPU removed?
2. Have you updated to the latest Intel drivers?
3. As Achugh has alluded to, thermal cycling may affect the contact between the socket pins and the CPU's contact pads, which can cause the system to exhibit strange behaviour over time. It may be worth reseating the CPU and inspecting both the socket and the CPU's contact pads. Ensure consistent mounting pressure.
4. Consider reinstalling Windows, or install a dummy OS on a separate storage device to test if the behaviour remains the same.
09-02-2024 08:57 PM
1. Yes. When it first happened, I tried to boot without the discrete GPU - the VGA QLED stayed on and the system booted without any GPU (could only use remote desktop to log into the PC, since other remote tools like Anydesk or Rustdesk grab the image from the GPU).
2. (and 4.) Yes, always used the latest drivers, but it's definitely not an OS issue because the iGPU was not detected in the BIOS already. The BIOS tells you (indirectly but certainly) whether an iGPU was detected in the device list.
3. That's the most likely case, and I'll definitely reseat the CPU and check for physical issues (though not immediately, because it's a compact watercooled SFF build and I'll have to take the whole system apart). So far the problem has appeared just once, lasted a few days and then fixed itself (without any system or software changes). I'll also try to reproduce it by mechanically stressing the board around the CPU, let's see.
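Since the fault is intermittent, it might help to log at each boot whether Windows still sees the iGPU, so any recurrence can be matched against dates and events. Below is a minimal sketch, not a tested tool. It assumes Windows with PowerShell available and that the iGPU's adapter name contains "Intel"; the log file name and the function names are placeholders made up for illustration. It only shows the OS view, but since the iGPU was missing from the BIOS as well when the problem occurred, a missing adapter in Windows should reflect the same condition.

```python
import datetime
import subprocess

LOG_FILE = "igpu_log.txt"  # hypothetical log path, change as needed


def list_video_controllers():
    """Return the display adapter names Windows reports (queried via PowerShell/CIM)."""
    cmd = [
        "powershell", "-NoProfile", "-Command",
        "Get-CimInstance Win32_VideoController | Select-Object -ExpandProperty Name",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def has_intel_adapter(names):
    """Assumption: the iGPU shows up with 'Intel' somewhere in its adapter name."""
    return any("intel" in name.lower() for name in names)


def log_status(names, path=LOG_FILE):
    """Append a timestamped line stating whether an Intel adapter was found."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    status = "present" if has_intel_adapter(names) else "MISSING"
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{stamp} iGPU {status}: {', '.join(names)}\n")
    return status


if __name__ == "__main__":
    print(log_status(list_video_controllers()))
```

Running it at logon (for example from Task Scheduler) would build up a simple history of when the iGPU was present or missing.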