
The PC often gets stuck in reboot Loops

martine-dee
Level 12
Hi,

Something that began to happen in August last year is that the PC gets stuck in reboot loops when (re)starting. So, it never restarts on its own, but every (re)start might mean a stressful struggle to get it going. I've posted more about it here (vid included):
https://rog.asus.com/forum/showthread.php?113051-Reboots-and-SATA-drives-not-detected-when-AE-9-is-i...

Except, now that I finally got to the point of taking the AE-9 out and never using it again, the box still keeps restarting. So, ultimately it is not the AE-9 that causes the restarting.

Something else is wrong with the box. Currently, it gets frozen in the 00 state every time before it self-restarts.

What do you suggest doing? I am close to the point of just building a whole new PC fwiw. But if I could pin down the culprit, I'd replace it.

Halp? 😕
Keep exploring, keep innovating, keep creating

The quickest way to get your laptop out of the infinite boot loop is to force a shut down using the Power button, and then turn it back on again. If that doesn't work for you, perform a Safe mode restart using a boot key specific to your laptop/PC.

Regards,
peter

martine-dee
Level 12
So, here are the newest findings. On Tuesday I disconnected SATA plugs 3-8, and that made the problem go away. No boot-up freezes. No restart loops. That is a good start.

Empirically, the problem seems linked to SATA devices clashing with the PCIe / M.2 devices, so that needs a closer look. Currently, I use two PCIe slots and one M.2 slot:

- PCIEX16/X8_1: GTX 1080 Ti Strix
- PCIEX4: Creative AE-9
- M.2_2: WDS100T2X0C (1TB NVMe SSD)

Limitations and settings:
- Because of the physical space, AE-9 can only go to either PCIEX4 or PCIEX16/X8_2.
- The CPU (i7-7820X) has 28 lanes.
- In the BIOS settings, the 'PCIEX4 & SATA6G_5~8 Switch Function' is set to Auto, which means that SATA_78 are out, but SATA_56 should still be available.
- Luckily, M.2_1 is free. Otherwise, it might clash with SATA 1.
- SATA Mode is AHCI
- I already use SATA_12.

This part of the manual seems illiterate (mixing plural and singular, it's probably 'slots', not 'slot'), but let's say it means that possibly SATA 5-8 might be unusable.

> PCIe 3.0/2.0 x4 slot share with SATA_56 ports and SATA_78 ports
> when use device in x4 mode. Adjust BIOS settings to use SATA
> devices.

The need:
- I have three more SATA drives to add, plus one CD/DVD device. So, I need four ports besides SATA_12.

Limitation (empirical):
The issues also presented when I used SATA 3/4.

Tactics:
- I will give power but not SATA connection to all drives. If issues reappear, I will revise the tactics.
- I will begin to add the four drives one by one over days. First SATA 3, then 4, and will work my way up to the 5 and 6. If issues reappear, will backtrack.

For the record, the SATA numbering starts at the top, so looking at the SATA ports from the connector side (female), they are:


PCIEX1_1
|
| PCIEX16/X8_1
| |
v v
| | | |
S7 S5 S3 S1
S8 S6 S4 S2
================== (mobo)

martine-dee
Level 12
So far, so good. Though, I am still at the SATA 1-4.

One other thought was, maybe this is about the CPU lanes. So, let's see if that has some merit.

The CPU itself has 28 lanes.

Lane users:
- GTX 1080 Ti Strix, that alone is 16 lanes
- The M.2? Perhaps 4 lanes
- AE-9: Its slot is using 2 lanes.
- DMI 3.0, for PCH-CPU: 4 lanes

That is 26 so far. I wonder if there would be other spenders.
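
The tally above can be written out as a quick shell calculation. The per-device lane counts are the estimates from the list (the M.2 figure is a guess there), and counting DMI against the CPU's 28 lanes is that list's assumption, not a confirmed fact:

```shell
#!/usr/bin/env bash
# Lane budget as estimated in the post; none of these figures are measured.
cpu_lanes=28
gpu=16   # GTX 1080 Ti Strix
m2=4     # M.2 NVMe SSD ("perhaps 4 lanes")
ae9=2    # AE-9 slot
dmi=4    # DMI 3.0 link to the PCH (assumed to count against the 28)

used=$((gpu + m2 + ae9 + dmi))
free=$((cpu_lanes - used))
echo "used=${used} free=${free}"   # prints: used=26 free=2
```

If DMI turns out not to come out of the 28, setting `dmi=0` shows the headroom without it.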

[edit 2020-03-18] Now, this bit is interesting:
https://www.hardwarezone.com.my/feature-heres-all-you-need-know-about-intels-new-x299-platform

It says that everything else goes through the chipset. So, SSDs should not clog the CPU lanes. That is good. But being so close to the lane limit is scary!

[edit 2020-03-20] Meanwhile, I am at stable four drives now. Yippee!! I will keep going.

[attachment: 84314]

martine-dee
Level 12
Update: Apr 23, 2020

A relation that I found and confirmed empirically over the last month is that reboot loops only happen on a system restart from Ubuntu (16.04). For some reason, the system also sometimes gets stuck (for a long time) when I boot up Ubuntu. Nothing is OC'd, and that behavior doesn't really depend on the number of SATA devices attached. Though I neither need nor dare to touch P7-8.

In any case, as long as I never restart Ubuntu, these reboot loops won't happen. It is manageable. To get to Windows, I shut down Ubuntu and then power up into Windows. I don't know what exactly is broken, where, and how. I've spent some time studying Ubuntu's /var/log/syslog, and couldn't find anything that seemed relevant around the times when it gets stuck forever, without visible usage of disk or other resources.
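
For narrowing down syslog around a hang, a small filter like this can help. It is only a sketch: the function name `syslog_window` is made up, and it assumes the traditional `Mon DD HH:MM:SS host ...` syslog line format:

```shell
# Hypothetical helper: print the lines of a syslog file that fall on one day
# between two HH:MM:SS times (inclusive).
# Usage: syslog_window FILE MONTH DAY START END
syslog_window() {
  awk -v mon="$2" -v day="$3" -v start="$4" -v end="$5" \
    '$1 == mon && $2 + 0 == day + 0 && $3 >= start && $3 <= end' "$1"
}

# Example call on the live system (times are placeholders):
#   syslog_window /var/log/syslog Apr 23 10:10:00 10:20:00
```

The time comparison is a plain string comparison, which works because `HH:MM:SS` is zero-padded.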

TL;DR The system is stable, as long as I never reboot from Ubuntu.

When I start building a new box, presumably on top of the ROG Zenith II Extreme Alpha, I will watch out for the QVLs just in case.

jberkpc
I had a very similar issue after updating Kubuntu 18.04. After the update when I attempted to reboot I got stuck in a reboot loop. If I shut the computer down, I could reboot to Windows fine, so I went back to Kubuntu to see what packages I had installed during the update. The package that stood out to me was an Intel microcode update. I rolled back the microcode update to the previous version and locked it.
The reboot issues went away! You are using an older version of Ubuntu; 20.04 is now out and I don't think 16.04 is still supported, so you might not be able to get the previous version of the microcode. Maybe you can just remove the Intel microcode package completely.
It's not the first time I have had problems with Intel microcode updates, and it seems that they work for some CPUs and not others.

martine-dee
That is an interesting angle worth exploring. Date-wise, my problem did start after the installed version of the microcode was produced.

$ dmesg | grep microcode
[ 0.000000] microcode: microcode updated early to revision 0x2000064, date = 2019-07-31
[ 0.943145] microcode: sig=0x50654, pf=0x4, revision=0x2000064
[ 0.943269] microcode: Microcode Update Driver: v2.2.


$ apt list | grep microcode
amd64-microcode/xenial-updates,xenial-security,now 3.20191021.1+really3.20180524.1~ubuntu0.16.04.2 amd64 [installed,automatic]
intel-microcode/xenial-updates,xenial-security,now 3.20191115.1ubuntu0.16.04.2 amd64 [installed]
microcode.ctl/xenial 1.18~0+nmu2 amd64


However, I don't have the option to use it or not. Is it possible that the microcode is unusable with my system anyway, so it is not really used?
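
One way to check whether the update is actually in use: the `microcode updated early to revision ...` line in the dmesg output above is what the kernel prints when it has loaded an update at boot, so the revision there should match what the CPU reports in /proc/cpuinfo. A minimal sketch (the helper name `ucode_rev` is made up for illustration):

```shell
# Hypothetical helper: pull the revision out of a dmesg
# "microcode updated early" line read from stdin.
ucode_rev() {
  sed -n 's/.*microcode updated early to revision \(0x[0-9a-fA-F]*\).*/\1/p'
}

# Sample line copied from the dmesg output above; prints 0x2000064.
echo '[ 0.000000] microcode: microcode updated early to revision 0x2000064, date = 2019-07-31' | ucode_rev

# On the live system, compare against what the CPU itself reports:
#   dmesg | ucode_rev
#   grep -m1 microcode /proc/cpuinfo
```

If both show the same revision, the packaged microcode is being applied at boot rather than ignored.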

[attachment: 84917]

BIOS-wise, now I went back to the beginning, and will see what that brings. Do you see anything suspicious in this list?

[attachment: 84918]

vmanuelgm
The latest ASUS BIOSes for X299 mainboards (3006 for the Omega board, for example) solved the microcode error that showed up when trying to boot operating systems.

So if you are not using the latest version with the updated parts, you should uninstall those Linux updates, which usually come along with the Linux kernel.
Asus x670e Hero, 7950x, TG 7000, Master 4090, 970 PRO, 860 EVO, Intel 750,
SN850, SN750, 840 Pro, 2xSkyhawk 4TB, Pioneer S12U
EVGA T2 1600, Silverstone TJ11, Custom LC, Acer Predator x35, Philips OLED

martine-dee
The issues resurfaced, mostly when rebooting from Ubuntu 16.04, or even when shutting down from it and immediately starting again.

See, I was counting on EZ Update to pick it up. As it was, I was still riding on 2018 versions of drivers and stuff. Now I have updated everything, and I also see that the Turbo 3.0 thing is gone.
https://rog.asus.com/forum/showthread.php?116383-Say-goodbye-for-Intel-Turbo-Boost-MAX-3-0-Driver-du...

I suppose I will see how all that fares. I also downgraded and disabled updating the intel-microcode package, so I won't know which of them fixed the issues, if they are indeed fixed.

martine@Defiant:~$ sudo apt-cache madison intel-microcode
[sudo] password for martine:
intel-microcode | 3.20200609.0ubuntu0.16.04.1 | http://nl.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
intel-microcode | 3.20200609.0ubuntu0.16.04.1 | http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
intel-microcode | 3.20151106.1 | http://nl.archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages

martine@Defiant:~$ sudo apt install intel-microcode=3.20151106.1
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be DOWNGRADED:
intel-microcode
(...)

martine@Defiant:~$ sudo apt-mark hold intel-microcode
intel-microcode set on hold.

martine@Defiant:~$

jberkpc
It looks like you are using the unattended-upgrades package, which automatically installs security updates, so if you roll back the Intel microcode to the previous version it will probably just update it again.
I uninstall that package when I install Kubuntu, because I update 5 days a week and want more control over updates.
According to this, you can pass a kernel command line parameter that disables the microcode updates, but that's above my pay grade:
https://wiki.debian.org/Microcode#Working_around_boot_problems_caused_by_microcode_updates.
It looks like I was wrong about updates to 16.04; it looks like it's supported until some time in 2021.
All I know is that rolling back to the previous version of the microcode solved my problems, which are just like yours, and I've updated my Kubuntu install many times since, including numerous kernel updates, without issue.
I'm running a 10920X with the RE6E motherboard.
Best of Luck.

martine-dee
Thanks, I can certainly do that. This means that adding dis_ucode_ldr to the kernel command line in the boot loader should disable the microcode loader.

{ source }
dis_ucode_ldr [X86] Disable the microcode loader.


On my machine, that means running:
$ sudo vim /etc/default/grub

finding:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"

turning it into:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset dis_ucode_ldr"

and then running:
$ sudo update-grub
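
After the next reboot, it may be worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming a Linux system where `/proc/cmdline` is readable; the function name `has_kernel_param` is made up for illustration:

```shell
# Hypothetical helper: succeed if kernel parameter $1 appears as a whole
# word in the command line read from stdin.
has_kernel_param() {
  tr ' ' '\n' | grep -qx -- "$1"
}

# Check the running kernel's command line.
if [ -r /proc/cmdline ] && has_kernel_param dis_ucode_ldr < /proc/cmdline; then
  echo "dis_ucode_ldr is active"
else
  echo "dis_ucode_ldr not found on the running kernel's command line"
fi
```

With the loader disabled, the `microcode updated early` line should also no longer show up in `dmesg | grep microcode`.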


Since the BIOS roll-back, I've re-enabled the XMP profile, disabled HT, and set an explicit link speed for the video card. Things have been stable since then, and I gave everything a hard test by restarting whenever I needed to. That doesn't mean that this change is pointless: it could be what makes it possible to run the PC with all the BIOS settings that had been rolled back.