I have a standalone domain controller with the latest UCS running on bare metal.
After an accidental power loss, the server boots only into the BusyBox (initramfs) shell:
ALERT! /dev/mapper/vg_ucs-root does not exist
The initramfs offers very limited tools for diagnosing the problem (no fdisk, no lsblk), and the LVM commands (pvscan, vgscan, lvscan) return no output, as if there were no LVM at all. So perhaps the disk was damaged?
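For reference, this is roughly what can still be probed from the (initramfs) prompt; as far as I know the multi-call lvm binary is included even when the standalone tools are missing, so treat this as a sketch rather than a verified transcript:

(initramfs) cat /proc/partitions        # do the underlying disks show up at all?
(initramfs) lvm pvscan                  # pvscan via the multi-call binary
(initramfs) lvm vgscan
(initramfs) lvm vgchange -ay vg_ucs     # try to activate the volume group by hand
(initramfs) ls /dev/mapper/             # does vg_ucs-root appear now?

Here, the lvm subcommands returned nothing, as described above.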
But then I booted the server with a live Linux (“SystemRescue” from www.system-rescue.org), and there I can see the missing volume group; after a lot of testing, I think it’s OK. I checked everything I found on the internet, and there seem to be no errors. I could even mount the VG and browse through the directories and files, and found nothing conspicuous.
“SystemRescue” even has an option “Boot a Linux operating system installed on the disk (findroot)”, and with this I was able to boot the UCS system. I can log in locally, start the management console or an SSH session from a client, the mail server is running, and I can even find /dev/mapper/vg_ucs-root. The built-in system check throws no errors.
Unfortunately, this boot option seems to be unstable: after a few minutes UCS totally freezes and I have to cut the power.
I’ve searched a lot and tried many possible solutions, but have had no luck so far. Now I’m running out of options and would appreciate any suggestions on what to do next.
Are there no built-in rescue options, like reconfiguring the LVM settings or the boot environment? Should I try to reinstall or repair the lvm2 installation, and if so, how?
Thanks for reading, and I hope someone can help.
One problem is left, though:
While booting, I have to choose “Advanced Options” and select the kernel 4.19.0-27-amd64.
I think this is the right one for my UCS 5.0-8 (erratum 1085)?
By default, the kernel 5.10.0-0.deb10.30-amd64 is loaded.
How can I set the 4.19 kernel as the default and remove the 5.10 kernel from the system?
I think the 5.10 kernel comes from the live Linux I used to fix the initial problem (mounting the UCS installation, chroot, and update-initramfs, as sketched below).
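For completeness, this is roughly the repair procedure I ran from the live system; the device name /dev/sda1 for /boot is an assumption from memory, so adjust it to your layout:

vgchange -ay vg_ucs                        # activate the volume group
mount /dev/mapper/vg_ucs-root /mnt         # mount the UCS root filesystem
mount /dev/sda1 /mnt/boot                  # assumption: separate /boot partition
for d in dev proc sys; do mount --bind /$d /mnt/$d; done
chroot /mnt
# now inside the chroot:
update-initramfs -u -k all                 # rebuild the initramfs images
update-grub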
The 5.10 kernel is right; we just released it via erratum 1081. 4.19 was the previous version and is therefore still available, but for security reasons you should switch to the latest kernel released by us, if possible.
Hello,
both commands bring up a huge list showing the contents of the corresponding images.
Do you want me to post it? Should I look for anything specific?
Hello Jan-Luca,
good to know that the 5.10 kernel is the right one. I searched a lot but could not find that information.
So now I have the issue that UCS will not boot with this kernel (my initial problem).
I tried to build a new initramfs:
root@ucs:~# update-initramfs -u -v -k 5.10.0-0.deb10.30
update-initramfs: Generating /boot/initrd.img-5.10.0-0.deb10.30
W: missing /lib/modules/5.10.0-0.deb10.30
W: Ensure all necessary drivers are built into the linux image!
depmod: ERROR: could not open directory /lib/modules/5.10.0-0.deb10.30: No such file or directory
depmod: FATAL: could not search modules: No such file or directory
cat: /var/tmp/mkinitramfs_2XhWKy/lib/modules/5.10.0-0.deb10.30/modules.builtin: No such file or directory
find: ‘/var/tmp/mkinitramfs_2XhWKy/lib/modules/5.10.0-0.deb10.30/kernel’: No such file or directory
[snipping normal stuff]
/usr/share/initramfs-tools/scripts/init-bottom/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/local-block/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/local-top/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/init-premount/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/local-premount/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/init-top/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/panic/ORDER ignored: not executable
depmod: WARNING: could not open modules.order at /var/tmp/mkinitramfs_2XhWKy/lib/modules/5.10.0-0.deb10.30: No such file or directory
depmod: WARNING: could not open modules.builtin at /var/tmp/mkinitramfs_2XhWKy/lib/modules/5.10.0-0.deb10.30: No such file or directory
Building cpio /boot/initrd.img-5.10.0-0.deb10.30.new initramfs
Does this mean the image is damaged?
How can I repair the kernel?
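Edit: comparing with my later attempts, I now suspect nothing is damaged at all; I simply passed the version without the -amd64 suffix, and the -k argument has to match a directory name under /lib/modules exactly. A quick way to check which names are valid:

root@ucs:~# ls /lib/modules/            # lists the exact version strings -k expects
root@ucs:~# update-initramfs -u -k 5.10.0-0.deb10.30-amd64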
root@ucs:~# dpkg --list | grep linux-image
ii linux-image-4.19.0-23-amd64 4.19.269-1 amd64 Linux 4.19 for 64-bit PCs (signed)
rc linux-image-4.19.0-24-amd64 4.19.282-1 amd64 Linux 4.19 for 64-bit PCs (signed)
ii linux-image-4.19.0-25-amd64 4.19.289-2 amd64 Linux 4.19 for 64-bit PCs (signed)
ii linux-image-4.19.0-26-amd64 4.19.304-1 amd64 Linux 4.19 for 64-bit PCs (signed)
ii linux-image-4.19.0-27-amd64 4.19.316-1 amd64 Linux 4.19 for 64-bit PCs (signed)
ii linux-image-5.10-amd64 5.10.218-1~deb10u1 amd64 Linux for 64-bit PCs (meta-package)
ii linux-image-5.10.0-0.deb10.30-amd64 5.10.218-1~deb10u1 amd64 Linux 5.10 for 64-bit PCs (signed)
ii linux-image-amd64 4.19+105+deb10u22 amd64 Linux for 64-bit PCs (meta-package)
root@ucs:~# uname -r
4.19.0-27-amd64
I’m not sure which one is the right one for my case:
linux-image-5.10-amd64 (the meta-package, according to the dpkg output above)
linux-image-5.10.0-0.deb10.30-amd64 (the versioned, signed image)
My idea now is to remove both images and reinstall the right one (or both, if needed):
apt-get purge linux-image-5.10-amd64
apt-get purge linux-image-5.10.0-0.deb10.30-amd64
apt install --install-recommends linux-image-5.10.0-0.deb10.30-amd64   # and/or linux-image-5.10-amd64
Is that the right way to do it?
What should I do after reinstalling the kernel?
Do I have to run update-initramfs (with which options?), update-grub and grub-install? (See the sketch below.)
Do I have to clean up the old configuration files (like the initrd files in /boot), or will apt-get purge do that?
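For reference, these are the post-install steps I would expect, assembled from the Debian manpages (a sketch I have not run yet):

update-initramfs -c -k 5.10.0-0.deb10.30-amd64   # (re)create the initramfs for the new kernel
update-grub                                      # regenerate the boot menu entries
# grub-install should only be needed if the bootloader itself is broken, e.g.
# grub-install /dev/sda   (assuming /dev/sda is the boot disk)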
Sorry for the many questions, but I’m not really experienced with this and don’t want to mess up the system!
So setting a specific kernel has to be the first option?
root@ucs:~# update-initramfs -k 5.10.0-0.deb10.30-amd64 -u
update-initramfs: Generating /boot/initrd.img-5.10.0-0.deb10.30-amd64
W: Possible missing firmware /lib/firmware/tigon/tg3_tso5.bin for module tg3
W: Possible missing firmware /lib/firmware/tigon/tg3_tso.bin for module tg3
W: Possible missing firmware /lib/firmware/tigon/tg357766.bin for module tg3
W: Possible missing firmware /lib/firmware/tigon/tg3.bin for module tg3
From what I found while searching about the missing tg3 firmware, this seems to be unproblematic.
I tried this before and again after installing the related kernel headers (as @externa1 suggested), but nothing changes:
With the current 5.10 kernel, UCS will not boot.
Gave up waiting for root file system device. Common problems:
- Boot args (cat /proc/cmdline)
- Check rootdelay= (did the system wait long enough?)
- Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/mapper/vg_ucs-root does not exist. Dropping to a shell!
BusyBox v1.30.1 (Debian 1:1.30.1-4) built-in shell (ash)
(initramfs)
Is there a way to reinstall the 5.10 kernel completely from scratch?
I do not find any suspicious “kernel” entries, nor “cpio: premature end of archive”, nor anything else conspicuous…
(I tried to post the result of lsinitramfs /boot/initrd.img-5.10.0-0.deb10.30-amd64, but it’s too much text.)
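Instead of posting the whole listing, filtering it for the LVM and device-mapper pieces might be enough; something like:

lsinitramfs /boot/initrd.img-5.10.0-0.deb10.30-amd64 | grep -E 'lvm|dm-mod'
# a complete image should contain the lvm tooling plus kernel/drivers/md/dm-mod.ko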
That was my first thought when the server didn’t come up after the power failure.
But how can it be a hardware defect if the system boots and runs perfectly with the old kernel?
The hardware is a Dell PowerEdge T140 with a PERC H330 RAID controller and two WD Red HDDs in RAID 1. It has been running for years without needing any special drivers or firmware. The integrated maintenance system (iDRAC9) shows no errors or problems.
I still think it’s a software problem, but what could I do to find out whether it is the hardware or not?
First of all, let me tell you: thank you so much for caring and for finding the clue! You are a real genius!
In the advanced boot options I added intel_iommu=off, and UCS booted with the 5.10 kernel without any problems.
WTF: I thought the accidental power loss had damaged something, but in reality it only meant that the server was booted with the new kernel for the first time after the update. Double bad luck, I guess!
OK, one last question:
How can I make this boot option permanent?
And additionally: will this survive further updates, will it cause trouble in the future, and is there a chance that Univention will fix this in an official way? This is probably a question for the Univention staff, @jlk?
Run nano /etc/default/grub and add the parameter at the end of the GRUB_CMDLINE_LINUX_DEFAULT line, so that it reads:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=off"
Save, exit nano, and run update-grub.
Unfortunately, /etc/default/grub is autogenerated, so the added parameter will not survive further updates.
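If I understand the UCS templating correctly, /etc/default/grub is generated from Univention Configuration Registry variables, so the persistent route should be to set the parameter there. I am assuming the relevant variable is grub/append; please correct me if it is named differently:

root@ucs:~# ucr get grub/append        # note the current value first
root@ucs:~# ucr set grub/append="<current value> intel_iommu=off"
# setting the variable should rewrite /etc/default/grub from its template;
# if the boot menu is not regenerated automatically, run update-grub afterwards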
As far as I understand the information I found about this bug, switching to UEFI boot could be a solution?
But changing this would be a bigger operation (shrinking the LVM to free space for an additional partition / finding a good step-by-step how-to for UCS)?