I have a standalone domain controller with the latest UCS running on bare metal.
After an accidental power loss, the server boots only into the BusyBox (initramfs) shell:
ALERT! /dev/mapper/vg_ucs-root does not exist
The initramfs offers very limited tools for diagnosing the problem (no fdisk, no lsblk), and the LVM commands (pvscan, vgscan, lvscan) return no output, as if there were no LVM at all. So perhaps the disk was damaged?
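For reference, this is roughly what can still be probed from the (initramfs) prompt; as far as I know the multi-call lvm binary is included even when the standalone tools are missing, so treat this as a sketch rather than a verified transcript:

(initramfs) cat /proc/partitions        # do the underlying disks show up at all?
(initramfs) lvm pvscan                  # pvscan via the multi-call binary
(initramfs) lvm vgscan
(initramfs) lvm vgchange -ay vg_ucs     # try to activate the volume group by hand
(initramfs) ls /dev/mapper/             # does vg_ucs-root appear now?

Here, the lvm subcommands returned nothing, as described above.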
But then I booted the server with a live Linux (“SystemRescue” from www.system-rescue.org), and there I can see the missing volume group; after a lot of testing, I think it’s OK. I checked everything I found on the internet, and there seem to be no errors. I could even mount the VG and browse through the directories and files, and found nothing conspicuous.
“SystemRescue” even has an option “Boot a Linux operating system installed on the disk (findroot)”, and with this I was able to boot the UCS system. I can log in locally, start the management console or an SSH session from a client, the mail server is running, and I can even find /dev/mapper/vg_ucs-root. The built-in system check throws no errors.
Unfortunately, this boot option seems to be unstable: after a few minutes UCS totally freezes and I have to cut the power.
I’ve searched a lot and tried many possible solutions, but have had no luck so far. Now I’m running out of options and would appreciate any suggestions on what to do next.
Are there no built-in rescue options, like reconfiguring the LVM settings or the boot environment? Should I try to reinstall or repair the lvm2 installation, and if so, how?
Thanks for reading, and I hope someone can help.
One problem is left, though:
While booting, I have to choose “Advanced Options” and select the kernel 4.19.0-27-amd64.
I think this is the right one for my UCS 5.0-8 (erratum 1085)?
By default, the kernel 5.10.0-0.deb10.30-amd64 is loaded.
How can I set the 4.19 kernel as the default and remove the 5.10 kernel from the system?
I think the 5.10 kernel comes from the live Linux I used to fix the initial problem (mounting the UCS installation, chroot, and update-initramfs, as sketched below).
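For completeness, this is roughly the repair procedure I ran from the live system; the device name /dev/sda1 for /boot is an assumption from memory, so adjust it to your layout:

vgchange -ay vg_ucs                        # activate the volume group
mount /dev/mapper/vg_ucs-root /mnt         # mount the UCS root filesystem
mount /dev/sda1 /mnt/boot                  # assumption: separate /boot partition
for d in dev proc sys; do mount --bind /$d /mnt/$d; done
chroot /mnt
# now inside the chroot:
update-initramfs -u -k all                 # rebuild the initramfs images
update-grub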
The 5.10 kernel is right; we just released it via erratum 1081. 4.19 was the previous version and is therefore still available, but for security reasons you should switch to the latest kernel released by us, if possible.
Hello,
both commands bring up a huge list showing the contents of the corresponding images.
Do you want me to post it? Should I look for anything specific?
Hello Jan-Luca,
good to know that the 5.10 kernel is the right one. I searched a lot but could not find that information.
So now I have the issue that UCS will not boot with this kernel (my initial problem).
I tried to build a new initramfs:
root@ucs:~# update-initramfs -u -v -k 5.10.0-0.deb10.30
update-initramfs: Generating /boot/initrd.img-5.10.0-0.deb10.30
W: missing /lib/modules/5.10.0-0.deb10.30
W: Ensure all necessary drivers are built into the linux image!
depmod: ERROR: could not open directory /lib/modules/5.10.0-0.deb10.30: No such file or directory
depmod: FATAL: could not search modules: No such file or directory
cat: /var/tmp/mkinitramfs_2XhWKy/lib/modules/5.10.0-0.deb10.30/modules.builtin: No such file or directory
find: ‘/var/tmp/mkinitramfs_2XhWKy/lib/modules/5.10.0-0.deb10.30/kernel’: No such file or directory
[snipping normal stuff]
/usr/share/initramfs-tools/scripts/init-bottom/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/local-block/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/local-top/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/init-premount/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/local-premount/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/init-top/ORDER ignored: not executable
/usr/share/initramfs-tools/scripts/panic/ORDER ignored: not executable
depmod: WARNING: could not open modules.order at /var/tmp/mkinitramfs_2XhWKy/lib/modules/5.10.0-0.deb10.30: No such file or directory
depmod: WARNING: could not open modules.builtin at /var/tmp/mkinitramfs_2XhWKy/lib/modules/5.10.0-0.deb10.30: No such file or directory
Building cpio /boot/initrd.img-5.10.0-0.deb10.30.new initramfs
Does this mean the image is damaged?
How can I repair the kernel?
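Edit: comparing with my later attempts, I now suspect nothing is damaged at all; I simply passed the version without the -amd64 suffix, and the -k argument has to match a directory name under /lib/modules exactly. A quick way to check which names are valid:

root@ucs:~# ls /lib/modules/            # lists the exact version strings -k expects
root@ucs:~# update-initramfs -u -k 5.10.0-0.deb10.30-amd64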
root@ucs:~# dpkg --list | grep linux-image
ii linux-image-4.19.0-23-amd64 4.19.269-1 amd64 Linux 4.19 for 64-bit PCs (signed)
rc linux-image-4.19.0-24-amd64 4.19.282-1 amd64 Linux 4.19 for 64-bit PCs (signed)
ii linux-image-4.19.0-25-amd64 4.19.289-2 amd64 Linux 4.19 for 64-bit PCs (signed)
ii linux-image-4.19.0-26-amd64 4.19.304-1 amd64 Linux 4.19 for 64-bit PCs (signed)
ii linux-image-4.19.0-27-amd64 4.19.316-1 amd64 Linux 4.19 for 64-bit PCs (signed)
ii linux-image-5.10-amd64 5.10.218-1~deb10u1 amd64 Linux for 64-bit PCs (meta-package)
ii linux-image-5.10.0-0.deb10.30-amd64 5.10.218-1~deb10u1 amd64 Linux 5.10 for 64-bit PCs (signed)
ii linux-image-amd64 4.19+105+deb10u22 amd64 Linux for 64-bit PCs (meta-package)
root@ucs:~# uname -r
4.19.0-27-amd64
I’m not sure which one is the right one for my case:
linux-image-5.10-amd64 (the meta-package, according to the dpkg output above)
linux-image-5.10.0-0.deb10.30-amd64 (the versioned, signed image)
My idea now is to remove both images and reinstall the right one (or both, if needed):
apt-get purge linux-image-5.10-amd64
apt-get purge linux-image-5.10.0-0.deb10.30-amd64
apt install --install-recommends linux-image-5.10.0-0.deb10.30-amd64   # and/or linux-image-5.10-amd64
Is that the right way to do it?
What should I do after reinstalling the kernel?
Do I have to run update-initramfs (with which options?), update-grub and grub-install? (See the sketch below.)
Do I have to clean up the old configuration files (like the initrd files in /boot), or will apt-get purge do that?
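For reference, these are the post-install steps I would expect, assembled from the Debian manpages (a sketch I have not run yet):

update-initramfs -c -k 5.10.0-0.deb10.30-amd64   # (re)create the initramfs for the new kernel
update-grub                                      # regenerate the boot menu entries
# grub-install should only be needed if the bootloader itself is broken, e.g.
# grub-install /dev/sda   (assuming /dev/sda is the boot disk)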
Sorry for the many questions, but I’m not really experienced with this and don’t want to mess up the system!
So setting a specific kernel has to be the first option?
root@ucs:~# update-initramfs -k 5.10.0-0.deb10.30-amd64 -u
update-initramfs: Generating /boot/initrd.img-5.10.0-0.deb10.30-amd64
W: Possible missing firmware /lib/firmware/tigon/tg3_tso5.bin for module tg3
W: Possible missing firmware /lib/firmware/tigon/tg3_tso.bin for module tg3
W: Possible missing firmware /lib/firmware/tigon/tg357766.bin for module tg3
W: Possible missing firmware /lib/firmware/tigon/tg3.bin for module tg3
From what I found while searching about the missing tg3 firmware, this seems to be unproblematic.
I tried this before and again after installing the related kernel headers (as @externa1 suggested), but nothing changes:
With the current 5.10 kernel, UCS will not boot.
Gave up waiting for root file system device. Common problems:
- Boot args (cat /proc/cmdline)
- Check rootdelay= (did the system wait long enough?)
- Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/mapper/vg_ucs-root does not exist. Dropping to a shell!
BusyBox v1.30.1 (Debian 1:1.30.1-4) built-in shell (ash)
(initramfs)
Is there a way to reinstall the 5.10 kernel completely from scratch?
I do not find any suspicious “kernel” entries, nor “cpio: premature end of archive”, nor anything else conspicuous…
(I tried to post the result of lsinitramfs /boot/initrd.img-5.10.0-0.deb10.30-amd64, but it’s too much text.)
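Instead of posting the whole listing, filtering it for the LVM and device-mapper pieces might be enough; something like:

lsinitramfs /boot/initrd.img-5.10.0-0.deb10.30-amd64 | grep -E 'lvm|dm-mod'
# a complete image should contain the lvm tooling plus kernel/drivers/md/dm-mod.ko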
That was my first thought when the server didn’t come up after the power failure.
But how can it be a hardware defect if the system boots and runs perfectly with the old kernel?
The hardware is a Dell PowerEdge T140 with a PERC H330 RAID controller and two WD Red HDDs in RAID 1. It has been running for years without needing any special drivers or firmware. The integrated maintenance system (iDRAC9) shows no errors or problems.
I still think it’s a software problem, but what could I do to find out whether it is the hardware or not?
First of all, let me tell you: thank you so much for caring and for finding the clue! You are a real genius!
In the advanced boot options I added intel_iommu=off, and UCS booted with the 5.10 kernel without any problems.
WTF: I thought the accidental power loss had damaged something, but in reality it only meant that the server was booted with the new kernel for the first time after the update. Double bad luck, I guess!
OK, one last question:
How can I make this boot option permanent?
And additionally: will this survive further updates, will it cause trouble in the future, and is there a chance that Univention will fix this in an official way? This is probably a question for the Univention staff, @jlk?
Run nano /etc/default/grub and add the parameter at the end of the GRUB_CMDLINE_LINUX_DEFAULT line, so that it reads:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=off"
Save, exit nano, and run update-grub.
Unfortunately, /etc/default/grub is autogenerated, so the added parameter will not survive further updates.
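If I understand the UCS templating correctly, /etc/default/grub is generated from Univention Configuration Registry variables, so the persistent route should be to set the parameter there. I am assuming the relevant variable is grub/append; please correct me if it is named differently:

root@ucs:~# ucr get grub/append        # note the current value first
root@ucs:~# ucr set grub/append="<current value> intel_iommu=off"
# setting the variable should rewrite /etc/default/grub from its template;
# if the boot menu is not regenerated automatically, run update-grub afterwards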
As far as I understand the information I found about this bug, switching to UEFI boot could be a solution?
But changing this would be a bigger operation (shrinking the LVM to free space for an additional partition / finding a good step-by-step how-to for UCS)?