UCS 5.0-6 inside Proxmox KVM/QEMU: dead processes, starting with a dead jbd2/sdX process

Hi,

I have been evaluating UCS for a while, and in the past few weeks I ran into a strange issue with it.
Let me describe my setup:

Physical server

Proxmox 8.1.3 environment on ZFS, UCS on a KVM machine
The KVM machine parameters:

  • 6GB RAM (balloon=0), CPU: 8 (host)
  • BIOS: OVMF (UEFI); Machine: i440fx; SCSI controller: VirtIO SCSI single
  • 7 disks (ZFS zvols), separate disks for different purposes; common settings: cache=writeback, discard=on, iothread=1, ssd=1, async_io=io_uring (an illustrative config line follows this list)
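
For reference, one such disk line in the Proxmox VM config looks roughly like this (storage and volume names are illustrative; in the config file the async I/O setting appears as aio):

scsi0: local-zfs:vm-3441-disk-1,aio=io_uring,cache=writeback,discard=on,iothread=1,size=128G,ssd=1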

Relevant recurring task on Proxmox:
automatic guest snapshots using Proxmox's own snapshot tool (not just a ZFS snap), every hour

UCS KVM guest fstab

The UCS 5.0-6 installation is nothing special; LVM is not used, and different functions live on different disks/partitions:

/  ext4  errors=remount-ro,user_xattr
/boot/efi  vfat  umask=0077
/home  ext4  discard,noatime,user_xattr,usrquota
/var/flexshares  ext4  discard,noatime,user_xattr
/var/lib/univention-ldap  ext4  discard,noatime,user_xattr
/var/log  ext4  discard,noatime,user_xattr
/var/univention-backup  ext4  discard,noatime,user_xattr
16GB swap partition
  • qemu-guest-agent was not installed
  • when I installed qemu-guest-agent, the system crashed more seriously, almost every time becoming completely unresponsive; the malfunction seemed to start around the fsfreeze command (used for Proxmox snapshots, which should take at most a few seconds) - see the sketch below
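
If someone wants to check the freeze path by hand, the same freeze/thaw cycle can be triggered from the Proxmox host through the guest agent; a minimal sketch, assuming qemu-guest-agent is running in the guest (VMID is illustrative):

qm guest cmd 3441 fsfreeze-freeze   # freeze all guest filesystems via the agent
qm guest cmd 3441 fsfreeze-status   # should report frozen while the freeze is active
qm guest cmd 3441 fsfreeze-thaw     # thaw them again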

Symptoms

  • After several hours (20-48) the system becomes slow or unresponsive.
  • I even saw load averages like 305/305/305 while CPU usage was only 0.47%.
  • Several dead processes: in one case systemd-journald, in another case slapd, etc.; it can differ.
  • Common dead process: jbd2/sdX, on different disks each time, seemingly at random; this could be the root cause of all the other issues. Examples below (they happened on different days; sde holds the LDAP partition and sdc the log partition, which is why journald or slapd end up dead; a way to trigger such a dump on demand follows the excerpt):
ucs-6743 kernel: [29967.458782] jbd2/sde1-8     D    0   514      2 0x80000000
....
ucs-6743 kernel: [153095.267685] jbd2/sdc1-8     D    0   488      2 0x80000000
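
For completeness, such a blocked-task dump can also be requested on demand while the guest is wedged; a minimal sketch, run as root inside the guest:

echo 1 > /proc/sys/kernel/sysrq     # make sure the SysRq interface is enabled
echo w > /proc/sysrq-trigger        # dump all tasks in uninterruptible (D) state to the kernel log
dmesg | grep -A 20 'jbd2/sd'        # inspect the resulting traces of the journal threads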

Assumptions

  • In some previous attempts, when I had installed qemu-guest-agent, it seemed that fsfreeze triggered the problem; at least that is what I found in the logs after resetting UCS.
  • With UCS installed without qemu-guest-agent, the problem still exists, but sometimes I am able to log in, or I can use an already established SSH connection to check the internal status of the server. Even then, accessing the filesystem is sometimes not possible, due to the jbd2/sd* issue.
  • So, at this point I think the jbd2/sd* issue starts a chain reaction
    → random processes end up dead,
    → which causes various hangs/malfunctions/an unresponsive system

Now I am starting to test the following changes (one at a time):

  • UCS filesystem mount options: I removed discard from every filesystem where I previously used it (see above; a rough sketch of this change follows the list).
  • changing something at the Proxmox level: async_io, discard, etc.
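
A rough sketch of the fstab change, assuming the mount options listed above (a reboot is the safer way to re-apply them):

sed -i 's/discard,//g' /etc/fstab       # drop the discard option from all entries
mount -o remount,nodiscard /var/log     # disable discard per filesystem on the fly, or simply reboot
grep discard /proc/mounts               # verify that nothing is mounted with discard any more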

My question: do you have any clue what could be going on? What did I miss?
Any hints are welcome :slight_smile:

Thanks,
István

We are also running a lot of UCS 5 VMs on Proxmox 8.1 here, all with the qemu-guest-agent. No problems. Here is one of my guest configs:

agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: dc
bootdisk: scsi0
cores: 32
cpu: host
description: 
efidisk0: SSD-local-zfs:vm-112-disk-1,size=1M
hotplug: disk,network,usb,memory,cpu
machine: q35
memory: 4096
name: dc1.tux.lan
net0: virtio=2A:91:44:FA:FA:A5,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: l26
rng0: source=/dev/urandom
scsi0: SSD-local-zfs:vm-112-disk-2,discard=on,iothread=1,size=20G,ssd=1
scsi1: SSD-local-zfs:vm-112-disk-0,discard=on,iothread=1,size=8G,ssd=1
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=
sockets: 1
startup: order=1,up=30
tags:  
vcpus: 4

It also works with the default CPU type. The two disks are root and swap, nothing more.

Even if it doesn’t help in the search for the problem, I would like to give some feedback.

We operate several PVE hosts, from an old 6.4.15 to the current 8.x. UCS 4.x-5.x runs on all of them, all with the qemu-guest-agent installed. I have not noticed any problems here.

Thank you for your feedback!

As far as I can see, there could be some differences:

  • q35 vs i440fx (as I remember, q35 also died before; I will try to check)
  • proxmox filesystem: zfs (in my case) vs. anything else
  • proxmox: snapshots vs backup
  • numbers of disks and used partitions in UCS
  • probably lvm vs no-lvm (in my case) in UCS
  • mount options in UCS

It would be nice to know the following in your use cases:

  • proxmox volume manager used:
    • ( ) zfs
    • ( ) lvm
  • proxmox snapshots:
    • [ ] autosnap
    • [ ] pure zfs snapshot
    • [ ] no snapshot
    • other: ___________
  • Filesystem/vol manager used in UCS
    • [ ] lvm used
    • [ ] no lvm
    • [ ] ext4
    • [ ] xfs
  • UCS filesystem mount options
    • [ ] discard
    • [ ] noatime
    • [ ] relatime
    • other: ____

Otherwise I have no idea why my setups are dying painfully while a lot of other guests are running happily without any issue.
Additionally, when I installed the very first instance of UCS in dumb mode (one drive, automatic filesystem management, just click-click-click, etc.), it worked for months, and then I decided to install a production candidate and remove that messy test instance. My horror story started at that time (of course, I did not back up that test machine's VM, which I considered expendable).

Thanks for playing.

proxmox volume manager used:
(x) zfs
( ) lvm
proxmox snapshots:
[ ] autosnap
[x] pure zfs snapshot
[ ] no snapshot
other: ___________
Filesystem/vol manager used in UCS
[ ] lvm used
[x] no lvm
[x] ext4
[ ] xfs
UCS filesystem mount options

System defaults: errors=remount-ro,user_xattr

Cool, thank you for your feedback!
I will try to narrow the gap between our configurations.

Sidenote:
autosnap in Proxmox produces snapshots that appear in the web GUI, and besides the ZFS snapshot it handles other things just like Proxmox's own snapshot tool does: config snapshot, fsfreeze via the qemu-guest-agent, etc.
This could be a serious difference, as a native ZFS snapshot probably never triggers any fsfreeze on the guest, even if you use the qemu-guest-agent.
At least, as far as I checked the logs, the issue seems to start around the time the snapshots happen (cron: 5 * * * *).
Discard is also suspicious, but I have less proof for it than for autosnap.
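
One way to check that correlation is to line up the snapshot timestamps on the host with the jbd2 messages in the guest; a minimal sketch (VMID and log file are illustrative):

qm listsnapshot 3441                       # on the Proxmox host: when the snapshots were taken
grep 'jbd2/sd' /var/log/messages | head    # inside the guest: when the journal threads got stuck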

There is a good chance this is somehow a Proxmox/QEMU/general problem; it is still under investigation:

and other links with similar symptoms.

How do you generate your snapshots? Which command?

I use proxmox-autosnap, something like this:

proxmox-autosnap.py --autosnap --vmid all --label hourly --keep 23 --mute

In other words, this snapshot is not a zfs snap -r rpool@hourly-2024-01-09-1705; it uses qm snapshot <vmid> <snapname> [OPTIONS].

A pure (native) ZFS snapshot has nothing to do with the KVM or LXC guest; it only cares about the underlying ZFS volume/filesystem. In this case the KVM/LXC guest does not know about any snapshot or any other action; it is totally invisible to the guests → absolutely zero downtime.

In contrast, the Proxmox QEMU/LXC way of snapshotting is a complex action, and you cannot avoid it when you make a backup with vzdump or PBS (probably once per day).
autosnap works like the original Proxmox snapshot utility, which uses the qm/pct commands.

As I run autosnap hourly, I run into this risky situation 24 times a day, whereas a standard user who only does daily backups would need about 24 days to accumulate the same number of risky situations.
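
To make the difference concrete, a hedged comparison of the two snapshot paths (dataset, VMID and snapshot names are illustrative):

zfs snapshot rpool/data/vm-3441-disk-1@hourly-2024-01-09-1705    # pure ZFS: acts only on the zvol, invisible to the guest
qm snapshot 3441 hourly_2024_01_09_1705                          # Proxmox snapshot: also saves the VM config and, with the guest agent enabled, runs an fsfreeze/thaw cycle in the guest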

I also run UCS on a Proxmox 8.1.3 server with no issues. I set mine up with, I think, all the defaults for a VM, except that I chose host for the CPU. Using LVM for the VM disk, no cache, discard off, iothread off, ssd off, async_io=io_uring (default).

I also use the guest agent. I have automatic backups at night, which do use fs_freeze and thaw, but I only do snapshots as needed when I upgrade or make changes. The only time I've had problems with backups is when my backup drive was failing. I use a 2.5" laptop drive for backups because it fits in the server and I somehow end up with a bunch of them. They tend to fail every year or so.

I think that's your problem - try to use the default no-cache setting.

rg
Christian

Thank you Christian, this will be the next test.

At this moment, without the discard mount option inside UCS, the server has survived 25 snapshots (qm snapshot, hourly) and is working as expected; yesterday it died in less than 13 hours.
I even issued several fstrim runs without a problem.

Anyway, based on the Proxmox forum, it is possible that the problem is caused by pve-qemu-kvm/iothread/virtio(-scsi) in some cases.
I will upgrade my system to get a fresh kernel/KVM, probably tonight, in the maintenance window.

Just an update: after 56 hours of running, the server is still working as expected; I have not experienced any jbd2 error.
Reminder: the only thing I did was remove the discard mount option from fstab inside the guest.
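
Since the discard mount option is gone, unused blocks are no longer released continuously; the usual substitute, assuming a systemd-based guest like UCS 5, is a periodic trim, roughly:

systemctl enable --now fstrim.timer    # weekly fstrim of all mounted filesystems that support discard
fstrim -av                             # or trim once by hand and show how much was released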


Update: I upgraded my Proxmox, as suggested on the Proxmox forum.
That means some QEMU changes landed that relate to this kind of jbd2 lock issue.

Summary:
The test before this upgrade had been running for more than 58 hours without any issue. Before removing the discard mount option from the guest, the guest usually died within such a period (I experienced 12-48 hours, usually less than 28 hours). Again, I do qm snapshot hourly, which means 58 qm snapshots had happened by then.
If one only does one snapshot per day, during backup, that would correspond to almost 2 months of running.

I have started to test (use) the server and will keep watching.
If anything happens, I will report back, just for the record.

Upgrade details, focus on: kernel + pve-qemu-kvm
base-files/stable 12.4+deb12u4 amd64 [upgradable from: 12.4+deb12u2]
curl/stable-security 7.88.1-10+deb12u5 amd64 [upgradable from: 7.88.1-10+deb12u4]
distro-info-data/stable 0.58+deb12u1 all [upgradable from: 0.58]
gnutls-bin/stable 3.7.9-2+deb12u1 amd64 [upgradable from: 3.7.9-2]
ifupdown2/stable 3.2.0-1+pmx8 all [upgradable from: 3.2.0-1+pmx7]
libcurl3-gnutls/stable-security 7.88.1-10+deb12u5 amd64 [upgradable from: 7.88.1-10+deb12u4]
libcurl4/stable-security 7.88.1-10+deb12u5 amd64 [upgradable from: 7.88.1-10+deb12u4]
libgnutls-dane0/stable 3.7.9-2+deb12u1 amd64 [upgradable from: 3.7.9-2]
libgnutls30/stable 3.7.9-2+deb12u1 amd64 [upgradable from: 3.7.9-2]
libgnutlsxx30/stable 3.7.9-2+deb12u1 amd64 [upgradable from: 3.7.9-2]
libnss-systemd/stable 252.19-1~deb12u1 amd64 [upgradable from: 252.17-1~deb12u1]
libnvpair3linux/stable 2.2.2-pve1 amd64 [upgradable from: 2.2.0-pve4]
libpam-systemd/stable 252.19-1~deb12u1 amd64 [upgradable from: 252.17-1~deb12u1]
libperl5.36/stable 5.36.0-7+deb12u1 amd64 [upgradable from: 5.36.0-7]
libproxmox-rs-perl/stable 0.3.3 amd64 [upgradable from: 0.3.1]
libsystemd-shared/stable 252.19-1~deb12u1 amd64 [upgradable from: 252.17-1~deb12u1]
libsystemd0/stable 252.19-1~deb12u1 amd64 [upgradable from: 252.17-1~deb12u1]
libudev1/stable 252.19-1~deb12u1 amd64 [upgradable from: 252.17-1~deb12u1]
libuutil3linux/stable 2.2.2-pve1 amd64 [upgradable from: 2.2.0-pve4]
libzfs4linux/stable 2.2.2-pve1 amd64 [upgradable from: 2.2.0-pve4]
libzpool5linux/stable 2.2.2-pve1 amd64 [upgradable from: 2.2.0-pve4]
lxcfs/stable 5.0.3-pve4 amd64 [upgradable from: 5.0.3-pve3]
openssh-client/stable-security 1:9.2p1-2+deb12u2 amd64 [upgradable from: 1:9.2p1-2+deb12u1]
openssh-server/stable-security 1:9.2p1-2+deb12u2 amd64 [upgradable from: 1:9.2p1-2+deb12u1]
openssh-sftp-server/stable-security 1:9.2p1-2+deb12u2 amd64 [upgradable from: 1:9.2p1-2+deb12u1]
perl-base/stable 5.36.0-7+deb12u1 amd64 [upgradable from: 5.36.0-7]
perl-modules-5.36/stable 5.36.0-7+deb12u1 all [upgradable from: 5.36.0-7]
perl/stable 5.36.0-7+deb12u1 amd64 [upgradable from: 5.36.0-7]
postfix/stable-updates 3.7.9-0+deb12u1 amd64 [upgradable from: 3.7.6-0+deb12u2]
proxmox-kernel-6.2/stable 6.2.16-20 all [upgradable from: 6.2.16-19]
proxmox-kernel-6.5/stable 6.5.11-7 all [upgradable from: 6.5.11-6]
pve-i18n/stable 3.1.5 all [upgradable from: 3.1.4]
pve-qemu-kvm/stable 8.1.2-6 amd64 [upgradable from: 8.1.2-4]
pve-xtermjs/stable 5.3.0-3 all [upgradable from: 5.3.0-2]
spl/stable 2.2.2-pve1 all [upgradable from: 2.2.0-pve4]
ssh/stable-security 1:9.2p1-2+deb12u2 all [upgradable from: 1:9.2p1-2+deb12u1]
systemd-boot-efi/stable 252.19-1~deb12u1 amd64 [upgradable from: 252.17-1~deb12u1]
systemd-boot/stable 252.19-1~deb12u1 amd64 [upgradable from: 252.17-1~deb12u1]
systemd-sysv/stable 252.19-1~deb12u1 amd64 [upgradable from: 252.17-1~deb12u1]
systemd/stable 252.19-1~deb12u1 amd64 [upgradable from: 252.17-1~deb12u1]
tzdata/stable 2023c-5+deb12u1 all [upgradable from: 2023c-5]
udev/stable 252.19-1~deb12u1 amd64 [upgradable from: 252.17-1~deb12u1]
zfs-initramfs/stable 2.2.2-pve1 all [upgradable from: 2.2.0-pve4]
zfs-zed/stable 2.2.2-pve1 amd64 [upgradable from: 2.2.0-pve4]
zfsutils-linux/stable 2.2.2-pve1 amd64 [upgradable from: 2.2.0-pve4]

Died again after 15 hours.

messages

Jan 12 13:12:28 ucs-6743 kernel: [55583.858674] jbd2/sda2-8 D 0 319 2 0x80000000
Jan 12 13:12:28 ucs-6743 kernel: [55583.858674] Call Trace:
Jan 12 13:12:28 ucs-6743 kernel: [55583.858676] __schedule+0x29f/0x840
Jan 12 13:12:28 ucs-6743 kernel: [55583.858677] ? blk_mq_sched_insert_requests+0x80/0xa0
Jan 12 13:12:28 ucs-6743 kernel: [55583.858678] ? bit_wait_timeout+0x90/0x90
Jan 12 13:12:28 ucs-6743 kernel: [55583.858678] schedule+0x28/0x80
Jan 12 13:12:28 ucs-6743 kernel: [55583.858679] io_schedule+0x12/0x40
Jan 12 13:12:28 ucs-6743 kernel: [55583.858679] bit_wait_io+0xd/0x50
Jan 12 13:12:28 ucs-6743 kernel: [55583.858680] __wait_on_bit+0x73/0x90
Jan 12 13:12:28 ucs-6743 kernel: [55583.858680] out_of_line_wait_on_bit+0x91/0xb0
Jan 12 13:12:28 ucs-6743 kernel: [55583.858681] ? init_wait_var_entry+0x40/0x40
Jan 12 13:12:28 ucs-6743 kernel: [55583.858682] jbd2_journal_commit_transaction+0xf9c/0x1840 [jbd2]
Jan 12 13:12:28 ucs-6743 kernel: [55583.858685] kjournald2+0xbd/0x270 [jbd2]
Jan 12 13:12:28 ucs-6743 kernel: [55583.858686] ? finish_wait+0x80/0x80
Jan 12 13:12:28 ucs-6743 kernel: [55583.858687] ? commit_timeout+0x10/0x10 [jbd2]
Jan 12 13:12:28 ucs-6743 kernel: [55583.858688] kthread+0x112/0x130
Jan 12 13:12:28 ucs-6743 kernel: [55583.858689] ? kthread_bind+0x30/0x30
Jan 12 13:12:28 ucs-6743 kernel: [55583.858690] ret_from_fork+0x35/0x40
Jan 12 13:12:28 ucs-6743 kernel: [55583.858711] dockerd D 0 1519 1 0x00000000
Jan 12 13:12:28 ucs-6743 kernel: [55583.858712] Call Trace:
Jan 12 13:12:28 ucs-6743 kernel: [55583.858712] __schedule+0x29f/0x840
Jan 12 13:12:28 ucs-6743 kernel: [55583.858713] ? bit_wait_timeout+0x90/0x90
Jan 12 13:12:28 ucs-6743 kernel: [55583.858713] schedule+0x28/0x80
Jan 12 13:12:28 ucs-6743 kernel: [55583.858714] io_schedule+0x12/0x40
Jan 12 13:12:28 ucs-6743 kernel: [55583.858714] bit_wait_io+0xd/0x50
Jan 12 13:12:28 ucs-6743 kernel: [55583.858714] __wait_on_bit+0x73/0x90
Jan 12 13:12:28 ucs-6743 kernel: [55583.858715] out_of_line_wait_on_bit+0x91/0xb0
Jan 12 13:12:28 ucs-6743 kernel: [55583.858716] ? init_wait_var_entry+0x40/0x40
Jan 12 13:12:28 ucs-6743 kernel: [55583.858717] do_get_write_access+0x297/0x410 [jbd2]
Jan 12 13:12:28 ucs-6743 kernel: [55583.858718] jbd2_journal_get_write_access+0x57/0x70 [jbd2]
Jan 12 13:12:28 ucs-6743 kernel: [55583.858721] __ext4_journal_get_write_access+0x36/0x70 [ext4]
Jan 12 13:12:28 ucs-6743 kernel: [55583.858724] __ext4_new_inode+0xa9a/0x1580 [ext4]
Jan 12 13:12:28 ucs-6743 kernel: [55583.858725] ? d_splice_alias+0x153/0x3c0
Jan 12 13:12:28 ucs-6743 kernel: [55583.858729] ext4_create+0xe0/0x1c0 [ext4]
Jan 12 13:12:28 ucs-6743 kernel: [55583.858729] path_openat+0x117e/0x1480
Jan 12 13:12:28 ucs-6743 kernel: [55583.858730] do_filp_open+0x93/0x100
Jan 12 13:12:28 ucs-6743 kernel: [55583.858732] ? __check_object_size+0x162/0x180
Jan 12 13:12:28 ucs-6743 kernel: [55583.858733] do_sys_open+0x186/0x210
Jan 12 13:12:28 ucs-6743 kernel: [55583.858734] do_syscall_64+0x53/0x110
Jan 12 13:12:28 ucs-6743 kernel: [55583.858735] entry_SYSCALL_64_after_hwframe+0x5c/0xc1

Dead processes

319 ? D 0:00 [jbd2/sda2-8]
998 ? Ds 0:00 postgres: 11/main: walwriter
1022 ? Ds 0:02 /usr/sbin/nmbd -D
8881 pts/2 S+ 0:09 watch -n 5 ps ax | grep " D"
9094 pts/3 D+ 0:00 mc
17694 ? D 0:00 /usr/sbin/nmbd -D
17713 ? D 0:00 /usr/sbin/nmbd -D
17731 ? D 0:00 /usr/sbin/nmbd -D
17750 ? D 0:00 /usr/sbin/nmbd -D
17769 ? D 0:00 /usr/sbin/nmbd -D
17787 ? D 0:00 /usr/sbin/nmbd -D
17805 ? D 0:00 /usr/sbin/nmbd -D
17828 ? D 0:00 /usr/sbin/nmbd -D
17846 ? D 0:00 /usr/sbin/nmbd -D
17865 ? D 0:00 /usr/sbin/nmbd -D
17882 ? D 0:00 /usr/bin/python3 /usr/share/univention-monitoring-client/scripts//check_univention_dns
17908 ? D 0:00 /usr/sbin/nmbd -D
17948 ? D 0:00 /usr/sbin/nmbd -D
17966 ? D 0:00 /usr/sbin/nmbd -D
17985 ? D 0:00 /usr/sbin/nmbd -D
18003 ? D 0:00 /usr/sbin/nmbd -D
18021 ? D 0:00 /usr/sbin/nmbd -D
18040 ? D 0:00 /usr/sbin/nmbd -D
18058 ? D 0:00 /usr/sbin/nmbd -D
18076 ? D 0:00 /usr/sbin/nmbd -D
18095 ? D 0:00 /usr/sbin/nmbd -D
18113 ? D 0:00 /usr/sbin/nmbd -D
18131 ? D 0:00 /usr/sbin/nmbd -D
18150 ? D 0:00 /usr/sbin/nmbd -D
18163 ? D 0:00 /usr/sbin/nmbd -D
18180 ? D 0:00 /usr/bin/python3 /usr/sbin/univention-config-registry commit /etc/apt/sources.list.d/20_ucs-online-component.list
18195 ? D 0:00 /usr/sbin/nmbd -D
18217 ? D 0:00 /usr/bin/python3 /usr/share/univention-monitoring-client/scripts//check_univention_dns
18261 ? D 0:00 /usr/sbin/nmbd -D
18301 ? D 0:00 /usr/sbin/nmbd -D
18565 ? D 0:00 /usr/bin/python3 /usr/share/univention-monitoring-client/scripts//check_univention_dns
18935 ? D 0:00 /usr/bin/python3 /usr/share/univention-monitoring-client/scripts//check_univention_dns
19206 pts/2 S+ 0:00 watch -n 5 ps ax | grep " D"
19207 pts/2 S+ 0:00 sh -c ps ax | grep " D"
19209 pts/2 S+ 0:00 grep D
22312 ? Ds 0:01 /usr/sbin/univention-directory-listener -F -d 2 -b dc=intranet,dc=logmaster,dc=hu -m /usr/lib/univention-directory-listener/system -c /var/lib/univention-directory-listener -ZZ -x -D cn=admin,dc=intranet,dc=logmaster,dc=hu -y /etc/ldap.secret

Load:

root@ucs-6743:~# cat /proc/loadavg
38.98 36.66 25.48 1/586 19324
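
As a side note, the high load with almost no CPU usage is expected: the Linux load average also counts tasks in uninterruptible sleep (state D), not only runnable ones. A quick way to count them, for reference:

ps -eo state,pid,comm | awk '$1 ~ /^D/' | wc -l    # number of blocked (D-state) tasks inflating the load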

Update: after the server died in less than 12 hours, I restarted the test with a new change: I removed iothread from all disks.
VirtIO SCSI single is still in place.
qemu-guest-agent is installed, just to push the limits and increase the risk of a crash.
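
For the record, a hedged sketch of how iothread can be dropped per disk: either edit the disk lines in /etc/pve/qemu-server/3441.conf, or re-specify each disk with the same volume and options minus iothread=1 (volume names match the config below; the VM needs a stop/start for the new disk options to take effect):

qm set 3441 --scsi0 local-zfs:vm-3441-disk-1,discard=on,size=128G,ssd=1
(repeat for scsi1 ... scsi6)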

It has been working perfectly for 2 days and 22+ hours now. Promising.

Recent kvm config
qm config 3441
agent: 1
balloon: 4096
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
efidisk0: local-zfs:vm-3441-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide2: none,media=cdrom
memory: 8192
meta: creation-qemu=8.1.2,ctime=1704366686
name: unidc
net0: virtio=xx:xx:xx:xx:xx:xx,bridge=vmbr4000,firewall=1,mtu=1400
numa: 0
ostype: l26
parent: autohourly_2024_01_15T17_05_29
scsi0: local-zfs:vm-3441-disk-1,discard=on,size=128G,ssd=1
scsi1: local-zfs:vm-3441-disk-2,discard=on,size=128G,ssd=1
scsi2: local-zfs:vm-3441-disk-3,discard=on,size=32G,ssd=1
scsi3: local-zfs:vm-3441-disk-4,discard=on,size=32G,ssd=1
scsi4: local-zfs:vm-3441-disk-5,discard=on,size=32G,ssd=1
scsi5: local-zfs:vm-3441-disk-6,discard=on,size=16G,ssd=1
scsi6: local-zfs:vm-3441-disk-7,discard=on,size=256G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=yyyyyyyyyyyyyyyyyyyyyy
sockets: 1
tablet: 0
vmgenid: xxxxxxxxxxxxxxx

Update: uptime is 5 days, 19:03, working perfectly, as expected.

It is safe to say that without iothread it is working well (reference: the recent KVM config shown in the previous post).

Summary:
When using UCS in a Proxmox environment, it is necessary to double-check the tuning used in the KVM VM settings, as iothread + virtio-scsi-single can cause problems (lockups/hangs) related to QEMU internals.

Case closed.
