UVMM libvirt: too many open files

virtualization
german

#1

Hello,

at a customer site, UVMM no longer works on any of the servers.

UVMM log:

2013-10-14 08:45:53,884 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 08:45:53,925 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 08:47:39,810 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 08:50:45,312 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 08:50:53,885 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 08:50:53,925 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 08:52:39,811 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 08:55:45,312 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 08:55:53,885 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 08:55:53,926 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. 
Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 08:57:39,811 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:00:45,313 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:00:53,886 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:00:53,926 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:02:39,812 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:05:45,313 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:05:53,886 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:05:53,927 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:07:39,812 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:10:45,314 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. 
Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:10:53,887 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:10:53,927 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:12:39,813 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:15:45,314 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:15:53,887 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:15:53,927 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:17:39,813 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:20:45,315 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:20:53,888 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:20:53,928 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. 
Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:22:39,814 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:25:45,315 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:25:53,888 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:25:53,929 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:27:39,814 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:30:45,316 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:30:53,889 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:30:53,929 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:32:39,815 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:35:45,316 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. 
Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:35:53,889 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:35:53,929 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:37:39,815 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:40:45,317 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:40:53,890 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:40:53,930 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:42:39,816 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:45:45,317 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:45:53,890 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:45:53,930 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. 
Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:47:39,816 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:50:45,318 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:50:53,891 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:50:53,931 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:52:39,817 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:55:45,318 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:55:53,891 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:55:53,931 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 09:57:39,817 - uvmmd.node - WARNING - 'qemu://s-wucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 10:00:45,319 - uvmmd.node - WARNING - 'qemu://s-vucs02.domain.local/system' broken? next check in 0:05:00.000. 
Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 10:00:53,892 - uvmmd.node - WARNING - 'qemu://s-mucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files 2013-10-14 10:00:53,931 - uvmmd.node - WARNING - 'qemu://s-vucs01.domain.local/system' broken? next check in 0:05:00.000. Failed to open file '/etc/libvirt/libvirt.conf': Too many open files

root@s-vucs01:~# cat /etc/issue
Univention DC Master 3.1-1

What can I do?


#2

Hello,

about a year ago you had a thread on Samba4 max open files. The same approach should apply here as well.
Before raising any limits, though, it would be worth analyzing more closely which specific process has too many files open. To me at least, that is not clear from the log.
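A minimal sketch of such a check, assuming a Linux host where `/proc/<pid>/fd` and `/proc/<pid>/limits` are readable. The function name `fd_usage` and the optional second argument are illustrative and not part of UCS or libvirt:

```shell
#!/bin/sh
# Print how many file descriptors a process holds next to its soft limit.
# Usage: fd_usage PID [PROC_ROOT]   (PROC_ROOT defaults to /proc)
fd_usage() {
    pid="$1"
    proc_root="${2:-/proc}"
    # Each entry in <proc_root>/<pid>/fd is one open descriptor.
    open=$(ls "$proc_root/$pid/fd" | wc -l | tr -d ' ')
    # The "Max open files" row lists the soft limit in column 4.
    limit=$(awk '/^Max open files/ {print $4}' "$proc_root/$pid/limits")
    echo "pid=$pid open=$open soft_limit=$limit"
}
```

Called with the PID of the suspected daemon, this shows whether the process is actually close to its limit. If `open` is nowhere near `soft_limit`, raising the limit for that process is unlikely to help.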

[bug]31370[/bug] may contain a helpful pointer regarding expired SSL certificates.

Best regards,
Dirk Ahrnke


#3

Hello,

the SSL certificate has not expired.

lsof /etc/libvirt/libvirt.conf

returns nothing at all…


#4

[quote="DBGTMaster"]

lsof /etc/libvirt/libvirt.conf

returns nothing at all…[/quote]
I was thinking more along these lines:

lsof -p $(</etc/runit/univention-virtual-machine-manager-daemon/supervise/pid)

See the initial description of [bug]31370[/bug].


#5

Lots of TCP connections…

... ...
univentio 19721 root 307u IPv4 3123838212 0t0 TCP s-vucs01.domain.local:40089->s-mucs01.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 308r FIFO 0,8 0t0 3123838214 pipe
univentio 19721 root 309w FIFO 0,8 0t0 3123838214 pipe
univentio 19721 root 310u IPv4 3123937228 0t0 TCP s-vucs01.domain.local:52602->s-vucs02.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 311r FIFO 0,8 0t0 3123937229 pipe
univentio 19721 root 312w FIFO 0,8 0t0 3123937229 pipe
univentio 19721 root 313u sock 0,6 0t0 3123937242 can't identify protocol
univentio 19721 root 314r FIFO 0,8 0t0 3123937243 pipe
univentio 19721 root 315w FIFO 0,8 0t0 3123937243 pipe
univentio 19721 root 316u IPv4 3124035154 0t0 TCP s-vucs01.domain.local:40199->s-mucs01.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 317r FIFO 0,8 0t0 3124035155 pipe
univentio 19721 root 318w FIFO 0,8 0t0 3124035155 pipe
univentio 19721 root 319u IPv4 3124035540 0t0 TCP s-vucs01.domain.local:52648->s-vucs02.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 320r FIFO 0,8 0t0 3124035541 pipe
univentio 19721 root 321w FIFO 0,8 0t0 3124035541 pipe
univentio 19721 root 322u IPv4 3124133450 0t0 TCP s-vucs01.domain.local:52695->s-vucs02.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 323r FIFO 0,8 0t0 3124133451 pipe
univentio 19721 root 324w FIFO 0,8 0t0 3124133451 pipe
univentio 19721 root 325u IPv4 3124133871 0t0 TCP s-vucs01.domain.local:40260->s-mucs01.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 326r FIFO 0,8 0t0 3124133873 pipe
univentio 19721 root 327w FIFO 0,8 0t0 3124133873 pipe
univentio 19721 root 328u IPv4 3124231911 0t0 TCP s-vucs01.domain.local:40326->s-mucs01.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 329r FIFO 0,8 0t0 3124231913 pipe
univentio 19721 root 330w FIFO 0,8 0t0 3124231913 pipe
univentio 19721 root 331u IPv4 3124231948 0t0 TCP s-vucs01.domain.local:52771->s-vucs02.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 332r FIFO 0,8 0t0 3124231949 pipe
univentio 19721 root 333w FIFO 0,8 0t0 3124231949 pipe
univentio 19721 root 334u IPv4 3124329338 0t0 TCP s-vucs01.domain.local:52811->s-vucs02.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 335r FIFO 0,8 0t0 3124329339 pipe
univentio 19721 root 336w FIFO 0,8 0t0 3124329339 pipe
univentio 19721 root 337u IPv4 3124329409 0t0 TCP s-vucs01.domain.local:40368->s-mucs01.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 338r FIFO 0,8 0t0 3124329412 pipe
univentio 19721 root 339w FIFO 0,8 0t0 3124329412 pipe
univentio 19721 root 340u IPv4 3124428454 0t0 TCP s-vucs01.domain.local:40496->s-mucs01.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 341r FIFO 0,8 0t0 3124428455 pipe
univentio 19721 root 342w FIFO 0,8 0t0 3124428455 pipe
univentio 19721 root 343u IPv4 3124428555 0t0 TCP s-vucs01.domain.local:52942->s-vucs02.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 344r FIFO 0,8 0t0 3124428556 pipe
univentio 19721 root 345w FIFO 0,8 0t0 3124428556 pipe
univentio 19721 root 346u IPv4 3124524390 0t0 TCP s-vucs01.domain.local:40523->s-mucs01.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 347r FIFO 0,8 0t0 3124524391 pipe
univentio 19721 root 348w FIFO 0,8 0t0 3124524391 pipe
univentio 19721 root 349u IPv4 3124524580 0t0 TCP s-vucs01.domain.local:52968->s-vucs02.domain.local:16514 (CLOSE_WAIT)
univentio 19721 root 350r FIFO 0,8 0t0 3124524581 pipe
univentio 19721 root 351w FIFO 0,8 0t0 3124524581 pipe
... ...

But why are so many connections open? Is that normal?
This evening I will raise max_clients in libvirt.conf, as described in the bug entry, and possibly max open files as well.

Regards


#6

Wonderful, I have made progress:

I stopped libvirt and the uvmm-daemon on all virtualization servers, then ended the zombie uvmm-daemon with "kill -9".
In libvirt.conf I set:

max_clients = 500

I restarted the services on all servers, and hooray, I can access all servers again :)

Thanks!


#7

You should probably keep an eye on this.
In the excerpt of the lsof output, 7 TCP connections to each of the two servers were in CLOSE_WAIT.
This line from the bug mentioned above shows the count:

lsof -p $(</etc/runit/univention-virtual-machine-manager-daemon/supervise/pid)|awk '/TCP.*->/{split($9,a,/->/);print a[2]}'| sort | uniq -c

Since the cause is still unclear, the problem may well recur, although probably with some delay now.
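The one-liner above counts connections per remote endpoint but drops the TCP state. A small variation, sketched here under the assumption that `lsof` prints the state in parentheses as the tenth field (as it does in the output quoted earlier in this thread), also breaks the count down by state. The function name `count_tcp_states` is illustrative:

```shell
#!/bin/sh
# Count TCP connections per remote endpoint and state.
# Reads lsof output on stdin: field 9 is NAME (local->remote),
# field 10 is the state in parentheses, e.g. "(CLOSE_WAIT)".
count_tcp_states() {
    awk '/TCP.*->/ {
        split($9, a, /->/)        # a[2] is the remote host:port
        state = $10
        gsub(/[()]/, "", state)   # strip the parentheses around the state
        print a[2], state
    }' | sort | uniq -c
}
```

It is meant to be fed from the same `lsof -p …` call, for example `lsof -p $(</etc/runit/univention-virtual-machine-manager-daemon/supervise/pid) | count_tcp_states` in bash. A growing CLOSE_WAIT count for one peer would suggest the daemon is not closing sockets after that peer hangs up.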

Best regards,
Dirk Ahrnke


#8

The cause might be that a few weeks ago 2 additional KVM servers were run as a test and then removed again, and that this somehow led to the current state.
The 2 KVM servers were of course completely removed again from LDAP and so on.

root@s-vucs01:~# lsof -p $(</etc/runit/univention-virtual-machine-manager-daemon/supervise/pid)|awk '/TCP.*->/{split($9,a,/->/);print a[2]}'| sort | uniq -c
      1 s-mucs01.domain.local:16514
      1 s-vucs01.domain.local:16514
      2 s-vucs02.domain.local:16514
      1 s-wucs01.domain.local:16514

One of them is in CLOSE_WAIT; I will keep watching it.


#9

The problem is occurring again…
It seems to occur whenever I restart 1-2 of the KVM servers (there are 4 KVM UVMM servers in total)…

What can be done about this?


#10

Hello,

when the error occurs again, you should definitely check and document the status with that lsof… line.
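One way to document the status over time is to append each check to a log file with a timestamp. The following is only a sketch: the function name `snapshot` and any log path passed to it are hypothetical and not part of UCS.

```shell
#!/bin/sh
# Append whatever arrives on stdin to a log file, preceded by a UTC timestamp.
# Usage: some-command | snapshot LOGFILE
snapshot() {
    logfile="$1"
    {
        printf '=== %s ===\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        cat
    } >> "$logfile"
}
```

Piping the lsof line from this thread into `snapshot` before and after restarting a KVM server would give two comparable records of the connection counts.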

I think it would also be relevant to know which UCS versions and which system roles are active on the systems involved. In your first post you only listed the DC Master.

Best regards,
Dirk Ahrnke