Check_univention_replication fails after some time

On a regular basis, Icinga2 reports the following when running check_univention_replication on our Backup Domain Controller:
CRITICAL: no change of listener transaction id for last 10 checks (nid=6579 lid=cat: /var/lib/univention-directory-listener/notifier_id: Permission denied)

root@bdc:~# ls -l /var/lib/univention-directory-listener/notifier_id
-rw------- 1 listener nogroup 4 Jul 19 12:06 /var/lib/univention-directory-listener/notifier_id

When I restart the service with service univention-directory-listener restart, the file is created with the correct file permissions.

root@bdc:~# ls -l /var/lib/univention-directory-listener/notifier_id
-rw-r--r-- 1 listener nogroup 4 Jul 19 12:26 /var/lib/univention-directory-listener/notifier_id

But after a couple of hours or days the problem returns.

Any ideas?

We had that before.
The solution in https://forge.univention.org/bugzilla/show_bug.cgi?id=31573 mentions the following:

With Bug #41261 fixed the file /var/lib/univention-directory-listener/notifier_id is now created newly and renamed after each transaction; thus the file now has the correct permissions since UCS-4.1-2 errata

You didn't mention the version you are using. This should be provided before it makes sense to dig deeper.

The version is tagged on the thread.
The server was installed with 4.2 and upgraded to 4.3-1.

Yes, I found that bug report, but this behaves differently. The file is created correctly, and I can see that the mentioned bugfix is in place, since the inode number changes each time the file is written. That's what the fix description says: the file is moved and recreated each time.

What happens in our case is that after some time, overnight or so, the permissions on this file change to 600, as the ls output above shows.

Are there any housekeeping jobs that UCS performs?
Cleaning up logfiles, checking permissions, …

I must admit that I didn't look at the tags. Looking at them now, I would say they are less helpful, as they don't even provide the number of the point release, not to mention the errata level.

In general it would be helpful if the steps “what have I already done to narrow down the issue” are mentioned in the first description. This helps to speed up the discussion.

There are housekeeping jobs triggered by cron but it is possible that this behavior is caused by the listener itself.

I don't know Icinga, but I would expect that you can figure out from its event log more or less exactly when the permissions change. With this information you could look at the files in /var/log/univention/ or at the cron jobs (see /var/log/syslog) to find out what might have caused the change.
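
If the monitoring event log is not precise enough, the moment of the change could also be recorded directly on the affected host. The following is only a minimal Python 3 sketch under stated assumptions: the path is the one from this thread, while the names file_mode, format_mode and watch_mode, the polling interval and the log format are made up for illustration and are not part of UCS or Icinga.

```python
import os
import stat
import time
from datetime import datetime

# Path taken from this thread; everything else in this sketch is illustrative.
NOTIFIER_ID = "/var/lib/univention-directory-listener/notifier_id"


def file_mode(path):
    """Return the permission bits of path (e.g. 0o644), or None if it is missing."""
    try:
        return stat.S_IMODE(os.stat(path).st_mode)
    except FileNotFoundError:
        return None


def format_mode(mode):
    """Format permission bits as a four-digit octal string."""
    return "missing" if mode is None else "%04o" % mode


def watch_mode(path, interval=60, max_polls=None, log=print):
    """Poll path and log a timestamped line whenever its mode changes.

    max_polls=None means "run until interrupted".
    """
    last = file_mode(path)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        polls += 1
        current = file_mode(path)
        if current != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            log("%s mode changed: %s -> %s"
                % (stamp, format_mode(last), format_mode(current)))
            last = current
```

Calling watch_mode(NOTIFIER_ID) as root (for example in a screen session) would print one line per change; the timestamp can then be compared with /var/log/syslog and the files in /var/log/univention/.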

hth,
Dirk

Hi forum,

same issue here with an up-to-date UCS 4.3-3 errata430. The system was originally installed with version 3.2-x in autumn 2014 and has been updated successfully ever since. Only now has the internal Nagios noticed the problem.

I restarted the service with systemctl restart univention-directory-listener.service
Let’s see how long permissions remain correct.

Is there already a permanent fix or does hardly anyone use the Nagios app?

TIA, Robert

Hi again,

and exactly after 30d the problem occurs again, so some periodic 30d event seems to cause the problem, possibly housekeeping jobs or the like:

root@XXXXX ~ # la /var/lib/univention-directory-listener/notifier_id
-rw------- 1 listener nogroup 4 Mär 22 04:40 /var/lib/univention-directory-listener/notifier_id

And after manually running systemctl restart univention-directory-listener.service

we get correct permissions and monitoring service RECOVERY:

root@XXXXX ~ # la /var/lib/univention-directory-listener/notifier_id
-rw-r--r-- 1 listener nogroup 4 Mär 22 10:04 /var/lib/univention-directory-listener/notifier_id

PS: And there are no errors in /var/log/univention/listener.log

Regards Robert

Can you please run the following command when the error occurs:

grep Umask /proc/$(pgrep -f /usr/sbin/univention-directory-listener)/status

It will check the umask of the currently running UDL process. It should be 0022.
If it is not, some UDL module changed the umask to another value and did not restore it. We experienced the same problem internally at Univention just yesterday with the Bareos¹ listener module. You can use the following command to check whether such an obviously faulty UDL module exists:

grep umask /usr/lib/univention-directory-listener/system/*

Restarting UDL fixes the problem until the broken UDL module runs again.

¹: /usr/lib/univention-directory-listener/system/univention-bareos.py
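
To illustrate the mechanism described above, here is a small Python 3 sketch. It is a demonstration under assumptions, not UDL code: the function names are made up, the files live in a temporary directory, and it assumes the affected file is created with the default mode 0666 so that the umask alone decides the resulting permissions. read_umask parses the same Umask field that the grep command above looks at; that field only exists on Linux kernels 4.7 and later, so the function returns None elsewhere.

```python
import os
import stat
import tempfile


def read_umask(pid="self"):
    """Return the umask of a process from /proc/<pid>/status, or None if unavailable."""
    try:
        with open("/proc/%s/status" % pid) as status:
            for line in status:
                if line.startswith("Umask:"):
                    return int(line.split()[1], 8)
    except OSError:
        pass
    return None


def create_and_get_mode(path):
    """Create path like a plain open(path, 'w') and return its permission bits."""
    with open(path, "w") as f:
        f.write("1234")
    return stat.S_IMODE(os.stat(path).st_mode)


def demo_umask_leak(directory):
    """Show that an unrestored os.umask() call affects later, unrelated files."""
    original = os.umask(0o022)      # start from the expected umask
    try:
        before = create_and_get_mode(os.path.join(directory, "before"))
        os.umask(0o077)             # what the faulty module does, never restored
        after = create_and_get_mode(os.path.join(directory, "after"))
    finally:
        os.umask(original)          # restore so the demo itself does not leak
    return before, after


with tempfile.TemporaryDirectory() as tmp:
    before, after = demo_umask_leak(tmp)
    print("before: %04o, after: %04o" % (before, after))
```

The umask is a per-process setting, so a single os.umask(0o077) in one module is enough to change the mode of every file the same process creates afterwards, which matches the -rw------- seen on notifier_id.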

Thanks for your reply & hints @pmhahn,

root@XXXXX ~ # grep umask /usr/lib/univention-directory-listener/system/* 
/usr/lib/univention-directory-listener/system/univention-bareos.py:   os.umask(077)
/usr/lib/univention-directory-listener/system/univention-bareos.py:   os.umask(077)

Bareos listener module also seems to be the cause here. Should I also mention this in the bug report or forward it to Bareos or have you thankfully already done so?

Regards Robert

We have already contacted Bareos.

You can give the following patch a try at your own risk:

--- /usr/lib/univention-directory-listener/system/univention-bareos.py.orig   2019-05-27 15:51:14.220113906 +0200
+++ /usr/lib/univention-directory-listener/system/univention-bareos.py        2019-05-27 15:54:05.321329545 +0200
@@ -142,10 +142,10 @@
 
     char_set = string.ascii_uppercase + string.digits + string.ascii_lowercase
     password=''.join(random.sample(char_set*40,40))
-    os.umask(077)
     with open(path,'w') as f:
+        os.fchmod(f.fileno(), 0o600)
+        os.fchown(f.fileno(), -1, 0)
         f.write(password)
-    os.chown(path,-1,0)
 
     return password
 
@@ -159,15 +159,14 @@
         password=getClientSecret(client_name)
         path=JOBS_PATH+'/'+client_name+'.include'
         templatefile=JOBS_PATH+'/'+client_type+'.template'
-        os.umask(077)
         with open(templatefile,'r') as f:
                 content=f.read()
 
         t=string.Template(content)
         with open(path,"w") as f:
+                os.fchmod(f.fileno(), 0o640)
+                os.fchown(f.fileno(), -1, bareos_gid)
                 f.write(t.substitute(enable=enable, password=password, client_name=client_name))
-        os.chown(path,-1,bareos_gid)
-        os.chmod(path,stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)
 
 def disableClientJob(client_name,client_type):
         createClientJob(client_name,client_type,'No')

(save it to a file “PATCH” and apply it with sudo patch /usr/lib/univention-directory-listener/system/univention-bareos.py PATCH; after that restart UDL with sudo systemctl restart univention-directory-listener)
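
For comparison, the two general ways to avoid this kind of leak can be sketched in Python 3 as follows. This is neither the patch above nor the fix committed by Bareos; temporary_umask and write_secret are hypothetical helper names used only for illustration.

```python
import os
import stat
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_umask(mask):
    """Set the process umask for the duration of the with-block, then restore it."""
    previous = os.umask(mask)
    try:
        yield
    finally:
        os.umask(previous)


def write_secret(path, data):
    """Write data to path with mode 0600 without touching the process umask."""
    # O_TRUNC without O_EXCL overwrites an existing file, like open(path, 'w').
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        os.fchmod(f.fileno(), 0o600)    # also tightens a pre-existing file
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "secret")
    write_secret(path, "example")
    print("%04o" % stat.S_IMODE(os.stat(path).st_mode))
```

The context manager keeps the umask approach but guarantees the old value is restored even if an exception is raised; write_secret follows the same idea as the patch and sets the permissions on the open file descriptor, so the umask is never changed at all.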

Thank you for already providing a patch and contacting us.
We also created a patch and committed it as https://github.com/bareos/bareos/commit/da72d6eda8e82eccd676744fb5f2781b80e269cb

We will update the Bareos app soon.

@pmhahn, thanks a lot for posting the patch. I applied it and will wait to see whether it works.

@joergs, looking forward to the official patch :)