Listener not synchronising on samba4 backend machines

Mbo42 · March 8, 2018, 2:24pm

OK, that didn't do much, but I might have some new info.

/etc/init.d/slapd restart just hangs at

[....] Starting slapd (via systemctl): slapd.service

for 5 minutes, i.e. it never prints Found failed.ldif Importing …

ps aux | grep /usr/sbin/univention-directory-replication-resync returns nothing

After 5 minutes it times out:

[…] Starting slapd (via systemctl): slapd.service
Job for slapd.service failed. See 'systemctl status slapd.service' and 'journalctl -xn' for details.
failed!

This is the output of systemctl status slapd.service:

● slapd.service - LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol)
Loaded: loaded (/etc/init.d/slapd)
Active: failed (Result: timeout) since Fri 2018-03-09 00:49:22 AEDT; 2min 9s ago
Process: 15882 ExecStop=/etc/init.d/slapd stop (code=exited, status=0/SUCCESS)
Process: 22855 ExecStart=/etc/init.d/slapd start (code=exited, status=0/SUCCESS)
Main PID: 15460 (code=exited, status=0/SUCCESS)

Mar 09 00:44:22 deimos slapd[22855]: LDAP server already running.
Mar 09 00:44:22 deimos systemd[1]: PID file /var/run/slapd/slapd.pid not readable (yet?) after start.
Mar 09 00:49:22 deimos systemd[1]: slapd.service start operation timed out. Terminating.
Mar 09 00:49:22 deimos systemd[1]: Failed to start LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol).
Mar 09 00:49:22 deimos systemd[1]: Unit slapd.service entered failed state.
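The init script reports "LDAP server already running" while systemd cannot read /var/run/slapd/slapd.pid, so it may help to check whether that PID file exists and whether the PID in it belongs to a live process. This is a minimal sketch, assuming the PID file holds a single PID on its first line; the helper name check_pidfile is hypothetical, and the result only narrows the problem down rather than explaining it.

```shell
#!/bin/sh
# Hypothetical helper: given a PID file path, report whether the file is
# readable and whether the process it names still exists.
# kill -0 sends no signal; it only tests whether the PID can be signalled
# (run it as root, otherwise another user's process may look dead).
check_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "no readable pid file"
        return
    fi
    pid=$(head -n 1 "$pidfile")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "alive: $pid"
    else
        echo "stale: $pid"
    fi
}

check_pidfile /var/run/slapd/slapd.pid
```

"alive" with slapd failing to start would point at an old slapd instance still holding on; "stale" or "no readable pid file" would point elsewhere.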

And the output of journalctl -xn (though I may have run this too late to catch the relevant parts):

root@deimos:/home/talcom# journalctl -xn
-- Logs begin at Sun 2018-02-25 14:23:20 AEDT, end at Fri 2018-03-09 01:11:47 AEDT. --
Mar 09 01:10:41 deimos sshd[27528]: Accepted keyboard-interactive/pam for phobos$ from 192.168.20.3 port 32768 ssh2
Mar 09 01:10:41 deimos sshd[27528]: pam_unix(sshd:session): session opened for user phobos$ by (uid=0)
Mar 09 01:10:41 deimos sshd[27535]: Received disconnect from 192.168.20.3: 11: disconnected by user
Mar 09 01:10:41 deimos sshd[27528]: pam_unix(sshd:session): session closed for user phobos$
Mar 09 01:10:48 deimos CRON[27421]: pam_unix(cron:session): session closed for user root
Mar 09 01:11:19 deimos nrpe[27692]: Host 192.168.20.22 is not allowed to talk to us!
Mar 09 01:11:46 deimos nrpe[27767]: Host 192.168.20.22 is not allowed to talk to us!
Mar 09 01:11:47 deimos systemd[1]: slapd.service start operation timed out. Terminating.
Mar 09 01:11:47 deimos systemd[1]: Failed to start LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol).
-- Subject: Unit slapd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit slapd.service has failed.
-- 
-- The result is failed.
Mar 09 01:11:47 deimos systemd[1]: Unit slapd.service entered failed state.

And yes, the Nagios plugin reports:

root@deimos:/home/talcom# /usr/lib/nagios/plugins/check_univention_replication
CRITICAL: failed.ldif exists (nid=2121 lid=2080)
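Here nid is the notifier ID and lid the listener ID, so the gap between them shows roughly how far the listener has fallen behind: 2121 - 2080 = 41. The following is a minimal sketch that extracts both numbers from the plugin's message, assuming the message keeps the nid=... lid=... format shown above and that both IDs count the same sequence of transactions. The helper name replication_gap is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: parse nid (notifier ID) and lid (listener ID) from a
# check_univention_replication message and print nid - lid.
replication_gap() {
    nid=$(printf '%s\n' "$1" | sed -n 's/.*nid=\([0-9][0-9]*\).*/\1/p')
    lid=$(printf '%s\n' "$1" | sed -n 's/.*lid=\([0-9][0-9]*\).*/\1/p')
    if [ -z "$nid" ] || [ -z "$lid" ]; then
        echo "could not parse nid/lid" >&2
        return 1
    fi
    echo $((nid - lid))
}

replication_gap 'CRITICAL: failed.ldif exists (nid=2121 lid=2080)'
```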

I must be missing the point here, but if the Recreate listener cache procedure restores the notifier and listener IDs to the same number, why can't I just delete failed.ldif? I suppose I'm not sure how tolerant the system is of a "try it and see" approach.

I appreciate your help.