What to do if a failed.ldif is found

Problem:

Every UCS system runs an instance of the Univention Directory Listener.
The listener is the counterpart to the Univention Directory Notifier, which runs only on the DC Master and DC Backups.
The notifier monitors changes in the LDAP directory and makes them available to the listener instances on all other systems. These changes are called transactions (see UCS manual: Listener/Notifier replication workflow).
Within the listener, the module replication.py is responsible for applying these LDAP changes to the local LDAP directory on DC Backups and DC Slaves. If this fails for some reason, the changes are stored in a file called failed.ldif.

Workaround:

If a failed.ldif is found, first try to have it imported by simply restarting the LDAP server:

/etc/init.d/slapd stop
pkill -9 slapd
/etc/init.d/slapd start

Alternatively, you can run the resync script directly (edit 20221027):

/usr/sbin/univention-directory-replication-resync /var/lib/univention-directory-replication/failed.ldif

This should be sufficient in most situations. The output shows the import starting:

Starting ldap server(s): slapd ...done Found failed.ldif Importing ...

This can take a while, depending on the size of the failed.ldif.
If the process seems to be stuck, you can check whether it is still running with:

ps aux | grep /usr/sbin/univention-directory-replication-resync

If you see a running process with this name (it should look like /bin/bash /usr/sbin/univention-directory-replication-resync /var/lib/univention-directory-replication/failed.ldif), the re-import is still running. If it takes a very long time, it may be stuck in a deadlock. In most cases it is sufficient to kill this process and restart the listener:

pkill -f univention-directory-replication-resync; /etc/init.d/univention-directory-listener restart

If the failed.ldif could not be imported, compare the error messages in the log files with the transactions in the failed.ldif.

The failed.ldif is read from top to bottom. In most cases the first transaction in this file is the one that stops the replication.
A hint as to why it stops can be found in /var/log/univention/listener.log and in /var/log/univention/ldap-replication-resync.log.
For example, an object could not be removed because it has already been deleted (you should check this with univention-ldapsearch). In that case you can delete this entry from the failed.ldif file (/var/lib/univention-directory-replication/failed.ldif). Then restart slapd so that it rereads the failed.ldif file:

/etc/init.d/slapd restart
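
Before editing the file, it can help to look at only the first transaction. The following is a minimal sketch, not part of UCS: the function names first_ldif_record and count_ldif_records are hypothetical, and it assumes that the records in failed.ldif are separated by empty lines, as is usual for LDIF.

```shell
#!/bin/sh
# Sketch: inspect a failed.ldif without modifying it.
# Assumption: records are separated by empty lines (standard LDIF layout).

# Print the first record of the LDIF file given as $1.
first_ldif_record() {
    awk 'BEGIN { RS = "" } NR == 1 { print; exit }' "$1"
}

# Print the number of records in the LDIF file given as $1.
count_ldif_records() {
    awk 'BEGIN { RS = "" } END { print NR }' "$1"
}

# Example usage on the replicating system:
#   first_ldif_record /var/lib/univention-directory-replication/failed.ldif
#   count_ldif_records /var/lib/univention-directory-replication/failed.ldif
```

The dn shown in the first record is the object to look up with univention-ldapsearch before deciding whether the entry can be deleted.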

Sometimes changes cannot be added to the LDAP directory on the backup or slave server because the LDAP schema does not match. This can happen if an app brings additional LDAP schema extensions. If these are not installed correctly, changes cannot be written. You will find an error message like

“objectClass: value #1 invalid per syntax”

in the listener.log. In this case, first compare the schema ID of the master with that of the replicating server. The ID is located in

/var/lib/univention-ldap/schema/id/id

The IDs must be equal on all systems.
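
The comparison can be sketched as a small helper. This is only an illustration: the function name schema_ids_match, the host name master.example.org and the temporary file path are hypothetical, and it assumes that the ID file contains a single number.

```shell
#!/bin/sh
# Sketch: compare the local schema ID with a copy of the master's ID.
# Assumption: each ID file contains a single number.

# Succeeds (exit 0) if both files hold the same ID, ignoring whitespace.
schema_ids_match() {
    local_id=$(tr -d '[:space:]' < "$1")
    master_id=$(tr -d '[:space:]' < "$2")
    echo "local: $local_id  master: $master_id"
    [ "$local_id" = "$master_id" ]
}

# Example usage (master.example.org is a hypothetical host name):
#   ssh root@master.example.org cat /var/lib/univention-ldap/schema/id/id > /tmp/master-schema-id
#   schema_ids_match /var/lib/univention-ldap/schema/id/id /tmp/master-schema-id
```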
The failed.ldif file contains additional information about the message in the log file: there you will find the attribute that was to be modified. On the system with the failed.ldif, you can search for this attribute in the LDAP schema, replacing ATTRIBUTE with the attribute name:

grep ATTRIBUTE /var/lib/univention-ldap/schema.conf

If this specific attribute is not found, the LDAP schema has not been completely replicated to the system.
To replicate the schema again, decrement the schema ID on the system: stop the univention-directory-listener, decrement the schema ID by one, and start the univention-directory-listener.

service univention-directory-listener stop
vim /var/lib/univention-ldap/schema/id/id   # decrement by one, if the IDs were equal
service univention-directory-listener start
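
If you prefer not to edit the file by hand, the decrement step could be scripted roughly as follows. This is a sketch under assumptions, not a UCS tool: the function name decrement_id_file is hypothetical, and it assumes that the ID file holds a single non-negative decimal number without leading zeros.

```shell
#!/bin/sh
# Sketch: decrement the number stored in an ID file by one.
# Assumption: the file holds a single non-negative decimal integer.

decrement_id_file() {
    id=$(tr -d '[:space:]' < "$1")
    case "$id" in
        ''|*[!0-9]*)
            # Refuse to touch the file if it is not a plain number.
            echo "unexpected content in $1: '$id'" >&2
            return 1
            ;;
    esac
    if [ "$id" -eq 0 ]; then
        echo "ID is already 0, not decrementing" >&2
        return 1
    fi
    new_id=$((id - 1))
    echo "$new_id" > "$1"
    echo "schema ID changed from $id to $new_id"
}

# Example usage between stopping and starting the listener:
#   service univention-directory-listener stop
#   decrement_id_file /var/lib/univention-ldap/schema/id/id
#   service univention-directory-listener start
```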

If this does not succeed, you can remove the failed.ldif and do a full re-join. If the system is re-joined without removing the failed.ldif first, the join process may be slowed down because data is replicated twice:

rm /var/lib/univention-directory-replication/failed.ldif
univention-join

If this fails during the Univention Directory Listener join script, there may be a persistent issue within the LDAP directory data. In this case, please have a look at the two corresponding log files:

/var/log/univention/join.log
/var/log/univention/listener.log
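
To narrow down the relevant lines in these log files, a simple filter may be enough. The sketch below is only a starting point: the function name show_recent_errors is hypothetical, and the search pattern is a guess at typical wording, not a documented log format.

```shell
#!/bin/sh
# Sketch: show the last matching lines of a log file, with line numbers.
# Assumption: problems are logged with words such as "error" or "fail".

# $1 = log file, $2 = maximum number of lines to show (default 20)
show_recent_errors() {
    grep -n -i -E 'error|fail' "$1" | tail -n "${2:-20}"
}

# Example usage:
#   show_recent_errors /var/log/univention/join.log
#   show_recent_errors /var/log/univention/listener.log 50
```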

If you have trouble with slapd, you can have a look at this article.
