Problem
The diagnostic module in Univention Management Console (UMC) reports a warning about problems with the Univention Directory Notifier (UDN) replication, similar to:
Error retrieving notifier ID from the UDN.
or
Univention Directory Notifier ID and the locally stored version differ.
This might indicate an error or still processing transactions.
Solution
Step 1
Please log on to the console as root (e.g. via SSH) and use the command /usr/lib/nagios/plugins/check_univention_replication to check which replication state the system is in:
/usr/lib/nagios/plugins/check_univention_replication
This command may output an error message like the following:
CRITICAL: no change of listener transaction id for last 0 checks (nid=3030 lid=3018)
This output says that the replication is 12 transactions behind (nid 3030 minus lid 3018). If all services are running correctly, these transactions should be processed within a few seconds, and a repeated check then reports:
root@shell:~# /usr/lib/nagios/plugins/check_univention_replication
OK: replication complete (nid=3030 lid=3030)
In this case, the replication is up to date.
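The lag is simply the difference between the two IDs at the end of the plugin output. The following minimal Python sketch computes it, assuming the output ends with "(nid=<number> lid=<number>)" as in the examples above; the function name replication_lag is hypothetical and not part of UCS.

```python
import re

# The plugin output ends with "(nid=<notifier ID> lid=<listener ID>)".
_IDS = re.compile(r"\bnid=(\d+) lid=(\d+)")


def replication_lag(output):
    """Return nid - lid, or None if the output does not contain two numeric IDs."""
    match = _IDS.search(output)
    if match is None:
        # e.g. "nid=Error: [Errno 111] Connection refused" (notifier unreachable)
        return None
    nid, lid = (int(value) for value in match.groups())
    return nid - lid


if __name__ == "__main__":
    print(replication_lag(
        "CRITICAL: no change of listener transaction id for last 0 checks (nid=3030 lid=3018)"))  # 12
    print(replication_lag("OK: replication complete (nid=3030 lid=3030)"))  # 0
```

A lag of 0 corresponds to the "replication complete" state; None means the notifier ID could not be read at all, which is the situation covered in Step 2.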
If the replication is stuck, restart the following services on the affected systems:
service univention-directory-notifier restart
service univention-directory-listener restart
If check_univention_replication returns the message CRITICAL: failed.ldif exists, please follow
Step 2
Check for notifier service on UCS Master and Backup systems
If check_univention_replication returns the message CRITICAL: no change of listener transaction on the UCS Master, together with the error message nid=Error: [Errno 111] Connection refused, then you can use the following command to check whether the notifier service is running at all:
pgrep -f /usr/sbin/univention-directory-notifier
This command should return the process ID number. If it doesn’t, then you can check the status of the service by running the command
sv status univention-directory-notifier | sed -n 's/:.*//p'
Normally this should output “run”. If the service has been stopped normally (for example temporarily during an update), the status will be “down”. If the service was terminated for unknown reasons (e.g. due to a program crash), the status will be “finished”. If the service is not running, you can try to start it again by running the command
service univention-directory-notifier start
After this step, please continue the analysis by returning to Step 1. If the service doesn’t resume normal operation, check the log file /var/log/univention/notifier.log for recent ERROR messages before continuing with Step 3. If you eventually need to open a support ticket, these error messages may help with the analysis.
You should also check the state of the notifier service on UCS Backup servers. The notifier service should be running on each UCS Backup server to support failover for UCS Slave and Memberserver systems.
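The sed filter used above keeps only the text before the first colon of the sv status output. A minimal Python sketch of the same parsing follows, assuming a single status line; the helper names service_state and needs_start are hypothetical, and the sample lines are illustrative rather than captured from a real system.

```python
def service_state(status_line):
    """Return the text before the first colon, like sed -n 's/:.*//p'.

    Returns None for a line without a colon (sed would print nothing).
    """
    head, sep, _ = status_line.strip().partition(":")
    return head if sep else None


def needs_start(status_line):
    """Per the text above, only the state "run" means the service is up."""
    return service_state(status_line) != "run"


if __name__ == "__main__":
    # Illustrative status lines, not captured from a real system.
    for line in ("run: univention-directory-notifier: (pid 2481) 3600s",
                 "down: univention-directory-notifier: 12s, normally up"):
        print(service_state(line), needs_start(line))
```

The same parsing applies unchanged to the listener service checked in the next section.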
Check for local listener on all UCS systems
If repeated runs of check_univention_replication continue returning the message CRITICAL: no change of listener transaction id and the lid (listener ID) doesn’t change between calls, you should check whether the listener service is running. This should be checked on every UCS server. The replication state is also available as a Nagios check, so if you are running Nagios, you can use it to get a quick overview. If Nagios reports a problem on a system, or if you don’t use Nagios, connect to the system via SSH as root and use the following command to check whether the listener service is running at all:
pgrep -f /usr/sbin/univention-directory-listener
This command should return the process ID number. If it doesn’t, then you can check the status of the service by running the command
sv status univention-directory-listener | sed -n 's/:.*//p'
Normally this should output “run”. If the service has been stopped normally (for example temporarily during an update), the status will be “down”. If the service was terminated for unknown reasons (e.g. due to a program crash), the status will be “finished”. If the service is not running, you can try to start it again by running the command
service univention-directory-listener start
After this step, please continue the analysis by returning to Step 1. If the service doesn’t resume normal operation, check the log file /var/log/univention/listener.log for recent ERROR messages. If you eventually need to open a support ticket, these error messages may help with the analysis.
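The criterion "the lid doesn’t change during repeated calls" can be expressed as a small check over the collected outputs. This is a minimal Python sketch, assuming you have saved the text of several check_univention_replication runs (oldest first) and that each contains "lid=<number>"; the function name listener_looks_stuck is hypothetical.

```python
import re

_LID = re.compile(r"\blid=(\d+)")
_STUCK_PREFIX = "CRITICAL: no change of listener transaction id"


def listener_looks_stuck(outputs):
    """True if every run is CRITICAL and the listener ID never changed."""
    if len(outputs) < 2:
        return False  # at least two runs are needed to compare
    lids = set()
    for output in outputs:
        if not output.startswith(_STUCK_PREFIX):
            return False
        match = _LID.search(output)
        if match is None:
            return False
        lids.add(int(match.group(1)))
    return len(lids) == 1


if __name__ == "__main__":
    # Illustrative outputs of two consecutive runs.
    runs = [
        "CRITICAL: no change of listener transaction id for last 0 checks (nid=3030 lid=3018)",
        "CRITICAL: no change of listener transaction id for last 1 checks (nid=3031 lid=3018)",
    ]
    print(listener_looks_stuck(runs))
```

If the lid advances between runs, the listener is working through a backlog and the sketch returns False.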
Step 3
Check cn=translog database size on DC Master
If the notifier and listener services are running, continue by checking whether the storage capacity of the cn=translog backend database is exhausted. This check should be done on the UCS Master as well as on UCS Backup servers. This situation can only arise on servers with an amd64 processor architecture. On i386 servers, the cn=translog backend database is stored in a Berkeley Database instead, which is only limited by the size of the underlying partition.
On amd64 servers, you can use the following commands, run as root, to calculate the percentage of the translog database that is in use:
used_pages=$(mdb_stat -e /var/lib/univention-ldap/translog | sed -n 's/^ *Number of pages used: //p')
max_pages=$(mdb_stat -e /var/lib/univention-ldap/translog | sed -n 's/^ *Max pages: //p')
python -c "print('%.1f used' % (float($used_pages) / $max_pages * 100))"
That last command should output a value between 0 and 100. If this value is close to 100, the database is close to its size limit and you should consider raising the limit. But first, check whether you have actually hit the limit already by running the command
grep 'MDB_MAP_FULL: Environment mapsize limit reached' /var/log/syslog
By default, the limit is 2 GiB (2147483648 bytes). You can set the UCR variable ldap/database/mdb/maxsize to a higher value; please note that the value must be given in bytes. To activate the new database size, a simple restart of the LDAP server is sufficient. This can be done by running service slapd restart.
This should be checked on each UCS server of role Master or Backup.
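The percentage calculation above can also be done in a single Python step. This is a minimal sketch, assuming mdb_stat -e prints lines of the form "Name: value" including the "Number of pages used" and "Max pages" fields used by the sed commands above, plus a "Page size" field; the function name translog_usage and the sample numbers are hypothetical. It also reports the map size in bytes (max pages times page size), which is the unit expected by ldap/database/mdb/maxsize.

```python
def translog_usage(mdb_stat_output):
    """Return (percent_used, map_size_bytes) parsed from `mdb_stat -e` output."""
    fields = {}
    for line in mdb_stat_output.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    used_pages = int(fields["Number of pages used"])
    max_pages = int(fields["Max pages"])
    page_size = int(fields["Page size"])
    return used_pages * 100.0 / max_pages, max_pages * page_size


# Abbreviated, illustrative output; the numbers are made up.
SAMPLE = """Environment Info
  Map size: 2147483648
  Page size: 4096
  Max pages: 524288
  Number of pages used: 262144
"""

if __name__ == "__main__":
    percent, map_bytes = translog_usage(SAMPLE)
    print("%.1f used, map size %d bytes" % (percent, map_bytes))
    # prints "50.0 used, map size 2147483648 bytes"
```

With a 4 KiB page size, the default of 2147483648 bytes corresponds to 524288 pages. Doubling the limit, for example, would mean a value of 4294967296 bytes; whether doubling is appropriate depends on your environment.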
Please continue with Step 4.
Step 4
The next check determines whether the notify/transaction, listener/listener or listener/listener.priv files are corrupted, e.g. because the hard disk ran full at some point in the past.
Since UCS 4.3 erratum 470 and UCS 4.4 erratum 33, the command /usr/share/univention-directory-notifier/univention-translog check can also be used for checking and the command /usr/share/univention-directory-notifier/univention-translog check --fix for correcting this issue.
If no errors are reported, please continue with Step 5. Otherwise, please continue with Step 8.
Alternatively, the following script can be used:
python - <<'EOF'
import os
import ldap.dn
# Expected line format: "<transaction ID> <DN> <opcode>"
for transactionfile in ('notify/transaction', 'listener/listener', 'listener/listener.priv'):
    filepath = '/var/lib/univention-ldap/%s' % transactionfile
    if not os.path.exists(filepath):
        continue
    print('Checking %s' % filepath)
    with open(filepath, 'r') as f:
        lc = 0
        for line in f:
            lc += 1
            line = line.rstrip('\n')
            head_tail = line.strip().split(' ', 1)
            if len(head_tail) != 2:
                print('ERROR missing second column at line %d: "%s"' % (lc, line))
                break
            (tid, tail) = head_tail
            try:
                int(tid)
            except ValueError:
                print('ERROR at line %d: "%s"' % (lc, line))
                break
            head_tail = tail.rsplit(' ', 1)
            if len(head_tail) != 2:
                print('ERROR missing third column at line %d: "%s"' % (lc, line))
                break
            (dn, opcode) = head_tail
            if not ldap.dn.is_dn(dn):
                print('ERROR not a valid DN at line %d: "%s"' % (lc, line))
                break
        else:
            # Inner loop finished without error: check the next file.
            print('OK')
            continue
    # An error was reported: stop checking further files.
    break
EOF
When copying this script into a terminal or file please make sure to keep the indentation, as the Python programming language depends on this.
If no errors are reported, please continue with Step 5. Otherwise, please continue with Step 8.
Step 5
Next, the transaction file should be checked for contiguous numbering.
Since UCS 4.3 erratum 470 and UCS 4.4 erratum 33, the command /usr/share/univention-directory-notifier/univention-translog check can also be used for checking and the command /usr/share/univention-directory-notifier/univention-translog check --fix for correcting this issue.
Alternatively, this can be accomplished by running the following Python script:
python - <<'EOF'
with open('/var/lib/univention-ldap/notify/transaction', 'r') as transaction:
    lc = 0
    start_id = None
    for line in transaction:
        lc += 1
        line = line.rstrip('\n')
        try:
            cur_id = int(line.strip().split(' ', 1)[0])
        except ValueError:
            print('ERROR at line %d, does not start with an integer number: "%s"' % (lc, line))
            break
        if start_id is None:
            start_id = cur_id
        # Line n must carry the ID start_id + n - 1.
        if cur_id != lc - 1 + start_id:
            print('ERROR at line %d, transaction IDs not contiguous: "%s"' % (lc, line))
            break
    else:
        print('OK')
EOF
When copying this script into a terminal or file please make sure to keep the indentation, as the Python programming language depends on this.
If this script reports any “not contiguous” messages, you can use an editor to fill the gaps in the numbering. Please make a backup copy of the transaction file (/var/lib/univention-ldap/notify/transaction) first. The error message indicates the line where the numbering stops being contiguous. You can fill the gap by inserting lines that are numbered contiguously and have the following format:
number $ldap_base m
The trailing letter m is a literal “m” and represents a dummy modification. Please replace $ldap_base with the LDAP base of your UCS domain (command: ucr get ldap/base). After changing the file, please run the check script again and adjust as necessary.
If the script above outputs the error “does not start with an integer number”, continue with Step 8 below; otherwise, continue with Step 6.
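For the manual gap filling described in this step, the dummy lines can be generated instead of typed. This is a minimal Python sketch, assuming you know the last ID before the gap and the first ID after it; the function name filler_lines and the LDAP base dc=example,dc=com are hypothetical placeholders.

```python
def filler_lines(prev_id, next_id, ldap_base):
    """Dummy transaction lines for the IDs strictly between prev_id and next_id."""
    if next_id <= prev_id:
        raise ValueError("next_id must be greater than prev_id")
    # Format from this article: "<number> <LDAP base> m", m = dummy modification.
    return ["%d %s m" % (tid, ldap_base) for tid in range(prev_id + 1, next_id)]


if __name__ == "__main__":
    # Hypothetical gap: the line with ID 100 is directly followed by ID 104.
    for line in filler_lines(100, 104, "dc=example,dc=com"):
        print(line)
```

The generated lines would be pasted between the two existing lines; rerun the contiguity check afterwards.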
Step 6
Check for transaction file duplicates
Next, the transaction file should be checked for duplicates of transaction IDs.
Since UCS 4.3 erratum 470 and UCS 4.4 erratum 33, the command /usr/share/univention-directory-notifier/univention-translog check can also be used for checking and the command /usr/share/univention-directory-notifier/univention-translog check --fix for correcting this issue.
Alternatively, this can be accomplished by running the following Python script:
python - <<'EOF'
with open('/var/lib/univention-ldap/notify/transaction', 'r') as transaction:
    lc = 0
    last_id = -1
    for line in transaction:
        lc += 1
        line = line.rstrip('\n')
        try:
            cur_id = int(line.strip().split(' ', 1)[0])
        except ValueError:
            print('ERROR at line %d: "%s"' % (lc, line))
            break
        # IDs must strictly increase; anything else is a reused ID.
        if last_id >= cur_id:
            print('ERROR: duplicate at line %d: id %d reused' % (lc, cur_id))
            break
        last_id = cur_id
    else:
        print('OK')
EOF
When copying this script into a terminal or file please make sure to keep the indentation, as the Python programming language depends on this.
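If this script returns an error message, you can run the following commands to drop identical duplicate lines; the second sort puts the remaining lines back into numeric order of their transaction IDs:
cd /var/lib/univention-ldap/notify
mv transaction{,_backup}
mv transaction{,_backup}.index
sort -u transaction_backup | sort -n > transaction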
After that, please run the script above again. If it no longer returns an error, the duplicate entries were trivial (identical lines) and have been resolved.
If you continue to find errors in the transaction file, please continue with Step 8.
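The cleanup in this step removes exact copies of lines and relies on the remaining IDs being ordered numerically; a plain text sort would place "10 ..." before "9 ...". The following minimal Python sketch shows the same idea on a list of lines and additionally lists IDs that are still used by lines with different content, i.e. duplicates that are not trivial. The function names are hypothetical and the sample DNs are made up.

```python
def transaction_id(line):
    """Transaction ID = leading integer of a line "<id> <dn> <opcode>"."""
    return int(line.split(" ", 1)[0])


def dedupe_transactions(lines):
    """Drop identical lines, order numerically by ID, and report IDs still reused."""
    unique = set(line.rstrip("\n") for line in lines if line.strip())
    # Numeric key first; the full line only breaks ties between equal IDs.
    cleaned = sorted(unique, key=lambda line: (transaction_id(line), line))
    seen, conflicts = set(), []
    for line in cleaned:
        tid = transaction_id(line)
        if tid in seen and tid not in conflicts:
            conflicts.append(tid)  # same ID, different content: not trivial
        seen.add(tid)
    return cleaned, conflicts


if __name__ == "__main__":
    sample = [
        "9 dc=example,dc=com m\n",
        "10 cn=a,dc=example,dc=com a\n",
        "10 cn=a,dc=example,dc=com a\n",
        "11 cn=b,dc=example,dc=com m\n",
    ]
    cleaned, conflicts = dedupe_transactions(sample)
    print("\n".join(cleaned))
    print("non-trivial duplicates:", conflicts)
```

A non-empty conflict list corresponds to the case where errors remain after the cleanup and Step 8 applies.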
Step 7
Check for correct last_id
The value in /var/lib/univention-ldap/last_id should be checked:
- If the file /var/lib/univention-ldap/listener/listener contains any lines, the value in last_id should match the first value in the last line of /var/lib/univention-ldap/listener/listener.
- If the file /var/lib/univention-ldap/listener/listener is empty and /var/lib/univention-ldap/listener/listener.priv exists, the value in last_id should match the first value in the last line of /var/lib/univention-ldap/listener/listener.priv.
- Otherwise, last_id should simply match the first value in the last line of the file /var/lib/univention-ldap/notify/transaction.
The following commands print last_id and the transaction ID it should be compared with:
echo "last_id: $(cat /var/lib/univention-ldap/last_id)"
for name in listener/listener listener/listener.priv notify/transaction
do
path="/var/lib/univention-ldap/$name"
[ -s "$path" ] || continue
echo -n "$name ends with TID: "
tail -n 1 "$path" | awk '{print $1}'
break
done
Since UCS 4.3 erratum 470 and UCS 4.4 erratum 33, the command /usr/share/univention-directory-notifier/univention-translog check can also be used for checking and the command /usr/share/univention-directory-notifier/univention-translog check --fix for correcting this issue.
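The three rules for last_id can also be written as a small decision function, which may help when comparing values copied from several systems. This is a minimal Python sketch, assuming the file contents are passed in as lists of lines and that the relevant value is the first field of the last non-empty line; all function names are hypothetical. Like the shell loop in this step, it treats an empty listener.priv the same as a missing one.

```python
def last_tid(lines):
    """First field of the last non-empty line, or None if there is none."""
    for line in reversed(lines):
        if line.strip():
            return int(line.split()[0])
    return None


def expected_last_id(listener_lines, priv_lines, transaction_lines):
    """priv_lines is None when listener/listener.priv does not exist."""
    # Order of precedence: listener, then listener.priv, then notify/transaction.
    for lines in (listener_lines, priv_lines or [], transaction_lines):
        tid = last_tid(lines)
        if tid is not None:
            return tid
    return None


def last_id_matches(last_id_text, listener_lines, priv_lines, transaction_lines):
    """Compare the content of the last_id file with the expected transaction ID."""
    expected = expected_last_id(listener_lines, priv_lines, transaction_lines)
    return expected is not None and int(last_id_text.strip()) == expected
```

For example, with an empty listener file, no listener.priv, and a transaction file ending in "3031 cn=b,dc=example,dc=com a", last_id is expected to contain 3031.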
Step 8
If severe inconsistency or corruption has been detected in the translog file, we recommend resetting the replication for all systems in the domain. Please note that this is a major operation and will cause temporary downtime for all services in the UCS domain. The recommended steps are described in the following SDB article: