Member Server UDN replication Warning

We had a power cut at the office, and while the servers all appeared to shut down gracefully on battery backup, there were issues with the Primary DC when we rebooted them. The secondary appeared fine, but on the primary slapd would fail as soon as the univention-directory-notifier service started.

After several hours of troubleshooting and going through every article I could find without success, I went for the nuclear option and restored both DCs from the previous night's backup (they are VMs).

The DCs are now fine and everything appears to work; however, the member servers (both Univention Samba File Servers) show a UDN replication warning in the diagnostics, and running

/usr/lib/nagios/plugins/check_univention_replication

gives the following:

CRITICAL: no change of listener transaction id for last 0 checks (nid=3975 lid=4061)

I expected this, but will the member servers sort themselves out once the DC's transaction ID catches up?

The notifier ID (nid) is the transaction number on the Primary, i.e. the last transaction that has happened in the domain, as all transactions start on the Primary.

The listener ID (lid) is the last transaction processed on your local system. Each system's value may differ, but all should converge to nid after some time if no new transactions happen.

lid < nid is normal and means there are pending transactions that have already happened on the Primary but have not yet been replicated to the local system → give the system more time to process them and catch up.

lid == nid should be reached after some time: your local host is then in-sync with the Primary.
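To make the three cases concrete, here is a minimal Python sketch that pulls nid and lid out of a line like the check_univention_replication output quoted in the question and classifies the result, including the lid > nid case discussed next. The regular expression assumes the "(nid=… lid=…)" format shown above; the function names and message texts are hypothetical, not part of any Univention tool.

```python
import re

# Assumes the "(nid=<number> lid=<number>)" format seen in the question's output.
_IDS = re.compile(r"nid=(\d+)\s+lid=(\d+)")


def parse_ids(check_output: str) -> tuple[int, int]:
    """Return (nid, lid) extracted from a check_univention_replication line."""
    match = _IDS.search(check_output)
    if match is None:
        raise ValueError("no nid/lid pair found in output")
    return int(match.group(1)), int(match.group(2))


def classify(nid: int, lid: int) -> str:
    """Describe the replication state of the local host relative to the Primary."""
    if lid < nid:
        # Normal: transactions exist on the Primary that are not yet replicated here.
        return f"catching up: {nid - lid} transaction(s) pending"
    if lid == nid:
        return "in sync with the Primary"
    # Abnormal: the local host has seen transactions the Primary no longer knows.
    return f"ahead of the Primary by {lid - nid} transaction(s): re-join needed"
```

For the output in the question, parse_ids returns (3975, 4061), and classify reports the host as 86 transactions ahead of the Primary, which is the lid >> nid situation.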

But you have lid >> nid, which normally should not happen!
Because you restored your Primary from backup, you lost all transactions that happened between the time the backup was taken and the power failure. All other systems in your domain saw those transactions and replicated them, so their lid is in an alternate future: they know of transactions that your restored Primary no longer has.
You have to re-join all your secondary systems, i.e. run univention-join again on every system except your Primary. They will forget all transactions that have happened until then, start from scratch by re-fetching the current state of LDAP from your Primary, and continue from there.
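The "every system except your Primary" rule can be guarded in a small script. This is a minimal Python sketch under a stated assumption: that the UCR variable server/role reads "domaincontroller_master" on the Primary and something else on Backups, Replicas and member servers. The helper names are hypothetical; univention-join is run without any options and may prompt for domain administrator credentials.

```python
import subprocess
from typing import Callable, Sequence

# Assumption: server/role holds this value only on the Primary.
PRIMARY_ROLE = "domaincontroller_master"


def read_server_role() -> str:
    """Return the local server/role UCR value (requires the ucr tool on the host)."""
    result = subprocess.run(
        ["ucr", "get", "server/role"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def _run_checked(cmd: Sequence[str]) -> object:
    """Run a command and raise CalledProcessError if it fails."""
    return subprocess.run(list(cmd), check=True)


def rejoin_if_not_primary(
    role: str, run: Callable[[Sequence[str]], object] = _run_checked
) -> bool:
    """Run univention-join unless this host is the Primary; return True if it ran."""
    if role.strip() == PRIMARY_ROLE:
        return False  # never re-join the Primary itself
    run(["univention-join"])
    return True
```

On each member server or other non-Primary host, this would be called as rejoin_if_not_primary(read_server_role()). The run parameter exists only so the decision can be checked without actually joining.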

Thanks, worked like a charm.
I thought about a domain rejoin but didn't want to add to my issues!
