Problem: LDAP Service Fails to Start on Nodes - UVMM-related ACL and schema

Problem

The Primary Node was upgraded to UCS 5.2-5 without issues.
Afterwards, slapd.service failed to start on the Backup Node and on several UCS@school Replica Nodes, because the generated slapd.conf on these nodes still contained UVMM-related ACL and schema references. This resulted in the following LDAP startup error:

unknown attr "@univentionVirtualMachine"

As a consequence, numerous failed.ldif entries were also generated on the Backup Node and the (UCS@school) Replica Nodes.

The affected Backup and Replica Nodes were powered off during the upgrade process. After they were started again, the slapd.service no longer started on the UCS 5.0-10 systems.

On the DC Backup Node, the UVMM schema was still installed. UCS@school deploys an LDAP ACL that optionally references the UVMM schema. Because the corresponding replication transactions were missed while the systems were offline, outdated UVMM-related ACL templates remained on the affected systems.

All failed.ldif entries found on the Backup Node had been created while the affected systems were offline.

Example log output:

Jun 06 13:31:05 replica-node slapd[3367]: connections_destroy: nothing to destroy.
Jun 06 13:31:05 replica-node slapd[3355]: Starting ldap server(s): slapd ...failed.
Jun 06 13:31:05 replica-node slapschema[3370]: Loaded metadata from "/usr/share/univention-management-console/saml/idp/ucs-sso.example.net.xml"
Jun 06 13:31:05 replica-node slapschema[3370]: No trusted audiences configured
Jun 06 13:31:05 replica-node slapschema[3370]: oauthbearer_client_plug_init() failed in sasl_server_add_plugin(): error when parsing configuration file
Jun 06 13:31:05 replica-node slapschema[3370]: _sasl_plugin_load failed on sasl_server_plug_init for plugin: oauthbearer
Jun 06 13:31:05 replica-node slapd[3355]: /etc/ldap/slapd.conf: line 335: unknown attr "@univentionVirtualMachine" in to clause <access clause>
Jun 06 13:31:05 replica-node systemd[1]: slapd.service: Control process exited, code=exited, status=1/FAILURE
Jun 06 13:31:05 replica-node systemd[1]: slapd.service: Failed with result 'exit-code'.
Jun 06 13:31:05 replica-node systemd[1]: Failed to start LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol).

The affected systems also contained a large number of replication failures:

wc -l /var/lib/univention-directory-replication/failed.ldif
1187 /var/lib/univention-directory-replication/failed.ldif

Root Cause

Old UVMM packages and the LDAP ACL templates installed by these packages remained on the affected nodes.

When UVMM packages and templates are removed from the Primary Node, the corresponding changes are distributed through the Listener/Notifier replication mechanism. Nodes that are powered off during this time do not receive these replication transactions.
These missing transactions can lead to replication inconsistencies and gaps that become visible in the output of:

univention-directory-listener-ctrl status

Example:

Current Notifier ID on "primary.example.net"
 3873646

Last Notifier ID processed by local Listener:
 3805937
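
The size of the gap can be computed from this output. The following is a minimal sketch, assuming the status output keeps the layout shown in the example above (each ID on the line after its label). The function name notifier_gap is a hypothetical helper and not part of UCS.

```shell
# Hypothetical helper: print how many Notifier transactions the local
# Listener still has to process. Assumes each ID follows its label line,
# as in the example output above. Returns non-zero if an ID is missing.
notifier_gap() {
    awk '
        /^Current Notifier ID/ { want = "current"; next }
        /^Last Notifier ID/    { want = "last"; next }
        want != "" && $1 ~ /^[0-9]+$/ {
            if (want == "current") current = $1; else last = $1
            want = ""
        }
        END {
            if (current == "" || last == "") exit 1
            print current - last
        }
    '
}

# Usage on an affected node:
#   univention-directory-listener-ctrl status | notifier_gap
```

For the example values above, the result is 67709 (3873646 minus 3805937), which is the number of transactions the node missed while it was offline.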

As a result, outdated UVMM ACL templates remain on the affected nodes. During LDAP startup, these ACLs reference the removed UVMM schema attributes and cause slapd.service to fail with the error:

unknown attr "@univentionVirtualMachine"

This issue is tracked in the following bug report:

Bug Report: #59502


Investigation

Check slapd.conf for UVMM References

grep -i uvmm /etc/ldap/slapd.conf
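
The startup error names the attribute set univentionVirtualMachine rather than the string uvmm, so it can help to search for both and to print line numbers that can be compared with the line reported by slapd (line 335 in the log above). A minimal sketch; find_uvmm_refs is a hypothetical helper name:

```shell
# Hypothetical helper: list UVMM-related lines, with line numbers, in a
# slapd configuration file (default: the path used in this article).
find_uvmm_refs() {
    grep -n -i -E 'uvmm|univentionVirtualMachine' "${1:-/etc/ldap/slapd.conf}"
}

# Usage:
#   find_uvmm_refs
#   find_uvmm_refs /path/to/another/slapd.conf
```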

Check the Replication Status

univention-directory-listener-ctrl status

or

/usr/lib/nagios/plugins/check_univention_replication

Verify LDAP Database Consistency

Run:

slapschema

If no errors are reported, restart the services:

systemctl restart slapd.service
systemctl restart univention-directory-notifier.service
systemctl restart univention-directory-listener.service

Check for Existing failed.ldif Files

Inspect existing replication failures. Depending on the contents, they may block the startup of slapd.service.

wc -l /var/lib/univention-directory-replication/failed.ldif
less /var/lib/univention-directory-replication/failed.ldif
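
Note that wc -l counts lines, not LDIF entries. Since every LDIF record starts with a dn: line, counting those lines gives the number of failed entries. A minimal sketch, assuming failed.ldif is regular LDIF; count_ldif_entries is a hypothetical helper name:

```shell
# Hypothetical helper: count LDIF records by counting their "dn:" lines
# (this also matches base64-encoded "dn::" lines).
count_ldif_entries() {
    grep -c '^dn:' "${1:-/var/lib/univention-directory-replication/failed.ldif}"
}

# Usage:
#   count_ldif_entries
```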

Check UVMM Package Status

dpkg -l | grep -i uvmm

Search for Remaining UVMM Templates

grep -rli "uvmm" /etc/univention/templates/info/ /etc/univention/templates/files/

Check for Existing or Hanging slapd Processes

ps aux | grep '[s]lapd'

Terminate any stale processes if necessary:

kill <PID>

Solution

Remove all remaining UVMM packages from the affected nodes.

Important: The LDAP ACL template files are not automatically removed when the package is uninstalled. These files must be removed manually.

Remove the remaining UVMM packages:

univention-remove 'univention-management-console-module-uvmm*'

Remove the obsolete template files:

rm /etc/univention/templates/files/etc/ldap/slapd.conf.d/66univention-ldap-server_acl-master-uvmm.acl
rm /etc/univention/templates/info/ldapacl_66univention-ldap-server_acl-master-uvmm.acl.info
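
If a copy of the templates should be kept for later reference, the files can be moved aside instead of deleted. A minimal sketch; the helper name move_aside and the backup directory in the usage example are assumptions, not UCS conventions. The backup directory should be located outside /etc/univention/templates so that the files are no longer picked up when the configuration is regenerated.

```shell
# Hypothetical helper: move the given files into a backup directory
# instead of deleting them. Files that do not exist are skipped.
move_aside() {
    backup_dir=$1
    shift
    mkdir -p "$backup_dir" || return 1
    for f in "$@"; do
        [ -e "$f" ] || continue
        mv -- "$f" "$backup_dir"/ || return 1
    done
}

# Usage (the backup directory is only an example):
#   move_aside /root/uvmm-template-backup \
#       /etc/univention/templates/files/etc/ldap/slapd.conf.d/66univention-ldap-server_acl-master-uvmm.acl \
#       /etc/univention/templates/info/ldapacl_66univention-ldap-server_acl-master-uvmm.acl.info
```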

Regenerate the LDAP configuration:

ucr commit /etc/ldap/slapd.conf
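
The regenerated file can be checked with OpenLDAP's slaptest before the service is restarted. A minimal sketch, assuming slaptest is available on the node; check_slapd_conf and the SLAPTEST override variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: syntax-check a slapd configuration file.
# -u runs slaptest in dry-run mode (the databases are not opened),
# -f selects the configuration file to check.
# SLAPTEST can be set to use a different slaptest binary.
check_slapd_conf() {
    "${SLAPTEST:-slaptest}" -u -f "${1:-/etc/ldap/slapd.conf}"
}

# Usage:
#   check_slapd_conf && systemctl restart slapd.service
```

If the check still reports the unknown attr error, a UVMM template or package is most likely still present on the node.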

Restart the LDAP service:

systemctl restart slapd.service

Verify the service status:

systemctl status slapd.service

If the restart of slapd.service still fails, an existing failed.ldif may be blocking the startup process. In this case, inspect and remove the problematic entries as described in the following article:

Final Verification

After LDAP has been restored successfully, verify the replication status again:

univention-directory-listener-ctrl status

or

/usr/lib/nagios/plugins/check_univention_replication

If replication is still not synchronized, restarting the Listener service may help:

systemctl restart univention-directory-listener.service

The replication status should eventually show that the local Listener has processed the current Notifier ID and that all nodes are synchronized again.