HowTo: fix Re-Join in big environments

Problem

While rejoining the DC Backup systems (e.g. after pruning the translog on the primary), the join fails during 03univention-directory-listener.inst.
In /var/log/univention/join.log you find the following:

27.10.22 02:21:20.613  LISTENER    ( ERROR   ) : cache_update_entry: storing entry in database failed: uid=lwiesehage,cn=schueler,cn=users,ou=dummy4,dc=testenv,dc=local
27.10.22 02:21:20.613  LISTENER    ( ERROR   ) : cache.c:469:cache_update_entry_in_transaction mdb_put: failed: MDB_MAP_FULL: Environment mapsize limit reached (-30792)
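To confirm that a failed join is this problem and not something else, the log can be searched for the LMDB error code. The following is a minimal sketch; the helper name find_map_full is made up for illustration, and the default path is simply the join log named above.

```shell
#!/bin/sh
# Hypothetical helper: print every line of a log file that reports an
# exhausted LMDB map, prefixed with its line number.
# Defaults to the join log mentioned above.
find_map_full() {
    log_file="${1:-/var/log/univention/join.log}"
    grep -n 'MDB_MAP_FULL' "$log_file"
}
```

Calling find_map_full without arguments checks the join log; passing another path (for example the listener log) checks that file instead. grep returns a non-zero exit status when nothing matches.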

Environment

  • 1 primary, 2 backup nodes
  • About 15 GiB free space on every system (if not, do a cleanup first)
  • UCS 5.0, errata 393
  • UCS@School 5.0 v3 with 1,195 schools
  • 537,110 LDAP entries (counted with slapcat | grep -c ^dn:)

Tip: you can speed up the join by following:

Solution

1. Check MDB sizes

Please follow:

But be aware: during the re-join the UCR variables ldap/database/mdb/maxsize and listener/cache/mdb/maxsize are overwritten with the values of the DC primary! Either set them with ucr set --force or change them on the DC primary directly.

In this example, I set both of them to 7 GiB: ucr set ldap/database/mdb/maxsize=7516192768 listener/cache/mdb/maxsize=7516192768 and did a systemctl restart slapd.service univention-directory-notifier.service. This can also be done via a UCR policy.

Note that you have to provide the required storage space for the now increased MDB sizes. Otherwise your cache gets corrupted again.
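Both variables take a size in bytes, so it can help to compute the value and check the free space before setting it. The sketch below is only an illustration: the helper names gib_to_bytes and has_free_bytes are made up, and it prints the ucr command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: convert whole GiB to bytes,
# the unit used by the two maxsize variables.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Hypothetical helper: succeed if the filesystem holding directory $1
# has at least $2 bytes available.
has_free_bytes() {
    avail_kib=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ $(( avail_kib * 1024 )) -ge "$2" ]
}

size=$(gib_to_bytes 7)
# Only print the command; review it before running it on a real system.
echo "ucr set ldap/database/mdb/maxsize=$size listener/cache/mdb/maxsize=$size"
```

For 7 GiB this yields 7516192768, the value used above. A check such as has_free_bytes /var/lib "$size" could then be run against the filesystem that holds the databases; which directory that is on your system is an assumption you should verify.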

2. Check ldap/sizelimit

Because of Bug #34877 the DC primary returns only 400,000 LDAP entries by default. If you have more than that, the creation of the listener cache fails.

In my case, I set this to 1,000,000 on the DC primary by running ucr set ldap/sizelimit=1000000 && systemctl restart slapd.service.
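Whether the limit is the culprit can be checked by comparing the entry count with the configured value. This is a rough sketch with a made-up helper name (exceeds_sizelimit); the numbers are the ones from this post, and the commands that would fetch the real values are shown as comments only.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the number of entries ($1)
# exceeds the configured size limit ($2).
exceeds_sizelimit() {
    [ "$1" -gt "$2" ]
}

# On a real system the two values would come from:
#   entries=$(slapcat | grep -c '^dn:')
#   limit=$(ucr get ldap/sizelimit)
entries=537110   # entry count from the environment described above
limit=400000     # default limit mentioned above

if exceeds_sizelimit "$entries" "$limit"; then
    echo "ldap/sizelimit=$limit is too low for $entries entries"
fi
```

With the example values the warning is printed; with the raised limit of 1000000 it is not.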

3. Check listener cache creation

The next problem we stumbled upon was the listener cache creation. The folder /usr/share/univention-group-membership-cache/caches kept growing, and it contained a lot of temporary files which weren’t deleted.

The listener.log had the following error:

old={}
    new={'sambaGroupType': [b'2'], 'cn': [b'Domain Users schule230'], 'univentionObjectType': [b'groups/group'], 'sambaSID': [b'S-1-5-21-2631376095-4019860156-4166735050-23157'], 'gidNumber': [b'11078'], 'ucsschoolRole': [b'school_domain_group:school:schule230'], 'univentionGroupType': [b'-2147483646'], 'structuralObjectClass': [b'posixGroup'], 'entryUUID': [b'd309cda0-36ca-103b-8d94-e7320c8c29de'], 'creatorsName': [b'cn=admin,dc=testenv,dc=local'], 'createTimestamp': [b'20210421085345Z'], 'univentionPolicyReference': [b'cn=default-umc-users,cn=UMC,cn=policies,dc=testenv,dc=local'], 'objectClass': [b'posixGroup', b'univentionObject', b'sambaGroupMapping', b'top', b'univentionGroup', b'ucsschoolGroup', b'univentionPolicyReference', b'univentionSAMLEnabledGroup'], 'enabledServiceProviderIdentifierGroup': [b'SAMLServiceProviderIdentifier=https://webmail.dev-univention.de/appsuite/,cn=saml-serviceprovider,cn=univention,dc=testenv,dc=local'], 'entryCSN': [b'20210421161323.016812Z#000000#000#000000'], 'modifiersName': [b'cn=admin,dc=testenv,dc=local'], 'modifyTimestamp': [b'20210421161323Z'], 'entryDN': [b'cn=Domain Users schule230,cn=groups,ou=schule230,ddc=testenv,dc=local'], 'subschemaSubentry': [b'cn=Subschema'], 'hasSubordinates': [b'FALSE']}
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/univention/listener/api_adapter.py", line 166, in _handler
    self._module_handler.create(dn, new)
  File "/usr/lib/python3/dist-packages/univention/ldap_cache/listener_module.py", line 59, in create
    self._cleanup_cache_if_needed()
  File "/usr/lib/python3/dist-packages/univention/ldap_cache/listener_module.py", line 54, in _cleanup_cache_if_needed
    db.cleanup()
  File "/usr/lib/python3/dist-packages/univention/ldap_cache/cache/backend/gdbm_cache.py", line 112, in cleanup
    db.reorganize()
_gdbm.error: [Errno 1] Operation not permitted
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/univention/listener/api_adapter.py", line 169, in _handler
    self._module_handler.error_handler(dn, old, new, command, exc_type, exc_value, exc_traceback)
  File "/usr/lib/python3/dist-packages/univention/listener/handler.py", line 261, in error_handler
    reraise(exc_type, exc_value, exc_traceback)
  File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/univention/listener/api_adapter.py", line 166, in _handler
    self._module_handler.create(dn, new)
  File "/usr/lib/python3/dist-packages/univention/ldap_cache/listener_module.py", line 59, in create
    self._cleanup_cache_if_needed()
  File "/usr/lib/python3/dist-packages/univention/ldap_cache/listener_module.py", line 54, in _cleanup_cache_if_needed
    db.cleanup()
  File "/usr/lib/python3/dist-packages/univention/ldap_cache/cache/backend/gdbm_cache.py", line 112, in cleanup
    db.reorganize()
_gdbm.error: [Errno 1] Operation not permitted
28.10.22 14:27:19.629  LISTENER    ( WARN    ) : handler: ldap-cache-baa04df67e7af6bb0769f5cb7e72dba9 (failed)

This was partly caused by Bug #55286, but deleting the additional files did not help at first.

The cache clean-up is fixed with erratum 468.

Listing the two cache files with

ls -la /usr/share/univention-group-membership-cache/caches/{uniqueMembers.db,memberUids.db}
-rw-r----- 1 listener root 13889536 Nov  1 13:34 /usr/share/univention-group-membership-cache/caches/memberUids.db
-rw-r----- 1 listener root 56754176 Nov  1 13:34 /usr/share/univention-group-membership-cache/caches/uniqueMembers.db

revealed that both cache files had the wrong group (root instead of nogroup)!

This was fixed by

chgrp nogroup /usr/share/univention-group-membership-cache/caches/uniqueMembers.db
rm /usr/share/univention-group-membership-cache/caches/uniqueMembers.db.*
chgrp nogroup /usr/share/univention-group-membership-cache/caches/memberUids.db
rm /usr/share/univention-group-membership-cache/caches/memberUids.db.*
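Before or after changing the group, it may be useful to list which cache files deviate from the expected one. The following is a small sketch, not part of the original fix; list_wrong_group is a made-up helper, and it relies on stat -c %G from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: print every *.db file directly inside directory $1
# whose group is not $2. Nothing is changed on disk.
list_wrong_group() {
    for f in "$1"/*.db; do
        [ -f "$f" ] || continue
        [ "$(stat -c %G "$f")" = "$2" ] || echo "$f"
    done
}
```

For the case described here, list_wrong_group /usr/share/univention-group-membership-cache/caches nogroup should print nothing once both files have been fixed.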

Finally, the re-join works 🥳
