I found a new issue after upgrading to UCS 4.2-1.
This one seems related to DNS resolution. The DNS information in the OpenLDAP directory appears to be correct. However, the reverse lookup zones in Samba 4 DNS, viewed through RSAT, show badly formatted records: there is a nonsense extra octet in the IP address, which confuses RSAT. I’m assuming this came with the upgrade (I will check the logs), because I wouldn’t expect RSAT’s input validation to allow a bad IP entry, and we haven’t edited any entries since the upgrade.
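To make the “extra octet” symptom concrete: a reverse-zone record’s full name is its relative name followed by the zone name, and the address is reconstructed by reading the labels right to left. The sketch below (POSIX shell plus awk) shows that a record whose relative name carries one label too many ends up displayed with five octets. The function names ip_to_ptr_name and ptr_to_ip are hypothetical helpers, not UCS or Samba tools, and the example values are illustrative rather than taken from the real zone data.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# IPv4 address -> reverse-lookup owner name,
# e.g. 10.20.20.6 -> 6.20.20.10.in-addr.arpa
ip_to_ptr_name() {
    printf '%s\n' "$1" |
        awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}

# Relative record name + reverse zone -> the address it stands for.
# Labels are reversed, so every extra label becomes an extra "octet".
ptr_to_ip() {
    printf '%s.%s\n' "$1" "$2" |
        sed 's/\.in-addr\.arpa$//' |
        awk -F. '{ s = $NF; for (i = NF - 1; i >= 1; i--) s = s "." $i; print s }'
}

ip_to_ptr_name 10.20.20.6               # 6.20.20.10.in-addr.arpa
ptr_to_ip 6 20.20.10.in-addr.arpa       # 10.20.20.6
ptr_to_ip 6.20 20.20.10.in-addr.arpa    # 10.20.20.20.6 (five octets)
```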
Additionally, these PTR records are not being synced by the S4 connector.
cat /var/log/univention/connector-s4-status.log
Mon Jul 10 19:04:10 2017
--------------------------------------
try to sync 0 changes from UCS
done:
Changes from UCS: 0 (0 saved rejected)
--------------------------------------
--------------------------------------
try to sync 0 changes from S4
done:
Changes from S4: 0 (8 saved rejected)
--------------------------------------
--------------------------------------
Sync 0 rejected changes from UCS
restored 0 rejected changes
--------------------------------------
--------------------------------------
Sync 8 rejected changes from S4 to UCS
restored 0 rejected changes
--------------------------------------
- sleep 5 seconds (0/10 until resync) -
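Because the status log appends a new block on every sync cycle, it can help to pull out just the rejected counters and watch whether they drop after a fix. A small sketch, assuming GNU or BusyBox sed with -E support; rejected_counts is a hypothetical name, not a UCS command.

```shell
#!/bin/sh
# Hypothetical helper: print "<side> <saved rejected count>" for every
# "Changes from UCS/S4: N (M saved rejected)" line read on stdin.
rejected_counts() {
    sed -nE 's/^Changes from (UCS|S4): [0-9]+ \(([0-9]+) saved rejected\).*/\1 \2/p'
}
```

Usage: `rejected_counts < /var/log/univention/connector-s4-status.log | tail -n 2` prints the most recent cycle; for the log above that would be `UCS 0` and `S4 8`.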
univention-s4connector-list-rejected
UCS rejected
S4 rejected
1: S4 DN: DC=11,DC=20.20.10.in-addr.arpa,CN=MicrosoftDNS,DC=DomainDnsZones,DC=removedrealdomain,DC=com,DC=au
UCS DN: relativedomainname=11,zonename=20.20.10.in-addr.arpa,cn=dns,DC=removedrealdomain,dc=com,dc=au
2: S4 DN: DC=@,DC=30.10.in-addr.arpa,CN=MicrosoftDNS,DC=DomainDnsZones,DC=removedrealdomain,DC=com,DC=au
UCS DN: zonename=30.10.in-addr.arpa,cn=dns,DC=removedrealdomain,dc=com,dc=au
3: S4 DN: DC=@,DC=40.10.in-addr.arpa,CN=MicrosoftDNS,DC=DomainDnsZones,DC=removedrealdomain,DC=com,DC=au
UCS DN: zonename=40.10.in-addr.arpa,cn=dns,DC=removedrealdomain,dc=com,dc=au
4: S4 DN: DC=@,DC=anotherreplaceddomain,CN=MicrosoftDNS,DC=DomainDnsZones,DC=removedrealdomain,DC=com,DC=au
UCS DN: zonename=anotherreplaceddomain,cn=dns,DC=removedrealdomain,dc=com,dc=au
5: S4 DN: DC=@,DC=display.anotherreplaceddomain.com.au,CN=MicrosoftDNS,DC=DomainDnsZones,DC=removedrealdomain,DC=com,DC=au
UCS DN: zonename=display.anotherreplaceddomain.com.au,cn=dns,DC=removedrealdomain,dc=com,dc=au
6: S4 DN: DC=@,DC=testing.anotherreplaceddomain.com.au,CN=MicrosoftDNS,DC=DomainDnsZones,DC=removedrealdomain,DC=com,DC=au
UCS DN: zonename=testing.anotherreplaceddomain.com.au,cn=dns,DC=removedrealdomain,dc=com,dc=au
7: S4 DN: DC=7,DC=20.20.10.in-addr.arpa,CN=MicrosoftDNS,DC=DomainDnsZones,DC=removedrealdomain,DC=com,DC=au
UCS DN: <not found>
8: S4 DN: DC=7,DC=20.20.10.in-addr.arpa,CN=MicrosoftDNS,DC=DomainDnsZones,DC=removedrealdomain,DC=com,DC=au
UCS DN: <not found>
last synced USN: 299387
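Two things stand out in this list: entries 7 and 8 are the same S4 DN, and most entries are zone apex (DC=@) objects. A hedged sketch for summarising the output by zone and spotting duplicate DNs; both function names are hypothetical, and the patterns assume the line layout shown above.

```shell
#!/bin/sh
# Hypothetical helpers; read univention-s4connector-list-rejected output on stdin.

# Count rejected S4 objects per DNS zone, printed as "<zone> <count>".
rejected_zones() {
    sed -nE 's/^[[:space:]]*[0-9]+: S4 DN: DC=[^,]*,DC=([^,]*),CN=MicrosoftDNS,.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Print S4 DNs that appear more than once in the list.
duplicate_rejects() {
    sed -nE 's/^[[:space:]]*[0-9]+: S4 DN: //p' | sort | uniq -d
}
```

Usage: `univention-s4connector-list-rejected | rejected_zones`. Applied to the list above, the 20.20.10.in-addr.arpa zone would show 3 rejected entries.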
samba-tool dns query seems to be OK, though?
# samba-tool dns query dcmaster 20.10.in-addr.arpa 20 ALL -U administrator
Password for [administrator]:
Name=, Records=0, Children=0
Name=10, Records=1, Children=0
PTR: dcmaster.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=11, Records=0, Children=0
Name=12, Records=0, Children=0
Name=13, Records=0, Children=0
Name=14, Records=0, Children=0
Name=15, Records=0, Children=0
Name=17, Records=0, Children=0
Name=18, Records=0, Children=0
Name=19, Records=0, Children=0
Name=2, Records=0, Children=0
Name=20, Records=0, Children=0
Name=21, Records=0, Children=0
Name=22, Records=0, Children=0
Name=23, Records=0, Children=0
Name=24, Records=0, Children=0
Name=25, Records=0, Children=0
Name=26, Records=0, Children=0
Name=27, Records=0, Children=0
Name=28, Records=0, Children=0
Name=29, Records=1, Children=0
PTR: device1.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=30, Records=1, Children=0
PTR: device2.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=32, Records=1, Children=0
PTR: device3.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=33, Records=1, Children=0
PTR: device4.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=34, Records=1, Children=0
PTR: device5.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=35, Records=1, Children=0
PTR: device6.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=36, Records=1, Children=0
PTR: device7.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=37, Records=1, Children=0
PTR: device8.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=38, Records=1, Children=0
PTR: device9.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=39, Records=1, Children=0
PTR: device10.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=5, Records=0, Children=0
Name=6, Records=1, Children=0
PTR: device11.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
Name=7, Records=1, Children=0
PTR: device12.replaceddomain.com.au (flags=f0, serial=1, ttl=900)
nslookup from Linux and Windows gives the same result:
nslookup 10.20.20.6
Server: 10.20.20.10
Address: 10.20.20.10#53
** server can't find 6.20.20.10.in-addr.arpa: NXDOMAIN
Forward lookups work, though.
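Since samba-tool lists a PTR record for name 6 while nslookup returns NXDOMAIN for 10.20.20.6, it may help to check every listed record the same way. A sketch, assuming the samba-tool output format shown above, that the queried node (20 in zone 20.10.in-addr.arpa) covers 10.20.20.x, and that `host` (from the BIND utilities) is installed and exits non-zero when a lookup fails. ptr_pairs and check_ptrs are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers for comparing Samba's PTR records with live DNS answers.

# Turn "samba-tool dns query" output into "<ip> <ptr-target>" lines.
# $1 is the address prefix the listed names belong to, e.g. 10.20.20.
ptr_pairs() {
    awk -v prefix="$1" '
        /^[ \t]*Name=/ { name = $1; sub(/^Name=/, "", name); sub(/,$/, "", name) }
        /^[ \t]*PTR: / { print prefix "." name, $2 }
    '
}

# Reverse-resolve each address against DNS server $1 and report the result.
check_ptrs() {
    while read -r ip target; do
        if host "$ip" "$1" >/dev/null 2>&1; then
            echo "OK    $ip -> $target"
        else
            echo "FAIL  $ip (Samba has $target)"
        fi
    done
}
```

Usage: `samba-tool dns query dcmaster 20.10.in-addr.arpa 20 ALL -U administrator | ptr_pairs 10.20.20 | check_ptrs 10.20.20.10`.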
So it seems that Samba 4 has had some entries corrupted or badly synced, and it is now out of sync with OpenLDAP/BIND.
How do I manually edit one side to match the other so that sync can be re-established? Ideally I’d like to treat the OpenLDAP reverse zone as authoritative and overwrite the Samba 4 one.
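One possible way to make the Samba side match OpenLDAP by hand is to delete and re-add individual PTR records with `samba-tool dns delete` and `samba-tool dns add`. This is a cautious sketch, not a confirmed UCS procedure: it assumes the correct values have been read from OpenLDAP first, that a backup exists, and that `dcmaster`, the zone and the record names are placeholders. recreate_ptr, SERVER and DRY_RUN are hypothetical names. Note that the S4 connector normally syncs in both directions, so changes made on the Samba side may be written back to OpenLDAP.

```shell
#!/bin/sh
# Hypothetical helper: replace one PTR record in Samba 4 DNS.
# Defaults to a dry run that only prints the samba-tool commands;
# set DRY_RUN=0 to execute them. SERVER is a placeholder.
SERVER="${SERVER:-dcmaster}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# recreate_ptr ZONE NAME OLD_TARGET NEW_TARGET
# OLD_TARGET may be empty if Samba has no record to remove.
recreate_ptr() {
    if [ -n "$3" ]; then
        run samba-tool dns delete "$SERVER" "$1" "$2" PTR "$3" -U administrator
    fi
    run samba-tool dns add "$SERVER" "$1" "$2" PTR "$4" -U administrator
}
```

Usage: `recreate_ptr 20.20.10.in-addr.arpa 6 "" device11.replaceddomain.com.au` only prints the `samba-tool dns add` command, so it can be reviewed before rerunning with `DRY_RUN=0`.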
One side effect, I think, is that this has affected the univention-mount-homedir script, which runs from a cron job every 10 minutes (and, I think, also via PAM common-session). Every time it runs, it seems to fail to resolve the DNS name of the file server, leaving the mount.nfs call hung in the ‘D’ (uninterruptible sleep) process state because the PTR records for the file server can’t be resolved.
This showed up as a high load average (200+) with low CPU usage. It built up over the weekend after the upgrade until all memory and swap space ran out this morning and the server became completely unresponsive. I also think the OOM killer killed named and other UCS services rather than the hundreds of mount.nfs processes in ‘D’ state, so everything gradually stopped working.
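On Linux the load average counts tasks in uninterruptible sleep (‘D’) as well as runnable ones, which is how a pile of hung mount.nfs calls can push the load past 200 while the CPUs sit mostly idle. A quick sketch for counting D-state processes by command name; dstate_summary is a hypothetical name and it reads `ps` output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: count processes in uninterruptible sleep per command.
# Expects "STAT COMMAND" lines, e.g. from: ps -eo stat=,comm=
dstate_summary() {
    awk '$1 ~ /^D/ { n[$2]++ } END { for (c in n) print n[c], c }' | sort -rn
}
```

Usage: `ps -eo stat=,comm= | dstate_summary`. A steadily growing count for mount.nfs would match the behaviour described above.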
Since rebooting both DCs, I’ve made sure the process trees that launch that script don’t run, and the graphs have returned to normal. I haven’t yet worked out which part of UCS is causing user home directories to be mounted on the DCs. I don’t have any NFS mount policies and don’t really understand why user home directories would need to be mounted on a DC.
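To find what launches univention-mount-homedir on the DCs, a plain recursive grep over the usual cron and PAM configuration may narrow it down. A sketch; the directories in the usage line are guesses at likely locations, not a confirmed list of where UCS configures it, and find_callers is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: list files under the given paths that mention
# univention-mount-homedir. Unreadable or missing paths are ignored.
find_callers() {
    grep -rlF -- univention-mount-homedir "$@" 2>/dev/null
}
```

Usage: `find_callers /etc/cron.d /etc/crontab /etc/pam.d /etc/security`.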
So we’re limping along with S4 DNS in a degraded state, but I would really like to sort this out before Kerberos timeouts, machine password renewals, or some other Active Directory function that depends on working DNS starts to go wrong.
Hope someone can help.