Joining UCS domain - trust relationship errors


#1

We are rolling out domain controllers using UCS and have run into an issue we do not know how to resolve. The error is: “The security database on the server does not have a computer account for this workstation trust relationship.”

My domain setup is:
Main Controller
Backup DC
Version is 4.1-4 errata353

I installed a clean copy of Windows 7 64-bit and joined the domain twice. Each time I got the success message and was offered a restart, but after the restart I was unable to log in, and the computer had not been added to the domain computer list. After I joined the domain a third time the computer was finally added to the domain, but I still could not log in.

In an attempt to debug, I changed a few UCR variables:
notifier/debug/level set from 1 to 4
connector/debug/level set from 2 to 4
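
For reference, changes like the two above can be made from a shell with the `ucr` command. This is a minimal sketch that uses only the variable names and values given in this post. The `command -v` guard is an addition so that the snippet does nothing on a machine without UCS tooling, and the affected services may need a restart before the new level takes effect.

```shell
# Sketch: raise the notifier and connector debug levels named in the post.
# Guarded so that it is a no-op where the UCS `ucr` tool is not installed.
if command -v ucr >/dev/null 2>&1; then
    ucr set notifier/debug/level=4 connector/debug/level=4
    # Print the stored values to confirm the change.
    ucr get notifier/debug/level
    ucr get connector/debug/level
fi
```

Remember to set the levels back afterwards, since higher debug levels make the logs grow quickly.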

I then monitored a few log files.

sudo tail -f /var/log/univention/notifier.log
–did not give me any useful information

sudo tail -f /var/log/univention/listener.log
–showed process of updating the domain objects.

I started to suspect that the computer was at fault. This helped a bit: since the clients get their IP settings from a standalone DHCP server, I manually set the DNS servers to the Univention servers. That let me log in once, but now I cannot log in again and the client shows the same trust relationship error.
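
Since changing the client's DNS servers made a difference, one thing worth checking is whether the client can resolve the SRV record that Windows domain members use to locate a domain controller. The sketch below only builds that record name. `office.example.test` and the address 192.0.2.10 are placeholders, not values from this thread, and `dc_srv_name` is a helper name made up for the example. The `dig` call is left as a comment because it needs a host with DNS tools installed.

```shell
# Build the SRV record name that Active Directory clients query to find a DC.
dc_srv_name() {
    printf '_ldap._tcp.dc._msdcs.%s\n' "$1"
}

# Example with a placeholder domain:
dc_srv_name office.example.test

# On a host with `dig`, ask one specific DNS server (placeholder IP):
#   dig +short SRV "$(dc_srv_name office.example.test)" @192.0.2.10
```

If the DHCP-provided DNS server returns nothing for this name while the UCS server returns the domain controllers, that would be consistent with the behaviour described above.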

I tried looking for information on which logs to monitor, and for diagrams of how this process works. I suspect that when I get the trust relationship error, the computer simply is not communicating with the domain controller, because I could not see any related activity on the domain controller. Could the problem be that the client does not pick up the correct configuration? This happens on more than one machine, including fresh installations. Simply rejoining the domain is not a solution, because the error keeps coming back and we still have nearly 100 clients to migrate into the domain.

My questions are:

  • Which log files should I monitor to debug this error (and which UCR variables should I set)?
  • What is the process of domain join and login?
  • Why does the “trust relationship” error happen, and what factors cause the computer and server to lose trust?
  • How can I see any “login” activity from the server side? auth.log was not very informative so far.

#2

Hi
I think you should update UCS to the latest errata level (379 as of today).
There were problems with the S4 connector at an earlier errata level (I don’t know exactly which, but I think it was between 347 and 366).
After that, client domain join should work well.

rg
Christian


#3

Yes, the system definitely needs to be kept up to date, but being able to monitor or debug the process would be a big help.

I am not sure the updates alone will fix my issues; there is a good chance that something else is broken in my setup.


#4

OK, just an update. I updated the domain controllers to the latest version, then left the domain and joined again to fix the “trust relationship” error. My computer does not appear in the computers list, but it works. I would still like to be able to debug domain join and authentication.


#5

I think most of the issues should be fixed in the latest errata. Does this work now? Can you create and join a client with a new name to see if the computer appears in the computer list and you have no issues?


#6

What I did was log in to both servers and run:
sudo tail -f /var/log/univention/*.log | grep COMPUTERNAME
sudo tail -f /var/log/syslog | grep COMPUTERNAME

That is the best I could come up with, even though I did not see much information.

As no one can point me to where I should look, I plan to find resources and set up some log management software to collect and correlate logs, so that monitoring activity becomes more presentable and manageable.

I was able to join two computers, but then one of the computers that had been joined a few weeks ago failed to authenticate with the “trust relationship” error.

I cannot say this is fixed at the moment. I guess I will have to keep trying and see when it breaks again.


#7
sudo tail -f /var/log/univention/*.log | grep COMPUTERNAME

should give you most of the information. You can narrow it down to “/var/log/univention/connector-s4.log” and “/var/log/univention/listener.log”; both should give you information on the client. The issues we fixed in the latest errata all came down to the join of clients, so they may not help for clients that were already joined. You can try to reset the computer account on the Windows client via this method: http://www.networknet.nl/apps/wp/archives/1938
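
A small variation on the command above may be worth trying, given that upper/lower case differences in computer names come up later in this thread: match the host name without regard to case. This is only a sketch. `match_host` is a helper name made up for the example, and the sample lines are invented to show the filter, not taken from a real log.

```shell
# Filter lines on stdin for one host name, ignoring case.
# --line-buffered keeps output flowing when grep sits inside a pipeline.
match_host() {
    grep -i --line-buffered -- "$1"
}

# Quick check on invented sample lines (both spellings should match):
printf '%s\n' 'sync cn=PoolLaptopD' 'sync cn=other' 'auth POOLLAPTOPD$' | match_host poollaptopd

# Live use on a UCS server, with the log paths named in this thread:
#   tail -F /var/log/univention/connector-s4.log /var/log/univention/listener.log | match_host COMPUTERNAME
```

Following the two files directly instead of `*.log` also keeps unrelated logs out of the output.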


#8

Hi again. I was able to narrow down the issue to the log below from /var/log/samba/log.smbd:

[2017/02/03 15:13:03.786831,  2, pid=5986] ../source4/auth/ntlm/auth.c:430(auth_check_password_recv)
  auth_check_password_recv: sam_ignoredomain authentication for user [OFFICEPOOLLAPTOPD$] FAILED with error NT_STATUS_INTERNAL_DB_CORRUPTION
[2017/02/03 15:13:03.787184,  2, pid=5986] ../auth/gensec/spnego.c:720(gensec_spnego_server_negTokenTarg)
  SPNEGO login failed: NT_STATUS_INTERNAL_DB_CORRUPTION

I cannot find enough information on the internet, and if I search for “spnego” I mostly see information about old versions of Samba or Windows (such as XP and 2000, and we have none of these).

As a result I cannot log in to the network with this machine, and I get the same “trust relationship” error. Maybe this is the cause.

As a quick diagnostic I ran:

samba-tool dbcheck

… but it did not return any errors.


Edit:
Also, if it helps, /var/log/samba/log.samba contains this error:

[2017/02/03 10:56:57.895698, 0, pid=5393] ../source4/rpc_server/netlogon/dcerpc_netlogon.c:375(dcesrv_netr_ServerAuthenticate3) Found 2 records matching user [POOLLAPTOPD$]

Edit2: I am not sure why it says it found 2 records matching user “POOLLAPTOPD”, as I only have one in UCS. Perhaps the name resolves to some other ID at the system level where there are two records. How do I check?


I also noticed that /var/log/samba/log.samba reports the same error as the log.smbd excerpt above. Does it duplicate errors for some reason?
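
One way to look into the “Found 2 records” message is to ask Samba's own directory how many entries carry that account name, since the UCS view and the Samba view can differ. This is a rough sketch, not a confirmed procedure. The `sam.ldb` path is an assumption about where Samba keeps its database on the DC, and `count_entries` and `find_machine_account` are helper names made up for this example.

```shell
# Count the entries in LDIF read from stdin (one "dn:" line per entry).
count_entries() {
    grep -c '^dn:' || true
}

# Query Samba's directory for a machine account name.
# Prints nothing if ldbsearch or the database is not available.
find_machine_account() {
    local db=/var/lib/samba/private/sam.ldb   # assumed location
    command -v ldbsearch >/dev/null 2>&1 && [ -r "$db" ] || return 0
    ldbsearch -H "$db" "(sAMAccountName=$1)" dn sAMAccountName objectSid
}

# How many entries does Samba hold for this account name?
find_machine_account 'POOLLAPTOPD$' | count_entries
```

A result above 1 would match what the log reports, and the `dn` lines printed by `find_machine_account` on its own would show where the extra entry lives.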


#9

Following on from this error, I would like to ask a second question about SPNEGO.

UCR Variable: samba/use_spnego
Value: yes

Since my search results mention Windows 2000 and XP, I wonder whether SPNEGO is necessary in environments that have only Windows 7 or later and up-to-date Linux distributions. Could disabling use_spnego help with the “trust relationship” errors?

If I understand it correctly, SPNEGO is used to determine which authentication protocol to use, i.e. Kerberos? What else could it be?


#10

I will post my findings here in the hope they help.

With ldapsearch I found a few computer entries, one of which belongs to a machine we have issues with.

# POOLLAPTOPD$, uid, temporary, univention, <domain>
dn: cn=POOLLAPTOPD$,cn=uid,cn=temporary,cn=univention,dc=<domain>
objectClass: lock
objectClass: top
lockTime: 1484911694
cn: POOLLAPTOPD$

What do objectClass: lock and lockTime: 1484911694 mean, and is there a way to remove the lock to enable the computer?
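
As a first step in reading this entry, the lockTime value can be turned into a readable date. The sketch assumes that lockTime holds seconds since the Unix epoch, which the thread does not confirm, and that GNU `date` is available.

```shell
# Convert the lockTime value from the entry above to a UTC timestamp.
# Assumption: lockTime is seconds since the Unix epoch; needs GNU date.
date -u -d @1484911694 '+%Y-%m-%d %H:%M:%S UTC'
# → 2017-01-20 11:28:14 UTC
```

Under that assumption the value falls about two weeks before the Samba log excerpts posted earlier in this thread.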


#11

I would start by correcting this (That may also correct your other problems with the computers):

It may be that your S4 database is corrupt or broken. You can try the following (but be advised that you may need to rejoin clients and you definitely need to rejoin other servers, so this approach has a large impact): http://sdb.univention.de/content/6/274/en/re_provisioning-samba4-on-a-dc-master.html - This way you rebuild your Samba from the LDAP again.


#12

[quote=“Roman”]I will post my findings here in the hope they help.

With ldapsearch I found a few computer entries, one of which belongs to a machine we have issues with.

# POOLLAPTOPD$, uid, temporary, univention, <domain>
dn: cn=POOLLAPTOPD$,cn=uid,cn=temporary,cn=univention,dc=<domain>
objectClass: lock
objectClass: top
lockTime: 1484911694
cn: POOLLAPTOPD$

What do objectClass: lock and lockTime: 1484911694 mean, and is there a way to remove the lock to enable the computer?[/quote]

Sorry for the confusion, my bad.

I did a top-level search before posting and found two entries for the affected computer.

# ldapsearch | grep POOLLAPTOPD
SASL/GSSAPI authentication started
SASL username: <truncated>
SASL SSF: 56
SASL data security layer installed.
# POOLLAPTOPD, <domain>, dns, <domain>
dn: relativeDomainName=POOLLAPTOPD,zoneName=<domain>,cn=dns,dc=<truncated>
relativeDomainName: POOLLAPTOPD
# POOLLAPTOPD$, uid, temporary, univention, <domain>
dn: cn=POOLLAPTOPD$,cn=uid,cn=temporary,cn=univention,dc=<truncated>
cn: POOLLAPTOPD$

#13

[quote=“Thorp-Hansen”]I would start by correcting this (That may also correct your other problems with the computers):

It may be that your S4 database is corrupt or broken. You can try the following (but be advised that you may need to rejoin clients and you definitely need to rejoin other servers, so this approach has a large impact): http://sdb.univention.de/content/6/274/en/re_provisioning-samba4-on-a-dc-master.html - This way you rebuild your Samba from the LDAP again.[/quote]

Thank you, but this sounds inconvenient; I would rather keep it as a last resort, or live with the problem. Is there another way? For example, could we compare the records in both systems (assuming OpenLDAP and Samba 4 are separate systems talking to each other) and then clean up entries that do not exist on one side or that we are having problems with, in the hope that the corruption goes away?


#14

I honestly wouldn’t know where to start, or whether the corruption would simply “go away”. That strikes me as patchwork fixing, and I am unsure how reliable it can be. The other way is inconvenient but at least complete.


#15

Hi Thorp-Hansen, Thanks for your help, I am Roman’s colleague.

Before we go ahead and reprovision the S4 database, I would like to share some more debugging which might be helpful in finding out how this corruption happened.

connector-s4.log shows sync failures, here is a traceback:

06.02.2017 14:35:45,736 LDAP        (PROCESS): sync to ucs: Resync rejected dn: CN=Domain Computers,CN=Groups,DC=office,DC=ourdomain,DC=co,DC=uk
06.02.2017 14:35:45,743 LDAP        (PROCESS): sync to ucs:   [         group] [    modify] cn=domain computers,cn=groups,dc=office,dc=ourdomain,dc=co,dc=uk
06.02.2017 14:35:45,903 LDAP        (ERROR  ): failed in post_con_modify_functions
06.02.2017 14:35:45,903 LDAP        (ERROR  ): Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/univention/s4connector/__init__.py", line 1505, in sync_to_ucs
    f(self, property_type, object)
  File "/usr/lib/pymodules/python2.7/univention/s4connector/s4/__init__.py", line 85, in group_members_sync_to_ucs
    return s4connector.group_members_sync_to_ucs(key, object)
  File "/usr/lib/pymodules/python2.7/univention/s4connector/s4/__init__.py", line 1981, in group_members_sync_to_ucs
    ucs_admin_object.fast_member_add(uniqueMember_add, memberUid_add)
  File "/usr/lib/pymodules/python2.7/univention/admin/handlers/groups/group.py", line 436, in fast_member_add
    return self.lo.modify(self.dn, ml)
  File "/usr/lib/pymodules/python2.7/univention/admin/uldap.py", line 471, in modify
    raise univention.admin.uexceptions.ldapError(_err2str(msg), original_exception=msg)
ldapError: Type or value exists: memberUid: value #0 provided more than once

It seems this is due to upper / lower case inconsistencies

root@controller:~# univention-s4connector-list-rejected

UCS rejected

S4 rejected

    1:    S4 DN: CN=Domain Computers,CN=Groups,DC=office,DC=ourdomain,DC=co,DC=uk
         UCS DN: cn=domain computers,cn=groups,dc=office,dc=ourdomain,dc=co,dc=uk

        last synced USN: 121324

which were perhaps caused by this bug
forge.univention.org/bugzilla/s … i?id=43247

and fixed in this patch
errata.software-univention.de/ucs/4.1/367.html

which we have already installed.
The currently installed release version is 4.1-4 errata380
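
The “value #0 provided more than once” error in the traceback above would fit a list of member names in which two values differ only in case. A quick way to test a list for that is sketched below. `case_dupes` is a helper name made up for the example, and the sample values are invented. The commented pipeline that feeds it real data is an assumption about how the group's memberUid values could be pulled from the UCS LDAP.

```shell
# Print values that occur more than once when case is ignored.
# Reads one value per line on stdin.
case_dupes() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Invented sample values, two of which differ only in case:
printf '%s\n' 'POOLLAPTOPD$' 'PoolLaptopD$' 'otherhost$' | case_dupes
# → poollaptopd$

# Possible live use on the DC (assumed pipeline):
#   univention-ldapsearch -LLL '(cn=Domain Computers)' memberUid \
#     | sed -n 's/^memberUid: //p' | case_dupes
```

Any name printed here is a candidate for the kind of upper/lower case clash described in this post.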

Do you think it is worth trying to manually get rid of this S4 reject using this procedure
sdb.univention.de/content/6/294/ … jects.html

before we hit the nuclear button and reprovision samba?

thanks, Julian


#16

Yes, the bug you mentioned is the right one and should be fixed in the errata. You have to understand one thing about rejects: if you follow the SDB article and just delete the reject (instead of fixing the underlying problem), the reject will disappear at first BUT reappear as soon as you touch or modify the rejected object. So you have gained nothing. Normally the reject disappears on its own once the root cause is fixed. Additionally, from the information given I would think that fixing just this one reject has little impact on clients losing their trust relationship. That may happen if there are problems with SambaSIDs etc., and may go very deep.

Hm, okay, I think that fixing the reject will not break anything, so please go ahead and do that first. I think you may have to rejoin some of the clients either way, above all the clients that lost their trust relationship before the errata update that fixed the mentioned problem.


#17

Hi,

I removed that rejected item and the S4 connector started working again. I can add Windows hosts to the domain now.

I tried to rejoin the two hosts that had lost their trust relationship, and got this in the S4 connector log:
Ignore conflicted object: CN=PoolLaptopDACNF:cc159001-70a8-44c2-a238-89bf4ce45b79,CN=Computers,DC=office,DC=ourdomain,DC=co,DC=uk

and this in the Samba log:

[2017/02/07 10:09:31.730017,  0, pid=17348] ../source3/auth/pampass.c:89(smb_pam_error_handler)
  smb_pam_error_handler: PAM: session setup failed : User not known to the underlying authentication module
[2017/02/07 10:09:31.733217,  1, pid=17348] ../source3/smbd/session.c:70(session_claim)
  pam_session rejected the session for OFFICE+POOLLAPTOPD$ [smb/3143471839]
[2017/02/07 10:09:31.733269,  1, pid=17348] ../source3/smbd/smb2_sesssetup.c:462(smbd_smb2_auth_generic_return)
  smb2: Failed to claim session for vuid=3143471839
[2017/02/07 10:09:31.848887,  0, pid=17346] ../source3/auth/pampass.c:89(smb_pam_error_handler)
  smb_pam_error_handler: PAM: session setup failed : User not known to the underlying authentication module
[2017/02/07 10:09:31.850506,  1, pid=17346] ../source3/smbd/session.c:70(session_claim)
  pam_session rejected the session for OFFICE+POOLLAPTOPD$ [smb/3143471839]
[2017/02/07 10:09:31.850534,  1, pid=17346] ../source3/smbd/smb2_sesssetup.c:462(smbd_smb2_auth_generic_return)
  smb2: Failed to claim session for vuid=3143471839
[2017/02/07 10:09:31.850761,  0, pid=17346] ../source3/smbd/smbXsrv_session.c:1675(smbXsrv_session_logoff)
  smbXsrv_session_logoff(0xbb5d92df): failed to delete global key 'BB5D92DF': NT_STATUS_NOT_FOUND
[2017/02/07 10:09:31.858035,  0, pid=17346] ../source3/smbd/smbXsrv_session.c:1775(smbXsrv_session_logoff_all)
  smbXsrv_session_logoff_all: count[1] errors[1] first[NT_STATUS_NOT_FOUND]
[2017/02/07 10:09:31.858071,  0, pid=17346] ../source3/smbd/server_exit.c:159(exit_server_common)
  Server exit (NT_STATUS_CONNECTION_DISCONNECTED)
[2017/02/07 10:09:31.858091,  0, pid=17346] ../source3/smbd/server_exit.c:162(exit_server_common)
  exit_server_common: smbXsrv_session_logoff_all() failed (NT_STATUS_NOT_FOUND) - triggering cleanup
[2017/02/07 10:09:31.859557,  0, pid=17346] ../source3/lib/util.c:791(smb_panic_s3)
  PANIC (pid 17346): smbXsrv_session_logoff_all failed
[2017/02/07 10:09:31.860757,  0, pid=17346] ../source3/lib/util.c:902(log_stack_trace)
  BACKTRACE: 25 stack frames:
   #0 /usr/lib/x86_64-linux-gnu/libsmbconf.so.0(log_stack_trace+0x1a) [0x7fc606a7e3ea]
   #1 /usr/lib/x86_64-linux-gnu/libsmbconf.so.0(smb_panic_s3+0x20) [0x7fc606a7e4c0]
   #2 /usr/lib/x86_64-linux-gnu/libsamba-util.so.0(smb_panic+0x2f) [0x7fc608f9f68f]
   #3 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(+0x170698) [0x7fc608bab698]
   #4 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(+0x1709be) [0x7fc608bab9be]
   #5 /usr/lib/x86_64-linux-gnu/samba/libsmbd-shim.so.0(exit_server_cleanly+0x12) [0x7fc60643dd42]
   #6 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(+0x14a8ee) [0x7fc608b858ee]
   #7 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(+0x15211b) [0x7fc608b8d11b]
   #8 /usr/lib/x86_64-linux-gnu/libtevent.so.0(_tevent_req_error+0x22) [0x7fc6054ba8b2]
   #9 /usr/lib/x86_64-linux-gnu/libtevent.so.0(tevent_common_loop_immediate+0xe8) [0x7fc6054b9f78]
   #10 /usr/lib/x86_64-linux-gnu/libtevent.so.0(+0xb300) [0x7fc6054bf300]
   #11 /usr/lib/x86_64-linux-gnu/libtevent.so.0(+0x9936) [0x7fc6054bd936]
   #12 /usr/lib/x86_64-linux-gnu/libtevent.so.0(_tevent_loop_once+0xb5) [0x7fc6054b94e5]
   #13 /usr/lib/x86_64-linux-gnu/libtevent.so.0(tevent_common_loop_wait+0x27) [0x7fc6054b9757]
   #14 /usr/lib/x86_64-linux-gnu/libtevent.so.0(+0x98a6) [0x7fc6054bd8a6]
   #15 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(smbd_process+0x712) [0x7fc608b767e2]
   #16 /usr/sbin/smbd(+0xbd94) [0x55ff8fa40d94]
   #17 /usr/lib/x86_64-linux-gnu/libtevent.so.0(+0xb56b) [0x7fc6054bf56b]
   #18 /usr/lib/x86_64-linux-gnu/libtevent.so.0(+0x9936) [0x7fc6054bd936]
   #19 /usr/lib/x86_64-linux-gnu/libtevent.so.0(_tevent_loop_once+0xb5) [0x7fc6054b94e5]
   #20 /usr/lib/x86_64-linux-gnu/libtevent.so.0(tevent_common_loop_wait+0x27) [0x7fc6054b9757]
   #21 /usr/lib/x86_64-linux-gnu/libtevent.so.0(+0x98a6) [0x7fc6054bd8a6]
   #22 /usr/sbin/smbd(main+0x148b) [0x55ff8fa3d47b]
   #23 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7fc605145ead]
   #24 /usr/sbin/smbd(+0x8819) [0x55ff8fa3d819]
[2017/02/07 10:09:31.860931,  0, pid=17346] ../source3/lib/dumpcore.c:303(dump_core)
  dumping core in /var/log/samba/cores/smbd
[2017/02/07 10:09:31.925768,  1, pid=14521] ../source3/smbd/server.c:868(remove_child_pid)
  Scheduled cleanup of brl and lock database after unclean shutdown

If I change the names of those hosts, they can join the domain and they work fine.

We are tempted to leave it like this, as we have users connected via VPN. If the re_provisioning-samba4-on-a-dc-master procedure means we have to rejoin them to the domain, will they be locked out of their machines? They don’t have local administrator accounts.

thanks for your help, Julian


#18

I hate to bump, but I thought the issue with case-sensitive sync was fixed. It seems not:

/var/log/univention/connector-s4.log

26.02.2017 06:25:16,731 LDAP        (PROCESS): sync to ucs: Resync rejected dn: CN=Domain Computers,CN=Groups,DC=domain,DC=com
26.02.2017 06:25:16,748 LDAP        (PROCESS): sync to ucs:   [         group] [    modify] cn=domain computers,cn=groups,dc=domain,dc=com

#19

[quote=“Roman”]I hate to bump, but I thought the issue with case sensitive sync was fixed. Seems not:

/var/log/univention/connector-s4.log

26.02.2017 06:25:16,731 LDAP        (PROCESS): sync to ucs: Resync rejected dn: CN=Domain Computers,CN=Groups,DC=domain,DC=com
26.02.2017 06:25:16,748 LDAP        (PROCESS): sync to ucs:   [         group] [    modify] cn=domain computers,cn=groups,dc=domain,dc=com[/quote]

Do you have error messages in the connector-s4.log? Or do you still have rejected objects? Try univention-s4connector-list-rejected


#20

[quote=“Gohmann”]
Do you have error messages in the connector-s4.log? Or do you still have rejected objects? Try univention-s4connector-list-rejected[/quote]

Thank you for the command. It shows rejects for a PC that we most likely added manually in UCS because it was not being added automatically.

It shows 3 records for the same computer. The computer, by the way, does not appear in the UCS UI.

If you are interested, have a look at the sample:

UCS rejected

    1:   UCS DN: cn=COMPUTERNAME,cn=computers,dc=domain,dc=com
          S4 DN: cn=computername,cn=computers,DC=domain,DC=com
         Filename: /var/lib/univention-connector/s4/1487960072.026707

    2:   UCS DN: cn=COMPUTERNAME,cn=computers,dc=domain,dc=com
          S4 DN: cn=computername,cn=computers,DC=domain,DC=com
         Filename: /var/lib/univention-connector/s4/1487960072.045497

    3:   UCS DN: cn=COMPUTERNAME,cn=computers,dc=domain,dc=com
          S4 DN: cn=computername,cn=computers,DC=domain,DC=com
         Filename: /var/lib/univention-connector/s4/1487960091.208594
S4 rejected


	last synced USN: 156605
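
When the reject list grows, it can help to see how many rejects each object has, with names compared without regard to case. The sketch below does that for text in the format shown above. `count_rejects` is a helper name made up for the example, and the here-document repeats two of the entries from the sample rather than live output.

```shell
# Count rejects per UCS DN (lowercased) from list-rejected output on stdin.
count_rejects() {
    sed -n 's/^.*UCS DN: //p' | tr '[:upper:]' '[:lower:]' | sort | uniq -c
}

# Demo on two entries copied from the sample above:
count_rejects <<'EOF'
    1:   UCS DN: cn=COMPUTERNAME,cn=computers,dc=domain,dc=com
          S4 DN: cn=computername,cn=computers,DC=domain,DC=com
    2:   UCS DN: cn=COMPUTERNAME,cn=computers,dc=domain,dc=com
          S4 DN: cn=computername,cn=computers,DC=domain,DC=com
EOF

# Live use on the DC:
#   univention-s4connector-list-rejected | count_rejects
```

Each output line is a count followed by the DN, so an object with several pending rejects stands out immediately.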