Replication error between UCS master and Windows AD DC


#1

I’m going to remove the remaining Windows AD Domain Controller after migrating to UCS successfully. But several days ago I found that when I created a domain account on UCS, the account did not appear on the Windows DC. When I created an account on the Windows DC, it appeared on the UCS master automatically. So something is wrong with replication from the UCS master to the Windows DC.

I’ve checked both UCS and Windows AD. Below is the output of some commands on both servers:

  • UCS: (it seems to be OK)

[code]root@xxx-ucs-298:~# samba-tool drs showrepl
mydomain\xxx-UCS-298
DSA Options: 0x00000001
DSA object GUID: d2254268-156b-4a24-9c6d-6e2939dce9cd
DSA invocationId: 37f89255-2d7c-4865-9fa8-5fdd72fe92cd

==== INBOUND NEIGHBORS ====

CN=Configuration,DC=mydomain,DC=com
mydomain\Windows-DC via RPC
DSA object GUID: a5178e73-a552-4678-ae81-20299caac891
Last attempt @ Sun Jul 26 01:56:47 2015 ICT was successful
0 consecutive failure(s).
Last success @ Sun Jul 26 01:56:47 2015 ICT

DC=ForestDnsZones,DC=mydomain,DC=com
mydomain\Windows-DC via RPC
DSA object GUID: a5178e73-a552-4678-ae81-20299caac891
Last attempt @ Sun Jul 26 01:56:47 2015 ICT was successful
0 consecutive failure(s).
Last success @ Sun Jul 26 01:56:47 2015 ICT

DC=mydomain,DC=com
mydomain\Windows-DC via RPC
DSA object GUID: a5178e73-a552-4678-ae81-20299caac891
Last attempt @ Sun Jul 26 01:56:48 2015 ICT was successful
0 consecutive failure(s).
Last success @ Sun Jul 26 01:56:48 2015 ICT

CN=Schema,CN=Configuration,DC=mydomain,DC=com
mydomain\Windows-DC via RPC
DSA object GUID: a5178e73-a552-4678-ae81-20299caac891
Last attempt @ Sun Jul 26 01:56:48 2015 ICT was successful
0 consecutive failure(s).
Last success @ Sun Jul 26 01:56:48 2015 ICT

DC=DomainDnsZones,DC=mydomain,DC=com
mydomain\Windows-DC via RPC
DSA object GUID: a5178e73-a552-4678-ae81-20299caac891
Last attempt @ Sun Jul 26 01:56:47 2015 ICT was successful
0 consecutive failure(s).
Last success @ Sun Jul 26 01:56:47 2015 ICT

==== OUTBOUND NEIGHBORS ====

CN=Configuration,DC=mydomain,DC=com
mydomain\Windows-DC via RPC
DSA object GUID: a5178e73-a552-4678-ae81-20299caac891
Last attempt @ Sun Jul 26 00:24:32 2015 ICT was successful
0 consecutive failure(s).
Last success @ Sun Jul 26 00:24:32 2015 ICT

DC=mydomain,DC=com
mydomain\Windows-DC via RPC
DSA object GUID: a5178e73-a552-4678-ae81-20299caac891
Last attempt @ Sun Jul 26 00:30:58 2015 ICT was successful
0 consecutive failure(s).
Last success @ Sun Jul 26 00:30:58 2015 ICT

CN=Schema,CN=Configuration,DC=mydomain,DC=com
mydomain\Windows-DC via RPC
DSA object GUID: a5178e73-a552-4678-ae81-20299caac891
Last attempt @ Sun Jul 26 00:25:17 2015 ICT was successful
0 consecutive failure(s).
Last success @ Sun Jul 26 00:25:17 2015 ICT

==== KCC CONNECTION OBJECTS ====

Connection –
Connection name: 602198b2-2226-4fa0-be94-1ad73479c0c6
Enabled : TRUE
Server DNS name : Windows-DC.mydomain.com
Server DN name : CN=NTDS Settings,CN=Windows-DC,CN=Servers,CN=mydomain,CN=Sites,CN=Configuration,DC=mydomain,DC=com
TransportType: RPC
options: 0x00000001[/code]

  • Windows DC:

[code]C:\Users\abc>REPADMIN /SHOWREPS
mydomain\Windows-DC
DSA Options: IS_GC
Site Options: (none)
DSA object GUID: a5178e73-a552-4678-ae81-20299caac891
DSA invocationID: 3a8e4b23-7949-462b-84cd-a1fc08671446

==== INBOUND NEIGHBORS ======================================

DC=mydomain,DC=com
mydomain\xxx-UCS-298 via RPC
DSA object GUID: d2254268-156b-4a24-9c6d-6e2939dce9cd
Last attempt @ 2015-07-26 01:50:21 was delayed for a normal reason, result 8418 (0x20e2):
The replication operation failed because of a schema mismatch between the servers involved.
Last success @ 2015-07-20 19:46:14.

CN=Configuration,DC=mydomain,DC=com
mydomain\xxx-UCS-298 via RPC
DSA object GUID: d2254268-156b-4a24-9c6d-6e2939dce9cd
Last attempt @ 2015-07-26 01:50:21 was successful.

CN=Schema,CN=Configuration,DC=mydomain,DC=com
mydomain\xxx-UCS-298 via RPC
DSA object GUID: d2254268-156b-4a24-9c6d-6e2939dce9cd
Last attempt @ 2015-07-26 01:50:22 was successful.[/code]

I noticed the line “Last success @ 2015-07-20 19:46:14”. At that time we had not yet finished the takeover process because of an error (it has since been fixed by running a patch script from UCS support).
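A quick way to spot the failing naming context in long `REPADMIN /SHOWREPS` or `samba-tool drs showrepl` output is to filter for attempts that are not marked as successful. The following is only a minimal Python sketch, assuming the report has been saved as plain text in the layout shown above; the function name `find_failed_partitions` is made up for illustration and is not part of Samba or UCS.

```python
import re
from typing import List, Optional, Tuple

# Excerpt of the Windows DC report from this post, used as sample input.
SAMPLE = r"""DC=mydomain,DC=com
mydomain\xxx-UCS-298 via RPC
DSA object GUID: d2254268-156b-4a24-9c6d-6e2939dce9cd
Last attempt @ 2015-07-26 01:50:21 was delayed for a normal reason, result 8418 (0x20e2):
The replication operation failed because of a schema mismatch between the servers involved.
Last success @ 2015-07-20 19:46:14.

CN=Configuration,DC=mydomain,DC=com
mydomain\xxx-UCS-298 via RPC
DSA object GUID: d2254268-156b-4a24-9c6d-6e2939dce9cd
Last attempt @ 2015-07-26 01:50:21 was successful.
"""


def find_failed_partitions(report: str) -> List[Tuple[Optional[str], Optional[str], str]]:
    """Return (naming context, neighbor, attempt line) for each attempt
    that is not reported as successful."""
    failures = []
    context = None   # last naming context seen, e.g. "DC=mydomain,DC=com"
    neighbor = None  # last replication partner seen
    for raw in report.splitlines():
        line = raw.strip()
        if re.match(r"^(DC|CN)=", line):
            context = line
        elif " via RPC" in line:
            neighbor = line.split(" via RPC")[0]
        elif line.startswith("Last attempt @") and "was successful" not in line:
            failures.append((context, neighbor, line))
    return failures


if __name__ == "__main__":
    for context, neighbor, attempt in find_failed_partitions(SAMPLE):
        print(context, "<-", neighbor)
        print("   ", attempt)
```

With the excerpt above, only the domain partition `DC=mydomain,DC=com` is reported, which matches the 8418 schema mismatch line in the Windows DC output.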

Please help me to solve this. Thank you.


#2

For AD Takeover, UCS configures replication to run only unidirectionally, from the AD DC to the Samba DC. This is normal. It is expected that you turn off the AD DC after the takeover. The reason we do this is to avoid any accidental modifications on the AD DC, so that in case everything fails, you can theoretically still go back: switch off the UCS server and turn on your AD DC again.

But since you say that the migration was successful in the end (“has been fixed after running patch script from UCS support”) I would suggest that you switch off the AD DC.

Permanent operation of a native Microsoft AD DC joined to a Samba DC is currently not supported in UCS because we cannot guarantee that the GPO objects are replicated between AD DC and Samba DC and this would cause problems for GPO based administration.


#3

Thank you for your answer, Arvid. Honestly, we want to turn off the Windows AD as the documentation instructs, but as you know we got an error at the end of the takeover process, and we still have some problems with UCS DNS, so we cannot turn the Windows AD off now. Our organization works 24/7 and some of our services depend on AD.

For example, we set up a new UCS system and joined it to the UCS domain as a member server, because it will become a file server (replacing the current one). After the successful join, neither the UCS master nor any other domain member can resolve the new member’s name with the ping command. We’ve checked the DNS of UCS:

  • DNS has a host record for the new UCS
  • DNS has a pointer record for the new UCS

About LDAP:

  • The new UCS is located at com.domain\computers\memberserver with the “Member Server” type

Furthermore, does UCS have any tool or command to check the health of a domain controller, like the DCDIAG command of Windows AD?


#4

First of all you should note that AD Takeover finally assigns the IP of the AD DC to a virtual network interface of the UCS DC. So, if you keep the AD DC running after the takeover, you will have to deal with an IP conflict. Whatever then happens is out of scope of recommended product use. What’s more: AD Takeover also configures the hostname of the old AD DC as a DNS and NETBIOS alias for the UCS DC. So, basically, if you keep the AD DC running in that situation, pretty much nothing will work as expected.

If you have a DNS problem with the UCS DC after takeover, then that should be identified, analyzed and solved. The questions are: (1) who has the problem resolving, (2) which record, and (3) when consulting which DNS server?
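To narrow this down, it can help to ask each DNS server explicitly for the forward and the reverse record and compare the answers. Below is a minimal Python sketch that only builds the corresponding `dig` command lines, so the same check can be repeated on the UCS master, on a member server and on a client. The host name and the addresses in the example are placeholders from the documentation range and not values from this thread, and `build_dig_commands` is a hypothetical helper name.

```python
from typing import List


def build_dig_commands(name: str, ip: str, servers: List[str]) -> List[List[str]]:
    """Build one forward (A) and one reverse (PTR) query per DNS server.

    Each command targets a specific server via "@server", so the answers
    from different DNS servers can be compared side by side.
    """
    commands = []
    for server in servers:
        commands.append(["dig", "@" + server, name, "A", "+short"])
        commands.append(["dig", "@" + server, "-x", ip, "+short"])
    return commands


if __name__ == "__main__":
    # Placeholder values: new member server, its IP, and two DNS servers.
    for cmd in build_dig_commands(
        "newmember.mydomain.com", "192.0.2.10", ["192.0.2.1", "192.0.2.2"]
    ):
        print(" ".join(cmd))
```

Running the printed commands from the machine that cannot resolve the name answers all three questions at once: who is asking, for which record, and which server was consulted. An empty answer from one server but not from another points at that server rather than at the record itself.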

Please note that the AD Takeover has some complexity, and it is recommended to first test the procedure, e.g. in a virtual clone of the essential components of your productive AD domain, before actually attempting it in the productive environment. Since you say that you are running 24/7, you may want to consider stepping back and stopping this “experiment” at this point: turn off the UCS DC and start the AD DC again. From there on, you can set up a proper test environment and identify possible problems. If you recall your initial AD Takeover issue – where the takeover stopped in the final stage due to a Python traceback (which we regret and will fix eventually) – that issue was probably caused by having configured the UCS server as an AD Memberserver before that. In a clean setup you would probably not even run into that problem.

So, to repeat clearly, you have two options at this point:

  1. In case you decide to revert the takeover at this point, let me give you one recommendation: When everything is back to normal with your old AD DC (and the current UCS DC master has been removed), you should also run “Active Directory Users and Computers” on the AD DC and look for a DC account by the name of the UCS DC. That account was created during the initial join for the AD Takeover, and it should be removed before starting a new takeover attempt. Also, there might be a DNS record _domaincontroller_master._tcp in the AD DNS, which is UCS specific and was probably created during the previous UCS AD Memberserver setup experiment. That DNS record should be removed as well (just to be safe).

  2. In case you want/need to continue with the UCS DC master in its current shape, then you need to track down the specific DNS problems. There is no healthy way to have UCS DC master and AD DC running in parallel in this situation.
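Whether the `_domaincontroller_master._tcp` record mentioned in option 1 is still present can be checked with an SRV query against the AD DNS server, for example `dig @<AD DNS server> _domaincontroller_master._tcp.mydomain.com SRV +short`. The following Python sketch parses the short output of such a query, assuming the usual `priority weight port target` layout per line. The sample line in it is invented for illustration (the real port and target may differ), and `parse_srv_short` is a hypothetical helper name.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_short(output: str) -> List[SrvRecord]:
    """Parse 'dig ... SRV +short' output into records.

    Lines that do not have exactly four fields, or whose first three
    fields are not integers, are skipped.
    """
    records = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        try:
            priority, weight, port = (int(f) for f in fields[:3])
        except ValueError:
            continue
        # Drop the trailing dot of the fully qualified target name.
        records.append(SrvRecord(priority, weight, port, fields[3].rstrip(".")))
    return records


if __name__ == "__main__":
    # Invented sample answer, not taken from this thread.
    sample = "0 100 0 xxx-ucs-298.mydomain.com.\n"
    records = parse_srv_short(sample)
    if records:
        for record in records:
            print("leftover SRV record points to", record.target)
    else:
        print("no _domaincontroller_master._tcp record found")
```

An empty result means the record is not present on the server that was queried, so there is nothing to clean up in that zone.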


#5

[quote=“requate”]First of all you should note that AD Takeover finally assigns the IP of the AD DC to a virtual network interface of the UCS DC. So, if you keep the AD DC running after the takeover, you will have to deal with an IP conflict. Whatever then happens is out of scope of recommended product use. What’s more: AD Takeover also configures the hostname of the old AD DC as a DNS and NETBIOS alias for the UCS DC. So, basically, if you keep the AD DC running in that situation, pretty much nothing will work as expected.

I turned off all Windows AD Domain Controllers when I executed the script to finish the AD takeover process. I know that the UCS master takes over both the IP address and the NETBIOS name of the primary Windows AD, so I followed the instructions. I think you misunderstood what I meant. The old primary AD (Windows-based) has been turned off since we finished the AD takeover. The Windows AD Domain Controller that is currently running alongside the UCS master is a backup AD DC; it is also a file server (a legacy from the previous IT administrator). I tried to demote it to an AD member with the dcpromo command, but an error prevented the process from finishing.

If you have a DNS problem with the UCS DC after takeover, then that should be identified, analyzed and solved. The questions are: (1) who has the problem resolving, (2) which record, and (3) when consulting which DNS server?

Please note that the AD Takeover has some complexity, and it is recommended to first test the procedure, e.g. in a virtual clone of the essential components of your productive AD domain, before actually attempting it in the productive environment. Since you say that you are running 24/7, you may want to consider stepping back and stopping this “experiment” at this point: turn off the UCS DC and start the AD DC again. From there on, you can set up a proper test environment and identify possible problems. If you recall your initial AD Takeover issue – where the takeover stopped in the final stage due to a Python traceback (which we regret and will fix eventually) – that issue was probably caused by having configured the UCS server as an AD Memberserver before that. In a clean setup you would probably not even run into that problem.

Honestly, I ran some tests in a lab environment before migrating to UCS, and everything was fine. Then I migrated to UCS, as you know.

So, to repeat clearly, you have two options at this point:

  1. In case you decide to revert the takeover at this point, let me give you one recommendation: When everything is back to normal with your old AD DC (and the current UCS DC master has been removed), you should also run “Active Directory Users and Computers” on the AD DC and look for a DC account by the name of the UCS DC. That account was created during the initial join for the AD Takeover, and it should be removed before starting a new takeover attempt. Also, there might be a DNS record _domaincontroller_master._tcp in the AD DNS, which is UCS specific and was probably created during the previous UCS AD Memberserver setup experiment. That DNS record should be removed as well (just to be safe).

  2. In case you want/need to continue with the UCS DC master in its current shape, then you need to track down the specific DNS problems. There is no healthy way to have UCS DC master and AD DC running in parallel in this situation.[/quote]

Please see my answers inline in the quote above, and thank you for your suggestion. In fact, I spent several hours finding out which objects are created when a UCS server joins a Windows AD domain, and after the takeover process. I know which objects need to be removed from DNS and from AD Users and Computers in order to start the takeover process again.

However, I will try to find the cause of the current DNS issue instead of going back to a primary Windows AD Domain Controller. After that, my next step is to create a new UCS server and join it to the UCS domain as a UCS backup DC, then transfer all data from the current file server to this new UCS (the one I mentioned before), which will replace the current file server. The final step is removing the last Windows AD DC from the domain. I hope I can finish the migration from a fully Windows-based environment to a UCS-based one as soon as possible. Thank you again.

I appreciate your help!

Regards,


#6

OK, I understand. If you have a question regarding the DNS issue and suspect that it is UCS related, feel free to ask again.