Possible bug in AD sync connector

It seems that if we do this:

then the connection is re-formed after the connection errors.

[quote] self.open_ucs()
[/quote]
DOES NOT work if used here

Yes, you will need to reapply it if the ad-connector is updated.

Neither of those commands seems to completely close the AD connection on port 389.

This is VERY annoying.


If we go into the firewall and physically break the connection to the Windows AD destination port 389, then Univention recovers, re-forms the connection, and processes the outstanding items in the synchronisation queue (until it breaks again).

If we modify the program code using those commands, the program starts throwing errors in other places and does not recover gracefully.

Clearly there is a difference: the code solution is not reliable.

Hi talleyrand,

I’ve tried to understand your issue. If I read it correctly, the main problem is the following traceback:

  File "/usr/lib/pymodules/python2.7/univention/connector/ad/password.py", line 381, in password_sync
    res = get_password_from_ad(connector, univention.connector.ad.compatible_modstring(object['dn']))
  File "/usr/lib/pymodules/python2.7/univention/connector/ad/password.py", line 180, in get_password_from_ad
    (level, ctr) = connector.drs.DsGetNCChanges(connector.drsuapi_handle, 8, req8)
NTSTATUSError: (-1073741300, 'The transport connection is now disconnected.')

In /usr/lib/pymodules/python2.7/univention/connector/ad/password.py, the connector does something like this:

    if not connector.drs:
        connector.open_drs_connection()
    [...]
    while True:
        (level, ctr) = connector.drs.DsGetNCChanges(connector.drsuapi_handle, 8, req8)

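The call to connector.drs.DsGetNCChanges is what raises the exception in the traceback. So maybe the “if not connector.drs:” check is not enough: connector.drs can still be set even though the underlying transport has already been disconnected.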
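To illustrate the idea of reconnecting when the call itself fails, here is a minimal sketch. It is not connector code: FakeDrs, FakeConnector, TransportDisconnected and get_nc_changes are hypothetical stand-ins invented for this example. Only the names drs, drsuapi_handle, open_drs_connection and DsGetNCChanges are taken from the snippet above, and the real NTSTATUSError handling may well need to look different.

```python
class TransportDisconnected(Exception):
    """Hypothetical stand-in for the NTSTATUSError seen in the traceback."""


class FakeDrs(object):
    """Hypothetical DRS handle that fails once its call budget is used up."""

    def __init__(self, calls_before_drop):
        self.calls_left = calls_before_drop

    def DsGetNCChanges(self, handle, level, request):
        if self.calls_left <= 0:
            raise TransportDisconnected("The transport connection is now disconnected.")
        self.calls_left -= 1
        return (level, "changes for %s" % request)


class FakeConnector(object):
    """Hypothetical connector exposing the attribute names used above."""

    def __init__(self):
        self.drs = None
        self.drsuapi_handle = None
        self.reconnects = 0

    def open_drs_connection(self):
        # Each call simulates a fresh connection that survives exactly one request.
        self.reconnects += 1
        self.drs = FakeDrs(calls_before_drop=1)
        self.drsuapi_handle = object()


def get_nc_changes(connector, request, max_retries=1):
    """Call DsGetNCChanges and reopen the DRS connection if the transport dropped."""
    if not connector.drs:
        connector.open_drs_connection()
    for attempt in range(max_retries + 1):
        try:
            return connector.drs.DsGetNCChanges(connector.drsuapi_handle, 8, request)
        except TransportDisconnected:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # connector.drs is still set at this point, so the
            # "if not connector.drs" guard alone would never reconnect.
            connector.open_drs_connection()
```

In this sketch the second request hits a stale handle, the wrapper reopens the connection once, and the retry succeeds. Limiting the number of retries keeps a permanently unreachable server from causing an endless loop.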

What I don’t understand is why the connector stops working at this point. Can you send me the connector.log and the connector-status.log when the problem occurs?

Thanks,
Stefan

Where can I upload them?

They contain too much sensitive information.

Try upload.univention.de/ and send me the upload ID.

Ok

upload_ELzspW.zip

In the file connector.log, find the line "01.03.2017 13:04:27,672 MAIN ".

This is where we reset the firewall connection to cut the link between the two servers.
You can clearly see that before this point it shows the same sort of error as later.

You can see that the sync progresses quite well on the first connection, but then fails 43 minutes later on the first new link.

Is there any progress on this?

Any progress on this yet?

Wouldn’t it be better to check the firewall/environment at this point?

The situation as it currently exists in your environment does not seem to have happened in other customer environments, and we cannot reproduce it in our test environment. You/we tried to change the code of the connector itself, to no avail. Shouldn’t the root cause be eliminated? Why is the connection getting suspended, or why does it silently break? The network cards in your servers do not power down or the like, so can you reach all the servers (UCS and Windows) via other means when the situation appears?

I am sorry, but I think that tinkering with the connector code is not really helpful, since updates may break a “fix”, and I do not see the problem in the connector itself.

One of the first things I did was check the firewall, VM and other setup items.
Nothing goes down, up, sideways, backwards or otherwise.
One thing is clear:

When ALL (2-3) connections between the two servers are broken,

the server re-forms the connection and synchronises first time.

At that time there are THREE connections (the firewall shows):
1 172.18.0.47 192.168.0.16 389 TCP X3 LDAP
2 172.18.0.47 192.168.0.16 389 UDP X3 LDAP
3 172.18.0.47 192.168.0.16 49157 TCP N/A

Then at some point only one connection remains (the error condition). It is at this point that the sync connection breaks and the program errors (as seen in my posts). The firewall shows:
1 172.18.0.47 192.168.0.16 389 TCP LDAP

If I go into the firewall and MANUALLY cut this 389 connection, the whole thing starts working again and returns to the three connections.

Therefore, if the program cut and automatically re-formed the connections at the point of error, it would all function.
Clearly something else is going on: if any of the conditions you stated had triggered, connection 1 would also be broken, and it would all function once the condition was over.

But clearly one connection is being maintained.
Your argument about other users is moot, since that is what a bug is: a condition that does not always manifest itself in other environments.

My point is this: if, at the point of a program error related to connections, all connections were forcefully broken and then re-formed (which the program clearly appears to be able to do),
this problem would disappear.
Perhaps my network has a condition that allows specific ports to be interrupted. If your program only checks the 389 LDAP connection flag and, when it is there, does not check the other connections for defects, then this bug is potentially perfectly valid, since it is an unmapped condition that you may or may not have foreseen.

Also, any changes I made to the code DO NOT appear to force the 389 connection to be cut and re-formed; it just sits there. But if I do it from the Linux shell or the firewall, it works first time.

I have re-checked and also stripped out any firewall rules that blacklist on floods or uninvited ports,

BUT I still cannot get this to run reliably for an extended time on PW exchange.

The LDAP is ROCK solid; it will run for weeks without any sort of problem, even during disconnect and reconnect situations or hard power-downs.

In the latest version with all the updates, I am still getting:

  File "/usr/lib/pymodules/python2.7/univention/connector/__init__.py", line 1326, in sync_to_ucs
    f(self, property_type, object)
  File "/usr/lib/pymodules/python2.7/univention/connector/ad/password.py", line 381, in password_sync
    res = get_password_from_ad(connector, univention.connector.ad.compatible_modstring(object['dn']))
  File "/usr/lib/pymodules/python2.7/univention/connector/ad/password.py", line 180, in get_password_from_ad
    (level, ctr) = connector.drs.DsGetNCChanges(connector.drsuapi_handle, 8, req8)
NTSTATUSError: (-1073741300, 'The transport connection is now disconnected.')

Could it be some sort of NAT issue?
VM 192.16.100.X->172.18.0.47->AD (192.168.0.16)

It is only port 49155 that is unable to maintain the connection; the other LDAP ports are fine.

Does the attached patch help? password_reconnect.patch.txt (1.2 KB)

You can install it by running the following command:

patch -p0 -d /usr/share/pyshared/univention/connector/ad/ <password_reconnect.patch.txt 
service univention-ad-connector restart

I haven’t tested the patch, so I would recommend a test system.

Thanks,
Stefan

This fixed the problem right up.

Now when it drops, the connection is re-formed correctly.
After several months of use it is not causing any other problems.

Thanks for the info.

I’ve created a bug report for it: https://forge.univention.org/bugzilla/show_bug.cgi?id=45127
