Today, after upgrading our UCS installation to 5.0-2, we started getting alerts from cron every 5 minutes.
One issue was solved manually by running mkdir -p /var/lib/prometheus/node-exporter.
However, we still get stack traces from check_univention_s4_connector:
Traceback (most recent call last):
File "/usr/share/univention-monitoring-client/scripts//check_univention_s4_connector", line 75, in <module>
S4Connector.main()
File "/usr/lib/python3/dist-packages/univention/monitoring/__init__.py", line 74, in main
self.write_metrics()
File "/usr/share/univention-monitoring-client/scripts//check_univention_s4_connector", line 71, in write_metrics
self.debug('Found %d reject(s)! Please check output of univention-s4connector-list-rejected.' % (rejects,))
AttributeError: 'S4Connector' object has no attribute 'debug'
run-parts: /usr/share/univention-monitoring-client/scripts//check_univention_s4_connector exited with return code 1
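The AttributeError itself points at the likely one-line fix: write_metrics() calls self.debug(), but the check classes shown later in this thread only have a self.log logger (see the self.log.debug() calls further down). A minimal sketch of what line 71 presumably needs to look like, assuming check_univention_s4_connector uses the same Alert base class as the other scripts:

# check_univention_s4_connector, line 71 (sketch): use the logger attribute
# instead of the non-existent self.debug()
self.log.debug('Found %d reject(s)! Please check output of univention-s4connector-list-rejected.' % (rejects,))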
Traceback (most recent call last):
File "/usr/share/univention-monitoring-client/scripts//check_univention_samba_drs_failures", line 86, in <module>
CheckSambaDrsRepl.main()
File "/usr/lib/python3/dist-packages/univention/monitoring/__init__.py", line 74, in main
self.write_metrics()
File "/usr/share/univention-monitoring-client/scripts//check_univention_samba_drs_failures", line 65, in write_metrics
(info_type, info) = drsuapi.DsReplicaGetInfo(self.drsuapi_handle, 1, req1)
TypeError: cannot unpack non-iterable drsuapi.DsReplicaGetInfo object
run-parts: /usr/share/univention-monitoring-client/scripts//check_univention_samba_drs_failures exited with return code 1
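The TypeError reads as if the script constructs a drsuapi.DsReplicaGetInfo object via the module-level class instead of performing the RPC on an established drsuapi connection; the later traceback in this thread calls self.drsuapi.DsReplicaGetInfo(...) on a connection attribute, which suggests exactly that was changed between versions. A rough sketch of the usual call pattern, assuming a bound drsuapi connection and handle (set up e.g. via samba.drs_utils) are passed in; the names drs_conn/drs_handle are illustrative:

from samba.dcerpc import drsuapi

def query_repsto(drs_conn, drs_handle):
    # drs_conn: a bound drsuapi connection object, drs_handle: the DsBind handle.
    # Calling drsuapi.DsReplicaGetInfo(...) on the module only builds an object
    # and returns nothing iterable, which matches the TypeError above.
    req1 = drsuapi.DsReplicaGetInfoRequest1()
    req1.info_type = drsuapi.DRSUAPI_DS_REPLICA_INFO_REPSTO
    return drs_conn.DsReplicaGetInfo(drs_handle, 1, req1)  # -> (info_type, info)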
Since I’m not sure whether this results from a particular (mis)configuration on our side or is a general issue, I’m opening this thread (although the error message could at least be a bit more helpful in finding the misconfiguration).
To stop the annoying mails I commented out part of the file “/usr/share/univention-monitoring-client/scripts/check_univention_samba_drs_failures”, starting at line 65:
            drsuapi_connect(self)
            req1 = drsuapi.DsReplicaGetInfoRequest1()
            # req1.info_type = drsuapi.DRSUAPI_DS_REPLICA_INFO_REPSTO
            # (info_type, info) = drsuapi.DsReplicaGetInfo(self.drsuapi_handle, 1, req1)
            # for n in info.array:
            #     if n.consecutive_sync_failures > 0:
            #         (site, server) = drs_parse_ntds_dn(n.source_dsa_obj_dn)
            #         consecutive_sync_failures.setdefault(server, 0)
            #         consecutive_sync_failures[server] += n.consecutive_sync_failures
        except (CommandError, RuntimeError) as exc:
            self.write_metric('univention_samba_drs_failures', -1)
            self.log.debug(str(exc))
            return
This is probably not the desired solution. Any thoughts?
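One thought, purely as a sketch: since the crash above is a TypeError, which the existing except (CommandError, RuntimeError) clause does not catch, a less invasive stopgap might be to widen that clause so the check still writes its -1 metric instead of dying with a traceback on stderr (which is what makes cron send mail). Assuming the structure shown above:

        # sketch: keep the check body intact, just stop the uncaught TypeError
        except (CommandError, RuntimeError, TypeError) as exc:
            self.write_metric('univention_samba_drs_failures', -1)
            self.log.debug(str(exc))
            return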
And on my member/managed node I still get emails every 5 minutes:
Traceback (most recent call last):
File "/usr/share/univention-monitoring-client/scripts//check_univention_ldap", line 54, in <module>
LDAP.main()
File "/usr/lib/python3/dist-packages/univention/monitoring/__init__.py", line 74, in main
self.write_metrics()
File "/usr/share/univention-monitoring-client/scripts//check_univention_ldap", line 40, in write_metrics
slapd_port = ucr['slapd/port'].split(',')[0]
AttributeError: 'NoneType' object has no attribute 'split'
run-parts: /usr/share/univention-monitoring-client/scripts//check_univention_ldap exited with return code 1
mdb_env_open failed, error 2 No such file or directory
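The check_univention_ldap traceback indicates that the UCR variable slapd/port is unset on that node, so ucr['slapd/port'] returns None. Setting the variable (or having the check skip roles without a local slapd) would be the proper fix; as a defensive sketch of line 40, assuming 7389 as the usual UCS LDAP port:

# check_univention_ldap, around line 40 (sketch): tolerate an unset UCR variable
slapd_port = (ucr.get('slapd/port') or '7389').split(',')[0]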
@Best I have just upgraded to 5.0.2 and am seeing the same behavior as mschlee:
(3221356597, 'The operation cannot be performed.')
Traceback (most recent call last):
File "/usr/share/univention-monitoring-client/scripts//check_univention_samba_drs_failures", line 78, in write_metrics
consecutive_sync_failures = _CheckSambaDrsRepl().check()
File "/usr/share/univention-monitoring-client/scripts//check_univention_samba_drs_failures", line 59, in check
(info_type, info) = self.drsuapi.DsReplicaGetInfo(self.drsuapi_handle, 1, req1)
samba.NTSTATUSError: (3221356597, 'The operation cannot be performed.')
These emails come every five minutes.
The temporary fix I used was slightly different, as it looks like the update changed check_univention_samba_drs_failures.
I commented out the exception logging in the try/except block beginning on line 77:
class CheckSambaDrsRepl(Alert):
    def write_metrics(self):
        # return OK, if samba autostart is false
        if not ucr.is_true('samba4/autostart', False):
            self.write_metric('univention_samba_drs_failures', 0)
            self.log.debug('samba4/autostart is not true')
            return
        try:
            consecutive_sync_failures = _CheckSambaDrsRepl().check()
        except (CommandError, RuntimeError) as exc:
            self.write_metric('univention_samba_drs_failures', -1)
            # self.log.exception(str(exc))
            return
        msg = None
        for server, failures in consecutive_sync_failures.items():
            text = '%s failures on %s' % (failures, server)
            msg = msg + ', ' + text if msg else text
        self.write_metric('univention_samba_drs_failures', sum(consecutive_sync_failures.values()))
        self.log.debug(msg or 'no drs failures')
As mschlee said,
This is probably not the desired solution. Any thoughts?
Your solution is actually correct: if we ignore that error, we should not log it to stderr; otherwise cron will keep sending mails.
Why this exception is happening in your environment is unclear to me. This should not be the case.
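For illustration, "not logging it to stderr" could look like the pattern the earlier version of the script already used, i.e. a debug-level message next to the -1 metric instead of self.log.exception(); nothing then reaches the cron mail as long as debug output is not routed to stderr (a sketch, not the official patch):

        except (CommandError, RuntimeError) as exc:
            self.write_metric('univention_samba_drs_failures', -1)
            # debug level instead of exception(): the traceback stays out of stderr,
            # so cron has nothing to mail
            self.log.debug(str(exc))
            return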
Well, the check was there for a reason, and if you comment out its core, that of course silences it. I guess this 'The operation cannot be performed.' error during the DCERPC call indicated that things like samba-tool drs showrepl also did not work in the affected environments. I could imagine that this was fallout from Bug 55486 / Bug 55595, because that caused samba-dcerpcd issues.
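For anyone who wants to check whether their environment is hit by the same underlying DCERPC problem, a small diagnostic along the lines of the commands mentioned above (samba-tool drs showrepl, samba-dcerpcd) could look like the sketch below; command and process names are the standard ones and may differ per setup:

import subprocess

# rough diagnostic sketch: does DRS replication reporting work, and is samba-dcerpcd up?
for cmd in (['samba-tool', 'drs', 'showrepl'], ['pgrep', '-a', 'samba-dcerpcd']):
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = 'ok' if result.returncode == 0 else 'failed (rc=%d)' % result.returncode
    print('%s: %s' % (' '.join(cmd), status))
    print(result.stdout or result.stderr)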