4.2-1 DC Master slapd library segfault libxmlsec1.so.1.2.20

openldap
ucs-4-2

#1

Another issue I’ve hit since an upgrade to 4.2-1 is a periodic segfault in slapd. I haven’t yet been able to pin it down to a reproducible cause for you.

But it’s often triggered seemingly at random when I click around in the UMC interfaces of other servers (backup/member). It doesn’t seem to happen as often when I’m in the UMC of the master (which is most of the time).

Grepped syslog:

Jul  8 20:47:46 dcm1 slapd[5946]: Starting ldap server(s): slapd ...done.
Jul  8 20:47:46 dcm1 slapd[5946]: Checking Schema ID: ...done.
Jul  8 20:48:43 dcm1 kernel: [71889.842511] slapd[5969]: segfault at f ip 00007f8081c07c72 sp 00007f7ffee63210 error 4 in libxmlsec1.so.1.2.20[7f8081bcb000+5d000]
Jul  8 20:48:43 dcm1 systemd[1]: slapd.service: main process exited, code=killed, status=11/SEGV
Jul  8 20:48:44 dcm1 logger: /etc/init.d/slapd stop (pid: 6093, ppid:    1 systemd)
Jul  8 20:48:44 dcm1 slapd[6093]: Stopping ldap server(s): slapd ...start-stop-daemon: warning: failed to kill 5962: No such process
Jul  8 20:48:44 dcm1 slapd[6093]: done.
Jul  8 20:48:44 dcm1 systemd[1]: Unit slapd.service entered failed state.
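The kernel line in this excerpt already pins the crash down fairly precisely. Subtracting the library's load base (the hex number before the `+` in `libxmlsec1.so.1.2.20[7f8081bcb000+5d000]`) from `ip` gives the offset of the faulting instruction inside the library. Unlike the raw address, that offset stays the same when address-space randomization maps the library somewhere else on the next run. `segfault at f` is the data address that could not be read (0xf, just above NULL, which is consistent with reading a struct field through a NULL pointer), and `error 4` decodes as a user-mode read of an unmapped page. A minimal sketch (Python 3.9+), assuming exactly the line format shown here; `parse_segfault`, `Segfault` and `describe_error` are hypothetical helpers, not part of any UCS tool.

```python
import re
from dataclasses import dataclass

# Kernel "segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]" line,
# in the format shown in the syslog excerpt; other kernels may differ.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<error>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)


@dataclass
class Segfault:
    comm: str        # process name (the kernel truncates it to 15 characters)
    pid: int
    fault_addr: int  # data address that could not be accessed
    lib: str         # object containing the faulting instruction
    offset: int      # ip minus load base: stable across restarts despite ASLR
    error: int       # x86 page-fault error code


def parse_segfault(line: str) -> Segfault:
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    return Segfault(
        comm=m["comm"],
        pid=int(m["pid"]),
        fault_addr=int(m["addr"], 16),
        lib=m["lib"],
        offset=int(m["ip"], 16) - int(m["base"], 16),
        error=int(m["error"]),
    )


def describe_error(code: int) -> str:
    # Bit 0: protection violation (clear = page not present); bit 1: write
    # (clear = read); bit 2: fault raised while running in user mode.
    kind = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {kind}"


line = ("Jul  8 20:48:43 dcm1 kernel: [71889.842511] slapd[5969]: segfault at f "
        "ip 00007f8081c07c72 sp 00007f7ffee63210 error 4 "
        "in libxmlsec1.so.1.2.20[7f8081bcb000+5d000]")
fault = parse_segfault(line)
print(fault.comm, fault.lib, hex(fault.fault_addr), hex(fault.offset))
print(describe_error(fault.error))
# slapd libxmlsec1.so.1.2.20 0xf 0x3cc72
# user-mode read, page not present
```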

The systemctl status output shows slapd loading metadata from “/usr/share/univention-management-console/saml/idp/ucs-sso.ourdomain snipped.com.au.xml”. Given that the log shows the segfault occurring in libxmlsec1.so, and given the SAML SSO issues I’ve had in another thread, could these issues be related?

Startup:

● slapd.service - LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol)
   Loaded: loaded (/etc/init.d/slapd)
   Active: active (running) since Sat 2017-07-08 20:49:20 AEST; 8min ago
  Process: 6093 ExecStop=/etc/init.d/slapd stop (code=exited, status=0/SUCCESS)
  Process: 6171 ExecStart=/etc/init.d/slapd start (code=exited, status=0/SUCCESS)
 Main PID: 6183 (slapd)
   CGroup: /system.slice/slapd.service
           └─6183 /usr/sbin/slapd -h ldapi:/// ldap://:7389/ ldaps://:7636/

Jul 08 20:49:20 dcm1 slapd[6182]: @(#) $OpenLDAP: slapd  (Jun 20 2017 17:36:33) $
                                          pbuser@ladda:/var/build/temp/tmp.9FUzllUqUa/pbuilder/openldap-2.4.42+dfsg/debian/build/servers/slapd
Jul 08 20:49:20 dcm1 slapd[6182]: Loaded metadata from "/usr/share/univention-management-console/saml/idp/ucs-sso.<our domain snipped>.com.au.xml"
Jul 08 20:49:20 dcm1 slapd[6171]: Starting ldap server(s): slapd ...done.
Jul 08 20:49:20 dcm1 slapd[6171]: Checking Schema ID: ...done.
Jul 08 20:49:20 dcm1 systemd[1]: Started LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol).

Failed:

● slapd.service - LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol)
   Loaded: loaded (/etc/init.d/slapd)
   Active: failed (Result: signal) since Sat 2017-07-08 20:47:17 AEST; 23s ago
  Process: 5894 ExecStop=/etc/init.d/slapd stop (code=exited, status=0/SUCCESS)
  Process: 4414 ExecStart=/etc/init.d/slapd start (code=exited, status=0/SUCCESS)
 Main PID: 4426 (code=killed, signal=SEGV)

Jul 08 20:37:21 dcm1 slapd[4425]: @(#) $OpenLDAP: slapd  (Jun 20 2017 17:36:33) $
                                          pbuser@ladda:/var/build/temp/tmp.9FUzllUqUa/pbuilder/openldap-2.4.42+dfsg/debian/build/servers/slapd
Jul 08 20:37:21 dcm1 slapd[4425]: Loaded metadata from "/usr/share/univention-management-console/saml/idp/ucs-sso.<our domain snipped>.au.xml"
Jul 08 20:37:21 dcm1 slapd[4414]: Starting ldap server(s): slapd ...done.
Jul 08 20:37:22 dcm1 slapd[4414]: Checking Schema ID: ...done.
Jul 08 20:37:22 dcm1 systemd[1]: Started LSB: OpenLDAP standalone server (Lightweight Directory Access Protocol).
Jul 08 20:47:16 dcm1 systemd[1]: slapd.service: main process exited, code=killed, status=11/SEGV
Jul 08 20:47:17 dcm1 slapd[5894]: Stopping ldap server(s): slapd ...start-stop-daemon: warning: failed to kill 4426: No such process
Jul 08 20:47:17 dcm1 slapd[5894]: done.
Jul 08 20:47:17 dcm1 systemd[1]: Unit slapd.service entered failed state.


#2

I can fairly reliably trigger the segfault by logging into a joined backup or member server with SAML SSO and opening any module that queries the LDAP on the master (DNS/DHCP/LDAP).

If I log in without SSO (via the link on the login page), it hasn’t crashed the master’s slapd so far. Could it be some sort of issue parsing the XML coming from SAML?


#3

Hello pp303,
Thank you for the report. I created a bug report for this segfault: https://forge.univention.org/bugzilla/show_bug.cgi?id=45042.
I assume you are running on amd64?

Could you create a core dump or a backtrace with gdb for us? (https://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-when-a-process-gets-a-segmentation-fault)
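Once a core file exists, gdb can print the backtrace non-interactively. A minimal sketch (Python 3.9+) that builds and runs that gdb command: `-batch` executes the `-ex` command and exits, and `thread apply all bt full` prints the full backtrace, including local variables, for every thread, which matters for a multi-threaded daemon like slapd. The executable path is taken from the systemctl output in the first post; the core file path in the usage comment is hypothetical and depends on the system's core dump settings.

```python
import shutil
import subprocess


def gdb_backtrace_cmd(executable: str, core: str) -> list[str]:
    # Non-interactive gdb: load the binary plus core, print every thread's
    # full backtrace (with locals), then exit.
    return [
        "gdb", "-batch",
        "-ex", "thread apply all bt full",
        executable, core,
    ]


def collect_backtrace(executable: str, core: str) -> str:
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb is not installed")
    result = subprocess.run(
        gdb_backtrace_cmd(executable, core),
        capture_output=True, text=True, check=False,
    )
    # gdb writes warnings (e.g. missing debug symbols) to stderr; keep both.
    return result.stdout + result.stderr


# Usage (hypothetical core path):
#   print(collect_backtrace("/usr/sbin/slapd", "/var/crash/core.slapd.5969"))
```

Installing debug symbols for slapd and libxmlsec1 before running this makes the backtrace far more useful, since otherwise frames inside the library show up as `??`.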

We have a web form to upload files: https://upload.univention.de/
Alternatively, you can send us an encrypted email.


#4

Thanks for the reply, Best.

I turned on core dumps using the UCS SDB page and this Proxmox guide for Jessie (for the systemd changes), but since upgrading to errata 99 I’m having trouble getting past the SSO invalid signature error from the upgrade niggles topic.
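Whether a core is actually written depends on two things: the "Max core file size" limit of the running slapd process (not of the shell you are in) and the kernel's `core_pattern`, which either names a file or, if it starts with `|`, pipes the core to a helper program. For a systemd-managed service the limit is typically raised with `LimitCORE=infinity` in a drop-in under `/etc/systemd/system/slapd.service.d/`. A minimal sketch (Python 3.9+) that reports both for a given PID; `parse_core_limit` and `core_dump_report` are hypothetical helpers, and reading another process's `/proc/<pid>/limits` usually requires root.

```python
from pathlib import Path


def parse_core_limit(limits_text: str) -> tuple[str, str]:
    """Return (soft, hard) from the 'Max core file size' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max core file size"):
            # Row layout: four words of name, then soft limit, hard limit, units.
            fields = line.split()
            return fields[4], fields[5]
    raise ValueError("no 'Max core file size' row found")


def core_dump_report(pid="self") -> str:
    soft, hard = parse_core_limit(Path(f"/proc/{pid}/limits").read_text())
    pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    lines = [f"pid {pid}: core size soft={soft} hard={hard}"]
    if soft == "0":
        lines.append("core dumps are disabled for this process (soft limit 0)")
    if pattern.startswith("|"):
        lines.append(f"cores are piped to a helper: {pattern}")
    else:
        lines.append(f"cores are written using pattern: {pattern}")
    return "\n".join(lines)


# Usage: pass slapd's current main PID (e.g. from `pidof slapd` or systemctl status):
#   print(core_dump_report(6183))
```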

Until now, whenever I try to log in with SAML SSO on another DC, I get the invalid signature error described in other threads. However, after a reboot of the DC Master I get one working SSO login (which causes the segfault), and after that SSO no longer works, even after restarting UMC or slapd, or trying stunnel4, etc.

(Although since the last upgrades I can’t reliably get that far at the moment.)

After several reboots and repeated attempts to log in with SAML, I managed to trigger the segfault.

Core uploaded as upload_oGStDb.unknown

After grepping the various logs for “segfault”, I can see that libxmlsec1.so.1.2.20 also shows up in a univention-management-console segfault, which happens right after a log line about an SSO SAML assertion and after the SSO metadata XML is loaded.

syslog.1:Jul 25 08:26:48 dcm1 python2.7: Loaded metadata from "/usr/share/univention-management-console/saml/idp/ucs-sso.ourdomain-snipped.com.au.xml"
syslog.1:Jul 25 08:26:48 dcm1 python2.7: SAML assertion issuer is https://ucs-sso.ourdomain-snipped.com.au/simplesamlphp/saml2/idp/metadata.php
syslog.1:Jul 25 08:26:48 dcm1 kernel: [161928.412566] univention-mana[31896]: segfault at f ip 00007f27d2611c72 sp 00007f27d06f1690 error 4 in libxmlsec1.so.1.2.20[7f27d25d5000+5d000]
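Subtracting the library's load base from `ip` in each kernel line gives the faulting instruction's offset inside libxmlsec1.so.1.2.20, independent of where the library happened to be mapped. Comparing that offset for the slapd crash from the first post and the univention-management-console crash in this excerpt is a quick way to check whether both processes die at the same place. A minimal sketch (Python 3.9+); `crash_site` is a hypothetical helper and the regex assumes exactly the line format quoted in this thread.

```python
import re

# Pulls the instruction pointer, library name and load base out of a kernel
# "segfault at ... ip IP ... in LIB[BASE+SIZE]" line.
IP_BASE_RE = re.compile(r"ip ([0-9a-f]+) .* in (\S+)\[([0-9a-f]+)\+[0-9a-f]+\]")


def crash_site(line: str) -> tuple[str, int]:
    """Return (library, offset of the faulting instruction within it)."""
    m = IP_BASE_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip, lib, base = int(m[1], 16), m[2], int(m[3], 16)
    return lib, ip - base


slapd_line = ("slapd[5969]: segfault at f ip 00007f8081c07c72 sp 00007f7ffee63210 "
              "error 4 in libxmlsec1.so.1.2.20[7f8081bcb000+5d000]")
umc_line = ("univention-mana[31896]: segfault at f ip 00007f27d2611c72 "
            "sp 00007f27d06f1690 error 4 in libxmlsec1.so.1.2.20[7f27d25d5000+5d000]")

for entry in (slapd_line, umc_line):
    lib, offset = crash_site(entry)
    print(lib, hex(offset))
# libxmlsec1.so.1.2.20 0x3cc72
# libxmlsec1.so.1.2.20 0x3cc72
```

Both lines give offset 0x3cc72, and both fault at address 0xf, which suggests (though it does not prove) that slapd and UMC reach the same code path in libxmlsec1 while handling SAML data. If debug symbols for libxmlsec1 are available, `addr2line -f -e <path to libxmlsec1.so.1.2.20> 0x3cc72` can usually map the offset to a function name; the library's install path is not shown in the thread.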

Hope that helps.


#5

Hi, I had a look at the bug report, and you mentioned that perhaps the SSL certificate has expired?

I’m working through the other SAML SSO issues with Moritz, who suspects something is wrong with my SSL setup (which causes issues with stunnel4 on boot).

So far he’s asked me to check the CA certificates, which seem to match on both DCs and don’t expire for a few years yet. They were generated when I was on 4.1-2, and given the number of issues I’ve had moving to 4.2, maybe something went wrong during the upgrade.
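To check expiry for any certificate involved, not just the CA, a minimal sketch (Python 3.9+): it shells out to `openssl x509 -noout -enddate`, which prints a line such as `notAfter=Jan  5 09:34:43 2018 GMT`, and converts that date with the standard library's `ssl.cert_time_to_seconds`. The helper names are hypothetical, and the UCS path `/etc/univention/ssl/ucsCA/CAcert.pem` in the usage comment is an assumption about where the root CA certificate lives; substitute whichever certificate you want to check.

```python
import ssl
import subprocess
import time
from typing import Optional


def not_after_seconds(enddate_line: str) -> int:
    # "notAfter=Jan  5 09:34:43 2018 GMT" -> seconds since the epoch (UTC)
    _, _, value = enddate_line.strip().partition("=")
    return ssl.cert_time_to_seconds(value)


def days_left(enddate_line: str, now: Optional[float] = None) -> float:
    now = time.time() if now is None else now
    return (not_after_seconds(enddate_line) - now) / 86400


def read_enddate(pem_path: str) -> str:
    out = subprocess.run(
        ["openssl", "x509", "-noout", "-enddate", "-in", pem_path],
        capture_output=True, text=True, check=True,
    )
    return out.stdout


# Usage (path is an assumption, see above):
#   print(round(days_left(read_enddate("/etc/univention/ssl/ucsCA/CAcert.pem"))))
```

A negative result means the certificate has already expired.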

Cheers.


#6

Thank you :)
I created a small patch that prevents the crash. I plan to suggest it upstream soon.


#7

We are preparing an errata update for this. Thank you for reporting!


#8

I just upgraded to errata 122, but I’m guessing this fix didn’t make it into that batch? I’m still getting the segfaults.


#9

No, it’s not released yet. I think it will be released next Wednesday.


#10

It has been released: http://errata.software-univention.de/ucs/4.2/124.html


#11

Thanks, will apply today.