UCS 4.3-3 Can't install Nagios

Hello

I have a PDC (dc1.local.intranet IP:10.0.0.1) and a Memberserver (srv1.local.intranet IP:10.0.0.5).
The Nagios app is installed on the member server.

Unfortunately, I can't get Nagios working on dc1 or srv1. The installation process reports that everything is OK, but the nagios service doesn't start.

Please help me debug and fix this.

Thanks in advance.

Hi,

on the master server:

dpkg -l| grep nagios
ps ax | grep nrpe

Is the server enabled (through UMC) for Nagios?

/CV

Problems on srv1:

Things got much worse…

On srv1 I tried to clear the Nagios files: I renamed the nagios folder to nagios_TMP, uninstalled Nagios, and installed it again. I expected the installation to recreate all the files, which would have fixed the problem if those files were the cause. It did not. Now the nagios-nrpe-server service on srv1 is stopped and can't be started. I tried putting the nagios folder back and installing Nagios again, but the installation ends with an error.
When I try to start the service manually and check it with:
sudo service nagios-nrpe-server status
I get this in the journal:

Administrator@srv1:~$ sudo journalctl -xe

sty 10 09:35:24 srv1 nrpe[12959]: Unable to open config file ‘/etc/nagios/nrpe.cfg’ for reading
sty 10 09:35:24 srv1 nrpe[12959]: Config file ‘/etc/nagios/nrpe.cfg’ contained errors, aborting…
sty 10 09:35:24 srv1 systemd[1]: nagios-nrpe-server.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
sty 10 09:35:24 srv1 systemd[1]: nagios-nrpe-server.service: Unit entered failed state.
sty 10 09:35:24 srv1 systemd[1]: nagios-nrpe-server.service: Failed with result ‘exit-code’.
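
The journal lines above say that NRPE could not open /etc/nagios/nrpe.cfg. A minimal sketch for telling a missing file apart from an unreadable one might look like this. The function name `check_config` is made up for illustration, and the `nagios` account in the usage comment is an assumption about which user NRPE runs as.

```shell
# Report whether a config file is missing, unreadable, or fine.
# check_config is a hypothetical helper name, not part of NRPE or UCS.
check_config() {
    cfg="$1"
    if [ ! -e "$cfg" ]; then
        echo "missing: $cfg"
        return 1
    elif [ ! -r "$cfg" ]; then
        echo "not readable: $cfg"
        return 2
    fi
    echo "ok: $cfg"
}

# Usage on the affected host (not run here):
#   check_config /etc/nagios/nrpe.cfg
# root can read almost anything, so to test real permissions run the
# check as the service user (assumed here to be "nagios"):
#   sudo -u nagios test -r /etc/nagios/nrpe.cfg && echo readable
```

After renaming the whole folder, "missing" is the more likely result, but the permission case is worth ruling out as well.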

EDIT: The error above is fixed.
I checked the files and their permissions. The nagios-nrpe-server service now seems to work properly:

Administrator@srv1:~$ sudo service nagios-nrpe-server status
● nagios-nrpe-server.service - Nagios Remote Plugin Executor
Loaded: loaded (/lib/systemd/system/nagios-nrpe-server.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2019-01-10 13:24:17 CET; 10s ago
Docs: http://www.nagios.org/documentation
Process: 27427 ExecStopPost=/bin/rm -f /var/run/nagios/nrpe.pid (code=exited, status=0/SUCCESS)
Main PID: 27432 (nrpe)
Tasks: 1 (limit: 4915)
Memory: 660.0K
CPU: 7ms
CGroup: /system.slice/nagios-nrpe-server.service
└─27432 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f

sty 10 13:24:17 srv1 systemd[1]: Started Nagios Remote Plugin Executor.
sty 10 13:24:17 srv1 nrpe[27432]: Starting up daemon
sty 10 13:24:17 srv1 nrpe[27432]: Server listening on 0.0.0.0 port 5666.
sty 10 13:24:17 srv1 nrpe[27432]: Server listening on :: port 5666.
sty 10 13:24:17 srv1 nrpe[27432]: Listening for connections on port 5666
sty 10 13:24:17 srv1 nrpe[27432]: Allowing connections from: dc1.local.intranet

Result of commands on my dc1:

Administrator@dc1:~$ dpkg -l | grep nagios
ii monitoring-plugins 2.2-3A~4.3.0.201711222223 all Plugins for nagios compatible monitoring systems (metapackage)
ii monitoring-plugins-basic 2.2-3A~4.3.0.201711222223 amd64 Plugins for nagios compatible monitoring systems (basic)
ii monitoring-plugins-common 2.2-3A~4.3.0.201711222223 amd64 Common files for plugins for nagios compatible monitoring
ii monitoring-plugins-standard 2.2-3A~4.3.0.201711222223 amd64 Plugins for nagios compatible monitoring systems (standard)
rc nagios-cgi 4.3.4-0A~4.3.0.201801311337 amd64 cgi files for nagios
rc nagios-common 4.3.4-0A~4.3.0.201801311337 all support files for nagios
rc nagios-nrpe-plugin 3.0.1-3+deb9u1A~4.3.0.201801231652 amd64 Nagios Remote Plugin Executor Plugin
ii nagios-nrpe-server 3.0.1-3+deb9u1A~4.3.0.201801231652 amd64 Nagios Remote Plugin Executor Server
ii univention-nagios-client 11.0.1-6A~4.3.0.201805160949 amd64 UCS: nagios client support
ii univention-nagios-common 11.0.1-6A~4.3.0.201805160949 amd64 UCS: nagios client support
rc univention-nagios-cups 7.0.0-3A~4.3.0.201808081424 all nagios plugin for monitoring cups daemon and webservice
rc univention-nagios-dansguardian 7.0.0-3A~4.3.0.201808081424 all nagios plugin for monitoring dansguardian daemon and webservice
ii univention-nagios-s4-connector 3.0.0-1A~4.3.0.201712120057 amd64 nagios plugin for UCS S4 connector
ii univention-nagios-samba 3.0.0-1A~4.3.0.201712120054 amd64 nagios plugin for UCS samba
rc univention-nagios-server 11.0.1-6A~4.3.0.201805160949 amd64 UCS: nagios server support
rc univention-nagios-squid 7.0.0-3A~4.3.0.201808081424 all nagios plugin for monitoring squid daemon and webservice

Administrator@dc1:~$ ps ax | grep nrpe
7030 ? Ss 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f
10377 pts/0 S+ 0:00 grep nrpe
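
A note on reading the dpkg listing above: `ii` in the first column means the package is installed, while `rc` means it was removed but its configuration files are still on disk. So on dc1 the client packages (nagios-nrpe-server, univention-nagios-client) are installed, and the server packages (nagios-common, univention-nagios-server) are not. A small sketch to pull out only the `rc` entries, with `list_rc_packages` being a name chosen just for this example:

```shell
# Print the names of packages whose dpkg state is "rc"
# (removed, configuration files remain).
# Reads "dpkg -l"-style lines on stdin.
list_rc_packages() {
    awk '$1 == "rc" { print $2 }'
}

# Usage on a real system (not run here):
#   dpkg -l | grep nagios | list_rc_packages
```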

Still trying… and fighting ;)

OK, there is some progress.

After fixing the nagios-nrpe-server service (see the post above), I tried installing Nagios again. As expected, there was an error with the nagios service and it did not start:

Administrator@srv1:~$ sudo service nagios restart
Job for nagios.service failed because the control process exited with error code.
See “systemctl status nagios.service” and “journalctl -xe” for details.
Administrator@srv1:~$ sudo service nagios status
● nagios.service - LSB: nagios host/service/network monitoring and management system
Loaded: loaded (/etc/init.d/nagios; generated; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2019-01-11 08:28:56 CET; 6s ago
Docs: man:systemd-sysv-generator(8)
Process: 7499 ExecStop=/etc/init.d/nagios stop (code=exited, status=0/SUCCESS)
Process: 8928 ExecStart=/etc/init.d/nagios start (code=exited, status=1/FAILURE)
CPU: 350ms

sty 11 08:28:56 srv1 nagios[8928]: version of Nagios, you should be aware that some variables/definitions
sty 11 08:28:56 srv1 nagios[8928]: may have been removed or modified in this version. Make sure to read
sty 11 08:28:56 srv1 nagios[8928]: the HTML documentation regarding the config files, as well as the
sty 11 08:28:56 srv1 nagios[8928]: ‘Whats New’ section to find out what has changed.
sty 11 08:28:56 srv1 nagios[8928]: errors in config! … failed!
sty 11 08:28:56 srv1 nagios[8928]: failed!
sty 11 08:28:56 srv1 systemd[1]: nagios.service: Control process exited, code=exited status=1
sty 11 08:28:56 srv1 systemd[1]: Failed to start LSB: nagios host/service/network monitoring and management system.
sty 11 08:28:56 srv1 systemd[1]: nagios.service: Unit entered failed state.
sty 11 08:28:56 srv1 systemd[1]: nagios.service: Failed with result ‘exit-code’.

I checked the details:

Administrator@srv1:~$ sudo journalctl -xe
sty 11 08:28:56 srv1 systemd[1]: Starting LSB: nagios host/service/network monitoring and management system…
(…)
sty 11 08:28:56 srv1 nagios[8928]: Reading configuration data…
sty 11 08:28:56 srv1 nagios[8928]: Read main config file okay…
sty 11 08:28:56 srv1 nagios[8928]: Error: Could not open config directory ‘/etc/nagios/conf.univention.d/contacts’ for reading.
(…)

For some reason the permissions on /etc/nagios/conf.univention.d/ were 700 instead of 755, so I fixed them manually.
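
For anyone hitting the same "Could not open config directory" error, here is a rough sketch for spotting directories that other users cannot read or enter. `find_closed_dirs` is a hypothetical helper name; the path in the usage comment is the one from the journal output above.

```shell
# List directories below $1 that lack read or execute permission
# for "others" (that is, anything stricter than o+rx).
find_closed_dirs() {
    find "$1" -type d ! -perm -005
}

# Usage on the affected host (not run here):
#   find_closed_dirs /etc/nagios/conf.univention.d
#   chmod 755 /etc/nagios/conf.univention.d   # the manual fix described above
# Nagios can then re-check its configuration before a restart:
#   nagios -v /etc/nagios/nagios.cfg
```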

I restarted the nagios service and everything seemed to be OK:

Administrator@srv1:~$ sudo service nagios status
● nagios.service - LSB: nagios host/service/network monitoring and management system
Loaded: loaded (/etc/init.d/nagios; generated; vendor preset: enabled)
Active: active (running) since Fri 2019-01-11 08:39:13 CET; 1s ago
Docs: man:systemd-sysv-generator(8)
Process: 9450 ExecStop=/etc/init.d/nagios stop (code=exited, status=0/SUCCESS)
Process: 9482 ExecStart=/etc/init.d/nagios start (code=exited, status=0/SUCCESS)
Tasks: 8 (limit: 4915)
Memory: 3.5M
CPU: 366ms
CGroup: /system.slice/nagios.service
├─9525 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
├─9526 /usr/sbin/nagios --worker /var/lib/nagios/rw/nagios.qh
├─9527 /usr/sbin/nagios --worker /var/lib/nagios/rw/nagios.qh
├─9528 /usr/sbin/nagios --worker /var/lib/nagios/rw/nagios.qh
├─9529 /usr/sbin/nagios --worker /var/lib/nagios/rw/nagios.qh
├─9530 /usr/sbin/nagios --worker /var/lib/nagios/rw/nagios.qh
├─9531 /usr/sbin/nagios --worker /var/lib/nagios/rw/nagios.qh
└─9532 /usr/sbin/nagios -d /etc/nagios/nagios.cfg

sty 11 08:39:13 srv1 nagios[9525]: WARNING: The retry_check_interval attribute is deprecated and will be removed in future versions. Please use retry_interval instead.
sty 11 08:39:13 srv1 nagios[9525]: WARNING: The normal_check_interval attribute is deprecated and will be removed in future versions. Please use check_interval instead.
sty 11 08:39:13 srv1 nagios[9525]: WARNING: The retry_check_interval attribute is deprecated and will be removed in future versions. Please use retry_interval instead.
sty 11 08:39:13 srv1 nagios[9525]: WARNING: The normal_check_interval attribute is deprecated and will be removed in future versions. Please use check_interval instead.
sty 11 08:39:13 srv1 nagios[9525]: WARNING: The retry_check_interval attribute is deprecated and will be removed in future versions. Please use retry_interval instead.
sty 11 08:39:13 srv1 nagios[9525]: WARNING: The normal_check_interval attribute is deprecated and will be removed in future versions. Please use check_interval instead.
sty 11 08:39:13 srv1 nagios[9525]: WARNING: The retry_check_interval attribute is deprecated and will be removed in future versions. Please use retry_interval instead.
sty 11 08:39:13 srv1 nagios[9525]: WARNING: The normal_check_interval attribute is deprecated and will be removed in future versions. Please use check_interval instead.
sty 11 08:39:13 srv1 nagios[9525]: WARNING: The retry_check_interval attribute is deprecated and will be removed in future versions. Please use retry_interval instead.
sty 11 08:39:13 srv1 nagios[9525]: Successfully launched command file worker with pid 9532

But that is not the end of the problems…

Nagios is running and I can log in, but it shows only errors:

| Host | Service | Status | Last Check | Duration | Attempt | Status Information |
|---|---|---|---|---|---|---|
| srv1.local.intranet | UNIVENTION_CUPS | CRITICAL | 2019-01-11 08:58:13 | 0d 0h 15m 24s | 5/5 | CHECK_NRPE: Error - Could not connect to 10.10.84.5: Connection reset by peer |
| | UNIVENTION_DISK_ROOT | CRITICAL | 2019-01-11 08:58:13 | 0d 0h 10m 24s | 10/10 | CHECK_NRPE: Error - Could not connect to 10.10.84.5: Connection reset by peer |
| | UNIVENTION_DNS | CRITICAL | 2019-01-11 08:49:41 | 0d 0h 18m 56s | 10/10 | CHECK_NRPE: Error - Could not connect to 10.10.84.5: Connection reset by peer |
| | UNIVENTION_JOINSTATUS | CRITICAL | 2019-01-11 08:59:13 | 0d 0h 0m 24s | 1/1 | CHECK_NRPE: Error - Could not connect to 10.10.84.5: Connection reset by peer |
| | UNIVENTION_LOAD | CRITICAL | 2019-01-11 08:55:25 | 0d 0h 19m 12s | 1/1 | CHECK_NRPE: Error - Could not connect to 10.10.84.5: Connection reset by peer |
| | UNIVENTION_NSCD2 | CRITICAL | 2019-01-11 08:47:55 | 0d 0h 13m 42s | 2/2 | CHECK_NRPE: Error - Could not connect to 10.10.84.5: Connection reset by peer |

(…)

There is another issue: dc1 has Nagios support enabled in UMC and is visible as a host in Nagios Core, but no services are active on it.

Hi,

Are you sure the firewall rules on srv1 are set correctly to allow NRPE connections? Usually they are set automatically, but who knows? And is nrpe really running?

root@srv1:~# ucr search --brief packetfilter
root@srv1:~# ps ax | grep nrpe

/CV

It looks like everything is OK.

Administrator@srv1:~$ sudo ucr search --brief packetfilter
[sudo] hasło użytkownika Administrator:
security/packetfilter/defaultpolicy: REJECT
security/packetfilter/disabled:
security/packetfilter/docker/disabled:
security/packetfilter/package/.*:
security/packetfilter/package/univention-apache/tcp/443/all/en: HTTPS
security/packetfilter/package/univention-apache/tcp/443/all: ACCEPT
security/packetfilter/package/univention-apache/tcp/80/all/en: HTTP
security/packetfilter/package/univention-apache/tcp/80/all: ACCEPT
security/packetfilter/package/univention-base-files/tcp/22/all/en: SSH
security/packetfilter/package/univention-base-files/tcp/22/all: ACCEPT
security/packetfilter/package/univention-base-files/udp/123/all/en: ntp
security/packetfilter/package/univention-base-files/udp/123/all: ACCEPT
security/packetfilter/package/univention-heimdal-common/tcp/544/all/en: krsh
security/packetfilter/package/univention-heimdal-common/tcp/544/all: ACCEPT
security/packetfilter/package/univention-management-console-server/tcp/6670/all/en: UMC
security/packetfilter/package/univention-management-console-server/tcp/6670/all: ACCEPT
security/packetfilter/package/univention-nagios-client/tcp/5666/all/en: Nagios NRPE
security/packetfilter/package/univention-nagios-client/tcp/5666/all: ACCEPT
security/packetfilter/package/univention-nfs/tcp/111/all/en: portmap
security/packetfilter/package/univention-nfs/tcp/111/all: ACCEPT
security/packetfilter/package/univention-nfs/tcp/2049/all/en: NFS
security/packetfilter/package/univention-nfs/tcp/2049/all: ACCEPT
security/packetfilter/package/univention-nfs/tcp/32765:32769/all/en: NFS related RPC daemons
security/packetfilter/package/univention-nfs/tcp/32765:32769/all: ACCEPT
security/packetfilter/package/univention-nfs/udp/111/all/en: portmap
security/packetfilter/package/univention-nfs/udp/111/all: ACCEPT
security/packetfilter/package/univention-nfs/udp/2049/all/en: NFS
security/packetfilter/package/univention-nfs/udp/2049/all: ACCEPT
security/packetfilter/package/univention-nfs/udp/32765:32769/all/en: NFS related RPC daemons
security/packetfilter/package/univention-nfs/udp/32765:32769/all: ACCEPT
security/packetfilter/package/univention-postgresql-9.6/tcp/5432/all/en: postgresql
security/packetfilter/package/univention-postgresql-9.6/tcp/5432/all: ACCEPT
security/packetfilter/package/univention-printserver/tcp/631/all/en: IPP
security/packetfilter/package/univention-printserver/tcp/631/all: ACCEPT
security/packetfilter/package/univention-printserver/udp/631/all/en: IPP
security/packetfilter/package/univention-printserver/udp/631/all: ACCEPT
security/packetfilter/package/univention-samba/tcp/137:139/all/en: netbios (Samba)
security/packetfilter/package/univention-samba/tcp/137:139/all: ACCEPT
security/packetfilter/package/univention-samba/tcp/445/all/en: microsoft-ds (Samba)
security/packetfilter/package/univention-samba/tcp/445/all: ACCEPT
security/packetfilter/package/univention-samba/udp/137/all: ACCEPT
security/packetfilter/package/univention-samba/udp/137:139/all/en: netbios (Samba)
security/packetfilter/package/univention-samba/udp/137:139/all: ACCEPT
security/packetfilter/package/univention-samba/udp/445/all/en: microsoft-ds (Samba)
security/packetfilter/package/univention-samba/udp/445/all: ACCEPT
security/packetfilter/tcp/.*:
security/packetfilter/udp/.*:
security/packetfilter/use_packages:

Administrator@srv1:~$ sudo ps ax | grep nrpe
[sudo] hasło użytkownika Administrator:
25091 pts/0 S+ 0:00 grep nrpe
27432 ? Ss 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f
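
The UCR listing above is long, but the only line that matters for NRPE is the rule for TCP port 5666, which reads ACCEPT, so the packet filter does not look like the cause. A small sketch for picking that rule out of the output; `nrpe_firewall_rule` is just an illustrative name:

```shell
# Keep only the packet filter rule for NRPE's port (5666/tcp)
# from "ucr search --brief packetfilter"-style lines on stdin.
nrpe_firewall_rule() {
    grep -E '/tcp/5666/all: '
}

# Usage on the host (not run here):
#   ucr search --brief packetfilter | nrpe_firewall_rule
```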

I think the problem may be that my /etc/nagios/nrpe.cfg contains:

allowed_hosts=dc1.local.intranet

while my Nagios is installed on srv1.local.intranet?
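
That suspicion fits the symptoms: NRPE only answers hosts named in `allowed_hosts`, and a "Connection reset by peer" from check_nrpe is a common sign that the caller is not on that list. Below is a minimal sketch for checking whether a given host is listed. Both function names are invented for this example, and the check is a plain string comparison, so it does not understand CIDR ranges or resolve names to addresses.

```shell
# Print the value of the last allowed_hosts line in an nrpe.cfg-style file.
get_allowed_hosts() {
    sed -n 's/^[[:space:]]*allowed_hosts[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Succeed if host $2 appears literally in the comma-separated
# allowed_hosts list of config file $1.
host_is_allowed() {
    hosts=$(get_allowed_hosts "$1" | tr -d ' ')
    case ",$hosts," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on the monitored host (not run here):
#   host_is_allowed /etc/nagios/nrpe.cfg srv1.local.intranet \
#       && echo "listed" || echo "not listed"
```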

I changed the line in nrpe.cfg to allowed_hosts=srv1.local.intranet and Nagios started to work.
While thinking about why I had these problems, I remembered that I had deleted my member server on the PDC via the Computers tab in UMC while Nagios was still installed on it. After that, reinstalling did not fix everything and Nagios did not work. Lesson learned: manually deleting the member server was not a good idea.

EDIT:
After the server restarts, the line in nrpe.cfg is

allowed_hosts=dc1.local.intranet

again, and the errors appear again.

Is there any way to fix this permanently?

Kind Regards,
LK
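
A possible direction for the permanent fix, offered as a sketch rather than a confirmed answer: a hand-edited line that comes back after a restart suggests that nrpe.cfg is regenerated from a Univention Configuration Registry template, in which case the value would have to be set through UCR. The variable name `nagios/client/allowedhosts` below is an assumption and should be confirmed first with `ucr search --brief nagios`. The helper names `run` and `set_nrpe_allowed_hosts` are made up for this example. By default the script only prints what it would do.

```shell
# ASSUMPTION: this UCR variable name is not confirmed in this thread.
# Verify it first with:  ucr search --brief nagios
UCR_VAR="nagios/client/allowedhosts"

# Print the command in dry-run mode (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Set the allowed hosts through UCR, then restart NRPE so it
# rereads the regenerated configuration.
set_nrpe_allowed_hosts() {
    run ucr set "$UCR_VAR=$1" &&
    run service nagios-nrpe-server restart
}

# Usage (dry run, prints the two commands):
#   set_nrpe_allowed_hosts "dc1.local.intranet,srv1.local.intranet"
# To really apply it, as root:
#   DRY_RUN=0 set_nrpe_allowed_hosts "dc1.local.intranet,srv1.local.intranet"
```

Listing both hosts keeps dc1 allowed while also letting the Nagios server on srv1 query its own NRPE daemon.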
