NTP CRITICAL: Offset unknown in Nagios

nagios

#1

We have UCS Nagios installed and have cleared all alarms that were caused by Nagios configuration errors, except one. This one still shows in the Nagios GUI:

NTP CRITICAL: Offset unknown

It shows for ALL our UCS servers (Master, Backup and Members).

When we run NRPE manually from the command line as in:
/usr/lib/nagios/plugins/check_nrpe -H bebucsmdcsvrp2 -c UNIVENTION_NTP

The output is:
NRPE: Command 'UNIVENTION_NTP' not defined

Google turns up nothing about either message, so any suggestions on where to look and how to correct this error?


#2

I’d first check whether the computer objects in question have the UNIVENTION-NTP Nagios service defined under “advanced settings”.


#3

When thinking about my answer I realised that you obviously have the service definition in place. Otherwise you would not get this error…
UNIVENTION_NTP is not an NRPE command. Can you try running /usr/lib/nagios/plugins/check_ntp_time -H bebucsmdcsvrp2 directly?


#4

I get the following output from /usr/lib/nagios/plugins/check_ntp_time -H bebucsmdcsvrp2:

NTP CRITICAL: Offset unknown|

When I run /usr/lib/nagios/plugins/check_nrpe -H bebucsmdcsvrp2 -c UNIVENTION_NTP the output is

NRPE: Command 'UNIVENTION_NTP' not defined

Googling either output shows NOTHING, so I am not sure where to even start correcting this.


#5

Just searching for “check_ntp_time Offset unknown”, I got results like https://serverfault.com/questions/625027/nagios-check-ntp-time-offset-unknown.
It might be worth adding verbosity (-v or -vv) to the check_ntp_time command to get an idea of what is going on.
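
If you want to script the verbose run, here is a minimal Python sketch. It assumes the plugin path used earlier in this thread and the standard Nagios plugin exit codes (0 to 3). The function names build_check_command and run_check are made up for illustration.

```python
import subprocess

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Assumption: plugin location as used earlier in this thread.
PLUGIN = "/usr/lib/nagios/plugins/check_ntp_time"


def build_check_command(host, verbosity=2, plugin=PLUGIN):
    """Build the check_ntp_time command line, adding -v or -vv if requested."""
    cmd = [plugin, "-H", host]
    if verbosity > 0:
        cmd.append("-" + "v" * verbosity)
    return cmd


def run_check(host, verbosity=2):
    """Run the plugin and return (state name, plugin output)."""
    proc = subprocess.run(
        build_check_command(host, verbosity),
        capture_output=True,
        text=True,
    )
    state = NAGIOS_STATES.get(proc.returncode, "UNKNOWN")
    return state, proc.stdout
```

Calling run_check("bebucsmdcsvrp2") on the Nagios host would return the state name together with the verbose plugin output, which is the part worth reading for hints.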


#6

It appears that there is NO real fix under UCS. The suggestion only works until you set something in UCR, at which point the change is reset and the errors come back. Apparently the NTP config files are rebuilt on any edit of the Univention Registry, regardless of which setting was changed.

Editing “discarding peer 0: stratum=0” to “discarding peer 0: stratum=(anything other than 0)” works until something is edited in UCR, and then the error comes back.

I am thinking this might be a bug in UCS.

I am looking for a more permanent fix, but I am thinking that just nixing the whole NTP check might be the way to go.


#7

Without further proof, this looks to me as if you are trying to fix the symptoms rather than the cause.
Did you check that your server is definitely not identifying itself as stratum 0? If it is, the problem is not on the (NTP) client side.


#8

Well our Primary Time Server is our UCS Master itself.

The Secondary Time Server is our UCS Backup.

Both provide time for all UCS Slaves and Member Servers, as well as Windows and other Linux Servers.

The UCS Domain Controller servers are set to whatever the UCS NTP default is. We never altered it after installation. Setting the location/locale is the only change we made that could possibly affect NTP.

We also have 2 pfSense firewalls that provide NTP to our switches ONLY. Those pull from public NTP servers on the internet.


#9

Update: According to “ntpq -p”, ALL my UCS servers are identifying as stratum 16.

root@bebucsmdcsvrp2:~# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 ec2-52-34-232-2 .INIT.          16 u   10   64    0    0.000    0.000   0.000
 bebucsmdcsvrp2. .INIT.          16 u    -   64    0    0.000    0.000   0.000
 ec2-52-41-14-20 .INIT.          16 u    8   64    0    0.000    0.000   0.000
 bebucsbdcsvrp2. .INIT.          16 u    7   64    0    0.000    0.000   0.000

Not sure if this provides any insight…
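
For comparing this across several servers, a rough Python sketch that reads pasted “ntpq -p” output and lists the peers that look unsynchronized (stratum 16 or reach 0) might look like this. The function names are hypothetical, and the parser assumes the usual ten whitespace-separated columns. It only strips the tally characters *, +, - and # from the peer name, which is a simplification.

```python
SAMPLE = """\
remote refid st t when poll reach delay offset jitter
ec2-52-34-232-2 .INIT. 16 u 10 64 0 0.000 0.000 0.000
bebucsmdcsvrp2. .INIT. 16 u - 64 0 0.000 0.000 0.000
ec2-52-41-14-20 .INIT. 16 u 8 64 0 0.000 0.000 0.000
bebucsbdcsvrp2. .INIT. 16 u 7 64 0 0.000 0.000 0.000
"""


def parse_ntpq_peers(text):
    """Parse 'ntpq -p' output into a list of dicts, one per peer line."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the '=====' separator and anything malformed.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        peers.append({
            "remote": fields[0].lstrip("*+-#"),  # drop a leading tally character
            "refid": fields[1],
            "stratum": int(fields[2]),
            "reach": int(fields[6], 8),  # the reach column is printed in octal
        })
    return peers


def unsynchronized(peers):
    """Return the names of peers with stratum 16 or with no successful polls."""
    return [p["remote"] for p in peers if p["stratum"] == 16 or p["reach"] == 0]
```

Run against the output above, unsynchronized(parse_ntpq_peers(SAMPLE)) flags all four peers, which matches the .INIT. refid: none of them has answered a poll yet.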


#10

As far as I know there is no default NTP source defined by default.
The ntpq output may indicate that the current setup does not have a proper configuration.

This is from a lab machine after configuring an external NTP-source.

root@ucs-5084:~# ucr search timeserver
timeserver2: <empty>
 Usually the master domain controller functions as the time server of a UCS domain. With the variables 'timeserver', 'timeserver2' and 'timeserver3' external NTP servers can be included as time sources.

timeserver3: <empty>
 Usually the master domain controller functions as the time server of a UCS domain. With the variables 'timeserver', 'timeserver2' and 'timeserver3' external NTP servers can be included as time sources.

timeserver: ptbtime1.ptb.de
 Usually the master domain controller functions as the time server of a UCS domain. With the variables 'timeserver', 'timeserver2' and 'timeserver3' external NTP servers can be included as time sources.

root@ucs-5084:~# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ptbtime1.ptb.de .PTB.            1 u   19   64  377   15.264  1608.06   6.690
 LOCAL(0)        .LOCL.           5 l  283   64  360    0.000    0.000   0.000
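
A note on the “reach” column, since it is the quickest health indicator here: it is an 8-bit shift register printed in octal, with the most recent poll in the lowest bit. A small Python sketch to decode it (the function names are made up for illustration):

```python
def decode_reach(reach_octal):
    """Return the last eight poll results, most recent first (True = reply received)."""
    value = int(reach_octal, 8) & 0xFF
    return [bool(value >> bit & 1) for bit in range(8)]


def summarize_reach(reach_octal):
    """Summarize how many of the last eight polls were answered."""
    return "%d of the last 8 polls answered" % sum(decode_reach(reach_octal))
```

With the values above, “377” decodes to eight successful polls in a row, while “360” means the four most recent polls went unanswered and the four before that succeeded. A reach of 0 means no poll has been answered at all.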


#11

When I run “ucr search timeserver” on my Domain Master, I get:

root@bebucsmdcsvrp2:~# ucr search timeserver
timeserver2: 1.us.pool.ntp.org
 Usually the master domain controller functions as the time server of a UCS domain. With the variables 'timeserver', 'timeserver2' and 'timeserver3' external NTP servers can be included as time sources.

timeserver3: 2.us.pool.ntp.org
 Usually the master domain controller functions as the time server of a UCS domain. With the variables 'timeserver', 'timeserver2' and 'timeserver3' external NTP servers can be included as time sources.

timeserver: 0.us.pool.ntp.org
 Usually the master domain controller functions as the time server of a UCS domain. With the variables 'timeserver', 'timeserver2' and 'timeserver3' external NTP servers can be included as time sources.

However when I run “timedatectl status” I get this:
root@bebucsmdcsvrp2:~# timedatectl status
      Local time: Tue 2017-08-08 10:23:39 PDT
  Universal time: Tue 2017-08-08 17:23:39 UTC
        RTC time: Tue 2017-08-08 17:23:39
       Time zone: America/Los_Angeles (PDT, -0700)
     NTP enabled: no
NTP synchronized: no
 RTC in local TZ: no
      DST active: yes
 Last DST change: DST began at
                  Sun 2017-03-12 01:59:59 PST
                  Sun 2017-03-12 03:00:00 PDT
 Next DST change: DST ends (the clock jumps one hour backwards) at
                  Sun 2017-11-05 01:59:59 PDT
                  Sun 2017-11-05 01:00:00 PST

It appears that NTP is not running on my master or on any other domain controller (backup/slave), as the output of “timedatectl status” is the same on all UCS domain controllers.

It appears that I need to enable NTP. I’d like some confirmation on this before I go ahead and enable it with:
“timedatectl set-ntp true”


#12

Interesting. I must admit that I never had to use timedatectl until now. On my reference machine I can see:

root@ucs-5084:~# timedatectl status  | grep NTP
     NTP enabled: no
NTP synchronized: yes

This seems to be independent from the running status of ntpd itself.
After toggling “set-ntp” a couple of times, I once had the situation that “NTP synchronized” showed “no” for some time. It switched back to “yes” after I stopped ntpd, ran ntpdate manually and started ntpd again.
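
Since the two fields can disagree, it may help to read them separately rather than treating “NTP enabled: no” as proof that nothing is syncing. A minimal Python sketch, assuming the field labels shown in this thread (other systemd versions may label these lines differently); the function name ntp_flags is made up:

```python
def ntp_flags(text):
    """Extract the two NTP-related yes/no fields from 'timedatectl status' output."""
    flags = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Only the two labels seen in this thread are considered.
        if sep and key in ("NTP enabled", "NTP synchronized"):
            flags[key] = value.strip() == "yes"
    return flags
```

Fed the two lines from my reference machine, this yields “NTP enabled” as False and “NTP synchronized” as True, i.e. the clock is reported as synchronized even though the flag that “set-ntp” toggles is off.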


#13

OK, now I am more confused. It also appears that my time is pretty accurate regardless, so I am wondering whether I should just disable this Nagios monitor.


#14

The time may be correct, however:
"The upper limit for stratum is 15; stratum 16 is used to indicate that a device is unsynchronized."
(from https://en.wikipedia.org/wiki/Network_Time_Protocol)