System diagnostics: KDC reachability

Andreas_T · February 21, 2018, 3:49pm

(I posted this here before, but was asked to move it to its own topic because the underlying conditions are different.)

We have a similar problem. Sorry if this is the wrong place for our problem, but maybe the problems are related. Our domain consists of four UCS systems:

ucs-master
ucs-slave
ucs-backup
ucs-ext

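All of them show this message in the system diagnostics:

Die folgenden KDCs waren nicht erreichbar: tcp ucs-*.foo.bar:88
(in English: "The following KDCs were not reachable: tcp ucs-*.foo.bar:88")

Each system reports only itself: ucs-master shows ucs-master.foo.bar:88, ucs-slave shows ucs-slave.foo.bar:88, and so on.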

host -t srv _kerberos._tcp.$(ucr get domainname) shows all of them on all systems.

kinit --password-file=/etc/machine.secret $(hostname)\$@$(ucr get kerberos/realm) seems to work on all systems, at least klist shows issued tickets afterwards.

echo -ne "server $(ucr get ldap/master)\nprereq yxdomain $(hostname -f)\nsend\n" | nsupdate -d -g -t15 also looks like it works correctly.

When I run tcpdump -i eth0 tcp port 88 and run system diagnostics again, it shows connections to all systems except the one where I’m running it, for which it reports KDC unreachable. When I run tcpdump -i lo tcp port 88 it shows connections to localhost. Any idea?

Edit: I just solved my problem by adding ucs-slave and ucs-slave.foo.bar to the 127.0.0.1 line in /etc/hosts using:
ucr set hosts/static/127.0.0.1="localhost ucs-slave ucs-slave.foo.bar"

My /etc/hosts before the change:

127.0.0.1       localhost

192.168.1.2  ucs-slave.foo.bar ucs-slave

127.0.1.1       ucs-slave.foo.bar ucs-slave

::1             localhost ip6-localhost ip6-loopback
fe00::0         ip6-localnet
ff00::0         ip6-mcastprefix
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters

I don’t fully understand why it works now, though.
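
For anyone comparing their own /etc/hosts against the listing above, here is a minimal sketch that lists the loopback addresses a given name is mapped to in a hosts file. The function name loopback_entries is made up for illustration, and treating 127.* and ::1 as "loopback" is an assumption of this sketch, not something the system diagnostics is known to do.

```shell
# Print every loopback address (127.* or ::1) that a hosts file maps
# the given name to. Hypothetical helper, not part of UCS.
#   $1 = host name to look for (short name or FQDN)
#   $2 = hosts file to read (defaults to /etc/hosts)
loopback_entries() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }                 # strip comments
        ($1 ~ /^127\./ || $1 == "::1") {   # only loopback lines
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1; break }
        }' "$file"
}
```

Called as loopback_entries "$(hostname -f)", it prints one address per matching line. For the file shown above and the name ucs-slave.foo.bar it would print 127.0.1.1.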

Moritz_Bunkus · February 21, 2018, 4:01pm

Hey,

even though you’ve found a workaround, can you please show the output of lsof -PniTCP:88 from both ucs-slave and ucs-master?

Kind regards,
mosu

Andreas_T · February 21, 2018, 4:05pm

ucs-master:

samba   23392 root   24u  IPv6 2290188      0t0  TCP [::1]:88 (LISTEN)
samba   23392 root   34u  IPv4 2290192      0t0  TCP 127.0.0.1:88 (LISTEN)
samba   23392 root   38u  IPv4 2290196      0t0  TCP 192.168.1.1:88 (LISTEN)

ucs-slave:

samba   1263 root   24u  IPv6  18348      0t0  TCP [::1]:88 (LISTEN)
samba   1263 root   34u  IPv4  18352      0t0  TCP 127.0.0.1:88 (LISTEN)
samba   1263 root   38u  IPv4  18356      0t0  TCP 192.168.1.2:88 (LISTEN)
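
To check quickly whether a particular address has a listener on port 88, output like the above can be filtered with awk. This is only a sketch: the helper names listen_addrs and has_listener are invented here, and it assumes the column layout that lsof printed in this thread (address:port as the second-to-last field, followed by "(LISTEN)").

```shell
# Read lsof output on stdin and print the bare listening addresses,
# one per line, without port and without IPv6 brackets.
listen_addrs() {
    awk '$NF == "(LISTEN)" {
        a = $(NF - 1)
        sub(/:[0-9]+$/, "", a)    # drop the ":88" port suffix
        gsub(/\[|\]/, "", a)      # drop brackets around IPv6 addresses
        print a
    }'
}

# Succeed if the address given as $1 appears among the listeners
# in the lsof output read from stdin.
has_listener() {
    listen_addrs | grep -Fxq "$1"
}
```

Usage would look like lsof -PniTCP:88 | has_listener 127.0.1.1, with the exit status telling whether that address is among the listeners. In the output posted above, samba listens on ::1, 127.0.0.1 and the LAN address of each host.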

Moritz_Bunkus · February 21, 2018, 6:24pm

Hey,

The 127.0.1.1 entry seems wrong to me. I know other systems such as Ubuntu or Debian tend to set up 127.0.1.1 with the desired host name, but on UCS systems this is usually not the case. I’ve just verified with three different systems (one a newly set-up 4.2, the others long-lived systems originally set up in the 3.0 days and upgraded since), and none of them contains such an entry.

What UCR variables have you set for static entries (see ucr search --brief hosts/static on both systems)?

Kind regards,
mosu

Andreas_T · February 21, 2018, 6:39pm

ucs-master:~# ucr search --brief hosts/static
hosts/static/.*: <empty>
hosts/static/127.0.0.1: localhost ucs-master ucs-master.foo.bar
hosts/static/127.0.1.1: ucs-master.foo.bar ucs-master

ucs-slave:~# ucr search --brief hosts/static
hosts/static/.*: <empty>
hosts/static/127.0.0.1: localhost ucs-slave ucs-slave.foo.bar
hosts/static/127.0.1.1: ucs-slave.foo.bar ucs-slave

I don’t remember setting 127.0.1.1, but I have two other UCS systems that are not affected by the problem and that also have hosts/static/127.0.1.1 set.

Moritz_Bunkus · February 21, 2018, 7:50pm

Hey,

I still suggest you remove both entries (hosts/static/127.0.0.1 and hosts/static/127.0.1.1), reboot the server and run the system diagnostics again. One of the reasons is that consistent DNS across the whole UCS domain is an important basis for it working well. That means host names must resolve identically no matter which host you query them from (e.g. ucs-slave.foo.bar should always resolve to 192.168.1.2, whether you’re currently on ucs-master or on ucs-slave itself) and no matter which method is used (contacting a DNS server directly or using glibc's stub resolver via functions such as gethostbyname).
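
The two lookup methods mentioned above can be compared with a few lines of shell. What follows is a rough sketch, not an official UCS check: the function names are invented, getent hosts stands in for the glibc resolver (it follows the hosts line in /etc/nsswitch.conf, so /etc/hosts usually wins over DNS), and host -t a asks the configured DNS server directly. Only the first address of each answer is compared.

```shell
# Classify an address string as loopback, routable, or unresolved.
classify_addr() {
    case "$1" in
        "")        echo "unresolved" ;;
        127.*|::1) echo "loopback" ;;
        *)         echo "routable" ;;
    esac
}

# First address returned by the glibc resolver (NSS).
resolve_nss() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# First A record returned by the DNS server.
resolve_dns() {
    host -t a "$1" | awk '/has address/ { print $NF; exit }'
}

# Print both answers; succeed only if they exist and agree.
compare_resolution() {
    name="$1"
    nss="$(resolve_nss "$name")"
    dns="$(resolve_dns "$name")"
    echo "$name: resolver=${nss:-none} ($(classify_addr "$nss"))" \
         "dns=${dns:-none} ($(classify_addr "$dns"))"
    [ -n "$nss" ] && [ "$nss" = "$dns" ]
}
```

Running compare_resolution "$(hostname -f)" on each host of the domain should then show at a glance whether a host sees itself under a different address than the rest of the domain does.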

Kind regards,
mosu

Andreas_T · February 21, 2018, 9:01pm

Thanks, I removed both entries on all systems and rebooted them. System diagnostics still does not report any problem.

In the other domain there are a ucs-master and a ucs-backup.

root@ucs-master:~# cat /etc/hosts
...
127.0.0.1       localhost
192.168.100.1   ucs-master.bar.foo ucs-master
127.0.0.1       localhost localhost_sync
127.0.1.1       ucs-master.bar.foo ucs-master
...

According to Google, localhost_sync seems to have something to do with Open-Xchange. Should I also remove both entries here?

Moritz_Bunkus · February 22, 2018, 8:22am

Hey,

why is there still an entry for 127.0.1.1 on your ucs-master if the UCR variables have really been unset? That doesn’t really make sense… Please post the output of grep -Fr 127.0.1.1 /etc; it should give us some clues.

I need to mention that my OX test installation does contain static host entries for 127.0.1.1, too. I haven’t found out where and when they’re set, though; neither the package installation files in /var/lib/dpkg/info nor the domain join scripts in /usr/lib/univention-install do anything with 127.0.1.1… Very strange.

What’s even stranger is that I can reproduce the “KDC not reachable” issue on that machine as well. Removing the 127.0.1.1 entry doesn’t solve the issue for me. I’ll have to look into this some more.

Kind regards,
mosu

Andreas_T · February 22, 2018, 8:43am

Hi mosu,

why is there still an entry for 127.0.1.1 on your ucs-master if the UCR variables have really been unset? That doesn’t really make sense…

My previous post was about a completely different domain. Sorry for not making that clear.

I do not have the “KDC not reachable” issue on this other domain.

Edit: Suddenly I have the same problem in the other domain…

Moritz_Bunkus · February 23, 2018, 1:31pm

Maybe this is the same false positive/bug in the check that Stefan Gomann mentioned in this post:

KDC check fails due to “Record Mark” option in network package