CTDB HA Samba Fileserver on UCS

sebastian_bbaw · August 5, 2021, 3:57pm

Dear Univention Community,

First of all, I want to thank you for the work and effort you put into UCS! It's great!

I am planning to deploy an Active Directory domain for a larger group of Windows users, and I'd like to go with UCS.

At the moment I am having trouble setting up clustered fileservers in my Univention 5.0 test environment. I am trying to achieve high availability for my fileservers, so I decided to go with CTDB and Ceph. I couldn't find any instructions on how to set this up on a Univention member server.

I tried to apply the following instructions to build the HA fileserver cluster: ceph-heinlein-tools/ctdb at master · HeinleinSupport/ceph-heinlein-tools · GitHub. But I can't get it to work.

If someone could point me in the right direction on how to set up two Samba fileservers in a CTDB cluster on Univention member servers, I would really appreciate it.

So far my test lab consists of:

Primary Directory Node
Replication Directory Node
Fileserver1 (Memberserver) (Univention App “Windows-compatible Memberserver” installed)
Fileserver2 (Memberserver) (Univention App “Windows-compatible Memberserver” installed)

Both member servers have 3 NICs:

enp0s3 - public net (fs1: 192.168.100.20 - fs2: 192.168.100.21)
enp0s8 - Ceph net
enp0s9 - CTDB net (fs1: 192.168.200.120 - fs2: 192.168.200.121)

Both are connected to my Ceph test cluster, and CephFS is mounted at /mnt/cephfs on both systems.

Here is a more detailed explanation of what I have tried so far (on both fileservers):

Check if cluster support is available in smbd:

smbd -b | grep -i 'ctdb\|cluster'
Output:
   CLUSTER_SUPPORT
   CTDB_DATADIR
   CTDB_ETCDIR
   CTDB_HELPER_BINDIR
   CTDB_RUNDIR
   CTDB_SOCKET
   CTDB_VARDIR
Cluster support features:
   CLUSTER_SUPPORT
   CTDB_SOCKET: /run/ctdb/ctdbd.socket
   CTDB_PROTOCOL: 1

Enable the unmaintained repository:
ucr set repository/online/unmaintained=yes

Install the ctdb package:
apt install ctdb

Content of my /etc/ctdb/ctdb.conf file:

[cluster]
	recovery lock = /mnt/cephfs/ctdb/ctdb.lock

Content of my /etc/ctdb/public_addresses file:

192.168.100.22/24 enp0s3
192.168.100.23/24 enp0s3

Content of my /etc/ctdb/nodes file:

192.168.200.120
192.168.200.121
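
The format of these two files can be sanity-checked with a few lines of shell. This is only a minimal sketch: `check_nodes_file` and `check_public_addresses` are made-up helper names, not CTDB tools, and they assume IPv4 only, no comment lines, and exactly one interface per public address. They check the line format, not whether the addresses are reachable. The demo runs against temporary copies of the content shown above.

```shell
# Hypothetical helpers (not part of CTDB): they check line format only.

# Succeeds if FILE is non-empty and every non-empty line is a bare IPv4 address.
check_nodes_file() {
    [ -s "$1" ] || return 1
    ! grep -v '^$' "$1" | grep -Evq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Succeeds if FILE is non-empty and every non-empty line looks like
# "address/mask interface", e.g. "192.168.100.22/24 enp0s3".
check_public_addresses() {
    [ -s "$1" ] || return 1
    ! grep -v '^$' "$1" | grep -Evq '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}[[:space:]]+[[:alnum:]._-]+$'
}

# Demo on temporary copies of the two files from this post.
tmp="$(mktemp -d)"
printf '192.168.200.120\n192.168.200.121\n' > "$tmp/nodes"
printf '192.168.100.22/24 enp0s3\n192.168.100.23/24 enp0s3\n' > "$tmp/public_addresses"

check_nodes_file "$tmp/nodes" && echo "nodes: format ok"
check_public_addresses "$tmp/public_addresses" && echo "public_addresses: format ok"
```

On the fileservers themselves the arguments would be /etc/ctdb/nodes and /etc/ctdb/public_addresses.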

Content of my /etc/ctdb/script.options file:

CTDB_RPCINFO_LOCALHOST="127.0.0.1"
CTDB_SAMBA_SKIP_SHARE_CHECK=yes

I enabled two CTDB event scripts:

ctdb event script enable legacy 49.winbind
ctdb event script enable legacy 50.samba

To enable cluster support in Samba (afaik), I have to add these lines to smb.conf:

	clustering = yes
	include = registry

Based on this article, I tried to set the following UCR variables to add the needed configuration to my smb.conf:

ucr set samba/global/options/clustering=yes
ucr set samba/global/options/include=registry
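
Whether the two settings actually ended up in the generated configuration can be checked by looking for them in the [global] section (the smb.conf parameter is spelled `clustering`). The following is a minimal sketch, not a UCS or Samba tool: `has_global_param` is a made-up helper that reads a single file and does not follow `include` lines, so it can miss settings that live in included files. The demo uses a temporary file instead of the real smb.conf.

```shell
# Hypothetical helper, not part of Samba or UCS.
# Succeeds if FILE contains "KEY = VALUE" inside its [global] section.
# Only this one file is read; files pulled in via "include" are not followed.
has_global_param() {
    awk -v key="$2" -v val="$3" '
        /^[ \t]*\[/ {
            # A section header: remember whether it is [global].
            in_global = (tolower($0) ~ /^[ \t]*\[global\]/)
            next
        }
        # Skip lines outside [global] and comment lines.
        !in_global || /^[ \t]*[#;]/ { next }
        {
            pos = index($0, "=")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (tolower(k) == tolower(key) && tolower(v) == tolower(val)) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Demo on a temporary file with the two lines from this post.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[global]
    clustering = yes
    include = registry
EOF

has_global_param "$conf" clustering yes && echo "clustering = yes found"
has_global_param "$conf" include registry && echo "include = registry found"
```

On a member server the first argument would be /etc/samba/smb.conf.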

Now, when I try to restart Samba:

/etc/init.d/samba restart

The output is:

[ ok ] Stopping smbd (via systemctl): smbd.service.
[ ok ] Stopping nmbd (via systemctl): nmbd.service.
[ ok ] Starting nmbd (via systemctl): nmbd.service.
[....] Starting smbd (via systemctl): smbd.serviceJob for smbd.service failed because the control process exited with error code.

The Samba service fails to start, and I find this in my /var/log/samba/log.smbd:

 [2021/08/05 16:36:28.640242,  0] ../../source3/smbd/server.c:1784(main)
   smbd version 4.13.7-Univention started.
   Copyright Andrew Tridgell and the Samba Team 1992-2020
 [2021/08/05 16:36:28.641398,  0] ../../lib/util/fault.c:159(smb_panic_log)
   ===============================================================
 [2021/08/05 16:36:28.641407,  0] ../../lib/util/fault.c:163(smb_panic_log)
   INTERNAL ERROR: messaging not initialized
    in pid 16653 (4.13.7-Univention)
 [2021/08/05 16:36:28.641413,  0] ../../lib/util/fault.c:168(smb_panic_log)
   If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting
 [2021/08/05 16:36:28.641418,  0] ../../lib/util/fault.c:169(smb_panic_log)
   ===============================================================
 [2021/08/05 16:36:28.641422,  0] ../../lib/util/fault.c:171(smb_panic_log)
   PANIC (pid 16653): messaging not initialized
    in 4.13.7-Univention
 [2021/08/05 16:36:28.641665,  0] ../../lib/util/fault.c:275(log_stack_trace)
   BACKTRACE: 28 stack frames:
    #0 /lib/x86_64-linux-gnu/libsamba-util.so.0(log_stack_trace+0x30) [0x7fe9bd5c8040]
    #1 /lib/x86_64-linux-gnu/libsamba-util.so.0(smb_panic+0x24) [0x7fe9bd5c82b4]
    #2 /lib/x86_64-linux-gnu/libsmbconf.so.0(+0x4ea30) [0x7fe9bd1fba30]
    #3 /lib/x86_64-linux-gnu/libsmbconf.so.0(db_open+0x20f) [0x7fe9bd1d043f]
    #4 /lib/x86_64-linux-gnu/libsmbconf.so.0(regdb_init+0xa9) [0x7fe9bd212149]
    #5 /lib/x86_64-linux-gnu/libsmbconf.so.0(registry_init_common+0x6) [0x7fe9bd215ae6]
    #6 /lib/x86_64-linux-gnu/libsmbconf.so.0(registry_init_smbconf+0x2a) [0x7fe9bd208cea]
    #7 /lib/x86_64-linux-gnu/libsmbconf.so.0(+0x5d4c1) [0x7fe9bd20a4c1]
    #8 /lib/x86_64-linux-gnu/libsmbconf.so.0(smbconf_init_internal+0x3e) [0x7fe9bd1f75fe]
    #9 /lib/x86_64-linux-gnu/libsmbconf.so.0(smbconf_init+0xb3) [0x7fe9bd209933]
    #10 /lib/x86_64-linux-gnu/libsmbconf.so.0(+0x29245) [0x7fe9bd1d6245]
    #11 /lib/x86_64-linux-gnu/libsmbconf.so.0(process_registry_service+0x39) [0x7fe9bd1df5e9]
    #12 /lib/x86_64-linux-gnu/libsmbconf.so.0(lp_include+0x5b) [0x7fe9bd1df7fb]
    #13 /lib/x86_64-linux-gnu/libsamba-hostconfig.so.0(lpcfg_do_global_parameter+0x91) [0x7fe9bcf5f0f1]
    #14 /lib/x86_64-linux-gnu/libsamba-util.so.0(tini_parse+0x30f) [0x7fe9bd5b9ccf]
    #15 /lib/x86_64-linux-gnu/libsamba-util.so.0(pm_process+0x3a) [0x7fe9bd5bc38a]
    #16 /lib/x86_64-linux-gnu/libsmbconf.so.0(lp_include+0x1b8) [0x7fe9bd1df958]
    #17 /lib/x86_64-linux-gnu/libsamba-util.so.0(tini_parse+0x30f) [0x7fe9bd5b9ccf]
    #18 /lib/x86_64-linux-gnu/libsamba-util.so.0(pm_process+0x3a) [0x7fe9bd5bc38a]
    #19 /lib/x86_64-linux-gnu/libsmbconf.so.0(lp_include+0x1b8) [0x7fe9bd1df958]
    #20 /lib/x86_64-linux-gnu/libsamba-util.so.0(tini_parse+0x30f) [0x7fe9bd5b9ccf]
    #21 /lib/x86_64-linux-gnu/libsamba-util.so.0(pm_process+0x3a) [0x7fe9bd5bc38a]
    #22 /lib/x86_64-linux-gnu/libsmbconf.so.0(+0x33fac) [0x7fe9bd1e0fac]
    #23 /lib/x86_64-linux-gnu/libsmbconf.so.0(lp_load_with_shares+0x20) [0x7fe9bd1e1970]
    #24 /usr/lib/x86_64-linux-gnu/samba/libsmbd-base.so.0(reload_services+0x140) [0x7fe9bd3c0140]
    #25 /usr/sbin/smbd(main+0x4c6) [0x55dcd90e93e6]
    #26 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7fe9bcd0409b]
    #27 /usr/sbin/smbd(_start+0x2a) [0x55dcd90eae4a]
 [2021/08/05 16:36:28.641729,  0] ../../source3/lib/dumpcore.c:315(dump_core)
   dumping core in /var/log/samba/cores/smbd

When I try to start the ctdb service, the log in /var/log/ctdb/log.ctdb looks like this:

2021/08/05 16:40:26.998515 ctdbd[17710]: CTDB starting on node
2021/08/05 16:40:26.998634 ctdbd[17710]: Recovery lock not set
2021/08/05 16:40:26.999338 ctdbd[17711]: Starting CTDBD (Version 4.13.7-Univention) as PID: 17711
2021/08/05 16:40:27.000744 ctdbd[17711]: Created PID file /var/run/ctdb/ctdbd.pid
2021/08/05 16:40:27.000771 ctdbd[17711]: Removed stale socket /var/run/ctdb/ctdbd.socket
2021/08/05 16:40:27.000787 ctdbd[17711]: Listening to ctdb socket /var/run/ctdb/ctdbd.socket
2021/08/05 16:40:27.000800 ctdbd[17711]: Set real-time scheduler priority
2021/08/05 16:40:27.000871 ctdbd[17711]: Starting event daemon /usr/lib/x86_64-linux-gnu/ctdb/ctdb-eventd -P 17711 -S 14
2021/08/05 16:40:27.006039 ctdbd[17711]: Set runstate to INIT (1)
2021/08/05 16:40:27.081825 ctdbd[17711]: PNN is 0
2021/08/05 16:40:27.081873 ctdbd[17711]: Loaded public addresses from /etc/ctdb/public_addresses
2021/08/05 16:40:27.083810 ctdbd[17711]: Vacuuming is disabled for non-volatile database passdb.tdb
2021/08/05 16:40:27.083823 ctdbd[17711]: Attached to database '/var/lib/ctdb/persistent/passdb.tdb.0' with flags 0x400
2021/08/05 16:40:27.085703 ctdbd[17711]: Vacuuming is disabled for non-volatile database registry.tdb
2021/08/05 16:40:27.085712 ctdbd[17711]: Attached to database '/var/lib/ctdb/persistent/registry.tdb.0' with flags 0x400
2021/08/05 16:40:27.087537 ctdbd[17711]: Vacuuming is disabled for non-volatile database account_policy.tdb
2021/08/05 16:40:27.087548 ctdbd[17711]: Attached to database '/var/lib/ctdb/persistent/account_policy.tdb.0' with flags 0x400
2021/08/05 16:40:27.089228 ctdbd[17711]: Vacuuming is disabled for non-volatile database secrets.tdb
2021/08/05 16:40:27.089238 ctdbd[17711]: Attached to database '/var/lib/ctdb/persistent/secrets.tdb.0' with flags 0x400
2021/08/05 16:40:27.090860 ctdbd[17711]: Vacuuming is disabled for non-volatile database group_mapping.tdb
2021/08/05 16:40:27.090866 ctdbd[17711]: Attached to database '/var/lib/ctdb/persistent/group_mapping.tdb.0' with flags 0x400
2021/08/05 16:40:27.092493 ctdbd[17711]: Vacuuming is disabled for non-volatile database ctdb.tdb
2021/08/05 16:40:27.092501 ctdbd[17711]: Attached to database '/var/lib/ctdb/persistent/ctdb.tdb.0' with flags 0x400
2021/08/05 16:40:27.094141 ctdbd[17711]: Vacuuming is disabled for non-volatile database share_info.tdb
2021/08/05 16:40:27.094149 ctdbd[17711]: Attached to database '/var/lib/ctdb/persistent/share_info.tdb.0' with flags 0x400
2021/08/05 16:40:27.094158 ctdbd[17711]: Freeze db: share_info.tdb
2021/08/05 16:40:27.094168 ctdbd[17711]: Set lock helper to "/usr/lib/x86_64-linux-gnu/ctdb/ctdb_lock_helper"
2021/08/05 16:40:27.095569 ctdbd[17711]: Freeze db: ctdb.tdb
2021/08/05 16:40:27.096548 ctdbd[17711]: Freeze db: group_mapping.tdb
2021/08/05 16:40:27.097594 ctdbd[17711]: Freeze db: secrets.tdb
2021/08/05 16:40:27.098660 ctdbd[17711]: Freeze db: account_policy.tdb
2021/08/05 16:40:27.099686 ctdbd[17711]: Freeze db: registry.tdb
2021/08/05 16:40:27.100562 ctdbd[17711]: Freeze db: passdb.tdb
2021/08/05 16:40:27.101391 ctdbd[17711]: Set runstate to SETUP (2)
2021/08/05 16:40:27.111361 ctdbd[17711]: Keepalive monitoring has been started
2021/08/05 16:40:27.111389 ctdbd[17711]: Set runstate to FIRST_RECOVERY (3)
2021/08/05 16:40:27.111828 ctdb-recoverd[17813]: monitor_cluster starting
2021/08/05 16:40:27.112320 ctdb-recoverd[17813]: Initial recovery master set - forcing election
2021/08/05 16:40:27.112459 ctdbd[17711]: This node (0) is now the recovery master
2021/08/05 16:40:28.111980 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:29.112852 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:30.113680 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:30.113732 ctdb-recoverd[17813]: Election period ended, master=0
2021/08/05 16:40:30.113874 ctdb-recoverd[17813]: Node:0 was in recovery mode. Start recovery process
2021/08/05 16:40:30.113880 ctdb-recoverd[17813]: ../../ctdb/server/ctdb_recoverd.c:1077 Starting do_recovery
2021/08/05 16:40:30.113895 ctdb-recoverd[17813]: ../../ctdb/server/ctdb_recoverd.c:1152 Recovery initiated due to problem with node 0
2021/08/05 16:40:30.113922 ctdb-recoverd[17813]: ../../ctdb/server/ctdb_recoverd.c:1182 Recovery - updated flags
2021/08/05 16:40:30.113933 ctdb-recoverd[17813]: Set recovery_helper to "/usr/lib/x86_64-linux-gnu/ctdb/ctdb_recovery_helper"
2021/08/05 16:40:30.118043 ctdb-recovery[17830]: Set recovery mode to ACTIVE
2021/08/05 16:40:30.118080 ctdbd[17711]: Recovery has started
2021/08/05 16:40:30.124679 ctdb-recovery[17830]: start_recovery event finished
2021/08/05 16:40:30.124881 ctdb-recovery[17830]: updated VNNMAP
2021/08/05 16:40:30.124941 ctdb-recovery[17830]: recover database 0xc3078fba
2021/08/05 16:40:30.125088 ctdb-recovery[17830]: recover database 0x6645c6c4
2021/08/05 16:40:30.125144 ctdb-recovery[17830]: recover database 0xa1413774
2021/08/05 16:40:30.125282 ctdb-recovery[17830]: recover database 0x7132c184
2021/08/05 16:40:30.125360 ctdb-recovery[17830]: recover database 0x2ca251cf
2021/08/05 16:40:30.125468 ctdb-recovery[17830]: recover database 0x6cf2837d
2021/08/05 16:40:30.125520 ctdb-recovery[17830]: recover database 0x3ef19640
2021/08/05 16:40:30.125894 ctdbd[17711]: Freeze db: share_info.tdb frozen
2021/08/05 16:40:30.125999 ctdbd[17711]: Freeze db: ctdb.tdb frozen
2021/08/05 16:40:30.126097 ctdbd[17711]: Freeze db: group_mapping.tdb frozen
2021/08/05 16:40:30.126196 ctdbd[17711]: Freeze db: secrets.tdb frozen
2021/08/05 16:40:30.126330 ctdbd[17711]: Freeze db: account_policy.tdb frozen
2021/08/05 16:40:30.126438 ctdbd[17711]: Freeze db: registry.tdb frozen
2021/08/05 16:40:30.126535 ctdbd[17711]: Freeze db: passdb.tdb frozen
2021/08/05 16:40:30.168184 ctdbd[17711]: Thaw db: share_info.tdb generation 526794162
2021/08/05 16:40:30.168215 ctdbd[17711]: Release freeze handle for db share_info.tdb
2021/08/05 16:40:30.168563 ctdbd[17711]: Thaw db: ctdb.tdb generation 526794162
2021/08/05 16:40:30.168572 ctdbd[17711]: Release freeze handle for db ctdb.tdb
2021/08/05 16:40:30.168830 ctdbd[17711]: Thaw db: group_mapping.tdb generation 526794162
2021/08/05 16:40:30.168840 ctdbd[17711]: Release freeze handle for db group_mapping.tdb
2021/08/05 16:40:30.169108 ctdbd[17711]: Thaw db: secrets.tdb generation 526794162
2021/08/05 16:40:30.169116 ctdbd[17711]: Release freeze handle for db secrets.tdb
2021/08/05 16:40:30.169362 ctdbd[17711]: Thaw db: account_policy.tdb generation 526794162
2021/08/05 16:40:30.169371 ctdbd[17711]: Release freeze handle for db account_policy.tdb
2021/08/05 16:40:30.169575 ctdbd[17711]: Thaw db: registry.tdb generation 526794162
2021/08/05 16:40:30.169583 ctdbd[17711]: Release freeze handle for db registry.tdb
2021/08/05 16:40:30.169833 ctdbd[17711]: Thaw db: passdb.tdb generation 526794162
2021/08/05 16:40:30.169841 ctdbd[17711]: Release freeze handle for db passdb.tdb
2021/08/05 16:40:30.171601 ctdb-recovery[17830]: 7 of 7 databases recovered
2021/08/05 16:40:30.171800 ctdbd[17711]: Recovery mode set to NORMAL
2021/08/05 16:40:30.171864 ctdb-recovery[17830]: Set recovery mode to NORMAL
2021/08/05 16:40:30.171981 ctdbd[17711]: Recovery has finished
2021/08/05 16:40:30.177881 ctdbd[17711]: Set runstate to STARTUP (4)
2021/08/05 16:40:30.178089 ctdb-recovery[17830]: recovered event finished
2021/08/05 16:40:30.178593 ctdb-recoverd[17813]: Takeover run starting
2021/08/05 16:40:30.178651 ctdb-recoverd[17813]: Set takeover_helper to "/usr/lib/x86_64-linux-gnu/ctdb/ctdb_takeover_helper"
2021/08/05 16:40:30.182103 ctdb-takeover[17845]: No nodes available to host public IPs yet
2021/08/05 16:40:30.187496 ctdb-recoverd[17813]: Takeover run completed successfully
2021/08/05 16:40:30.187757 ctdb-recoverd[17813]: ../../ctdb/server/ctdb_recoverd.c:1200 Recovery complete
2021/08/05 16:40:30.187809 ctdb-recoverd[17813]: Resetting ban count to 0 for all nodes
2021/08/05 16:40:30.187882 ctdb-recoverd[17813]: Just finished a recovery. New recoveries will now be suppressed for the rerecovery timeout (10 seconds)
2021/08/05 16:40:30.188015 ctdb-recoverd[17813]: Disabling recoveries for 10 seconds
2021/08/05 16:40:31.114528 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:31.114564 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:31.114910 ctdb-recoverd[17813]: Initial interface fetched
2021/08/05 16:40:31.115050 ctdb-recoverd[17813]: Trigger takeoverrun
2021/08/05 16:40:31.115176 ctdb-recoverd[17813]: Takeover run starting
2021/08/05 16:40:31.119066 ctdb-takeover[17857]: No nodes available to host public IPs yet
2021/08/05 16:40:31.125479 ctdb-recoverd[17813]: Takeover run completed successfully
2021/08/05 16:40:32.115874 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:32.115907 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:33.116230 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:33.116263 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:34.116589 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:34.116622 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:35.117432 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:35.117466 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:36.117776 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:36.117811 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:37.118133 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:37.118168 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:38.118484 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:38.118518 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:39.118835 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:39.118868 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:40.119192 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:40.119227 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:40.189250 ctdb-recoverd[17813]: Reenabling recoveries after timeout
2021/08/05 16:40:41.119562 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:41.119606 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:42.120399 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:42.120433 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:43.120736 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:43.120771 ctdbd[17711]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 16:40:44.121619 ctdbd[17711]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 16:40:44.121675 ctdbd[17711]: ctdb_recheck_persistent_health: OK[7] FAIL[0]
2021/08/05 16:40:44.121681 ctdbd[17711]: Running the "startup" event.
2021/08/05 16:40:45.094445 ctdb-eventd[17714]: 50.samba: Job for smbd.service failed because the control process exited with error code.
2021/08/05 16:40:45.094470 ctdb-eventd[17714]: 50.samba: See "systemctl status smbd.service" and "journalctl -xe" for details.
2021/08/05 16:41:21.036098 ctdb-eventd[17714]: 50.samba: Failed to start samba
2021/08/05 16:41:21.036128 ctdbd[17711]: startup event failed

Output of “ctdb status”:

ctdb status
Number of nodes:2
pnn:0 192.168.200.120  UNHEALTHY (THIS NODE)
pnn:1 192.168.200.121  DISCONNECTED|UNHEALTHY|INACTIVE
Generation:526794162
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0

When I remove the two UCR variables with:

ucr unset samba/global/options/clustering
ucr unset samba/global/options/include

I can restart the Samba server again:

/etc/init.d/samba restart
[ ok ] Stopping smbd (via systemctl): smbd.service.
[ ok ] Stopping nmbd (via systemctl): nmbd.service.
[ ok ] Starting nmbd (via systemctl): nmbd.service.
[ ok ] Starting smbd (via systemctl): smbd.service.

For testing purposes I manually added the following two lines to the [global] section of my smb.conf:

	clustered = yes
        include = registry

I am again able to restart the Samba service:

/etc/init.d/samba restart
[ ok ] Stopping smbd (via systemctl): smbd.service.
[ ok ] Stopping nmbd (via systemctl): nmbd.service.
[ ok ] Starting nmbd (via systemctl): nmbd.service.
[ ok ] Starting smbd (via systemctl): smbd.service.

And I am also able to restart the ctdb service:
service ctdb restart
These lines are added to my log after the restart:

2021/08/05 17:06:38.291857 ctdbd[29790]: Starting CTDBD (Version 4.13.7-Univention) as PID: 29790
2021/08/05 17:06:38.292834 ctdbd[29790]: Created PID file /var/run/ctdb/ctdbd.pid
2021/08/05 17:06:38.292849 ctdbd[29790]: Removed stale socket /var/run/ctdb/ctdbd.socket
2021/08/05 17:06:38.292862 ctdbd[29790]: Listening to ctdb socket /var/run/ctdb/ctdbd.socket
2021/08/05 17:06:38.292869 ctdbd[29790]: Set real-time scheduler priority
2021/08/05 17:06:38.292935 ctdbd[29790]: Starting event daemon /usr/lib/x86_64-linux-gnu/ctdb/ctdb-eventd -P 29790 -S 14
2021/08/05 17:06:38.298171 ctdbd[29790]: Set runstate to INIT (1)
2021/08/05 17:06:38.381814 ctdbd[29790]: PNN is 0
2021/08/05 17:06:38.381861 ctdbd[29790]: Loaded public addresses from /etc/ctdb/public_addresses
2021/08/05 17:06:38.383758 ctdbd[29790]: Vacuuming is disabled for non-volatile database passdb.tdb
2021/08/05 17:06:38.383770 ctdbd[29790]: Attached to database '/var/lib/ctdb/persistent/passdb.tdb.0' with flags 0x400
2021/08/05 17:06:38.385641 ctdbd[29790]: Vacuuming is disabled for non-volatile database registry.tdb
2021/08/05 17:06:38.385650 ctdbd[29790]: Attached to database '/var/lib/ctdb/persistent/registry.tdb.0' with flags 0x400
2021/08/05 17:06:38.387676 ctdbd[29790]: Vacuuming is disabled for non-volatile database account_policy.tdb
2021/08/05 17:06:38.387687 ctdbd[29790]: Attached to database '/var/lib/ctdb/persistent/account_policy.tdb.0' with flags 0x400
2021/08/05 17:06:38.389335 ctdbd[29790]: Vacuuming is disabled for non-volatile database secrets.tdb
2021/08/05 17:06:38.389342 ctdbd[29790]: Attached to database '/var/lib/ctdb/persistent/secrets.tdb.0' with flags 0x400
2021/08/05 17:06:38.391332 ctdbd[29790]: Vacuuming is disabled for non-volatile database group_mapping.tdb
2021/08/05 17:06:38.391338 ctdbd[29790]: Attached to database '/var/lib/ctdb/persistent/group_mapping.tdb.0' with flags 0x400
2021/08/05 17:06:38.392926 ctdbd[29790]: Vacuuming is disabled for non-volatile database ctdb.tdb
2021/08/05 17:06:38.392933 ctdbd[29790]: Attached to database '/var/lib/ctdb/persistent/ctdb.tdb.0' with flags 0x400
2021/08/05 17:06:38.394507 ctdbd[29790]: Vacuuming is disabled for non-volatile database share_info.tdb
2021/08/05 17:06:38.394513 ctdbd[29790]: Attached to database '/var/lib/ctdb/persistent/share_info.tdb.0' with flags 0x400
2021/08/05 17:06:38.394523 ctdbd[29790]: Freeze db: share_info.tdb
2021/08/05 17:06:38.394533 ctdbd[29790]: Set lock helper to "/usr/lib/x86_64-linux-gnu/ctdb/ctdb_lock_helper"
2021/08/05 17:06:38.395728 ctdbd[29790]: Freeze db: ctdb.tdb
2021/08/05 17:06:38.396739 ctdbd[29790]: Freeze db: group_mapping.tdb
2021/08/05 17:06:38.397669 ctdbd[29790]: Freeze db: secrets.tdb
2021/08/05 17:06:38.398588 ctdbd[29790]: Freeze db: account_policy.tdb
2021/08/05 17:06:38.399306 ctdbd[29790]: Freeze db: registry.tdb
2021/08/05 17:06:38.400258 ctdbd[29790]: Freeze db: passdb.tdb
2021/08/05 17:06:38.400968 ctdbd[29790]: Set runstate to SETUP (2)
2021/08/05 17:06:38.411201 ctdbd[29790]: Keepalive monitoring has been started
2021/08/05 17:06:38.411246 ctdbd[29790]: Set runstate to FIRST_RECOVERY (3)
2021/08/05 17:06:38.411825 ctdb-recoverd[29910]: monitor_cluster starting
2021/08/05 17:06:38.412061 ctdb-recoverd[29910]: Initial recovery master set - forcing election
2021/08/05 17:06:41.413786 ctdbd[29790]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 17:06:41.413945 ctdb-recoverd[29910]: Election period ended, master=0
2021/08/05 17:06:41.414186 ctdb-recoverd[29910]: Node:0 was in recovery mode. Start recovery process
2021/08/05 17:06:41.414194 ctdb-recoverd[29910]: ../../ctdb/server/ctdb_recoverd.c:1077 Starting do_recovery
2021/08/05 17:06:41.414197 ctdb-recoverd[29910]: ../../ctdb/server/ctdb_recoverd.c:1152 Recovery initiated due to problem with node 0
2021/08/05 17:06:41.414236 ctdb-recoverd[29910]: ../../ctdb/server/ctdb_recoverd.c:1182 Recovery - updated flags
2021/08/05 17:06:41.414247 ctdb-recoverd[29910]: Set recovery_helper to "/usr/lib/x86_64-linux-gnu/ctdb/ctdb_recovery_helper"
2021/08/05 17:06:41.418304 ctdb-recovery[29915]: Set recovery mode to ACTIVE
2021/08/05 17:06:41.418339 ctdbd[29790]: Recovery has started
2021/08/05 17:06:41.425581 ctdb-recovery[29915]: start_recovery event finished
2021/08/05 17:06:41.425627 ctdb-recovery[29915]: updated VNNMAP
...
2021/08/05 17:06:41.468405 ctdb-recovery[29915]: 7 of 7 databases recovered
2021/08/05 17:06:41.468433 ctdbd[29790]: Recovery mode set to NORMAL
2021/08/05 17:06:41.468450 ctdb-recovery[29915]: Set recovery mode to NORMAL
2021/08/05 17:06:41.468460 ctdbd[29790]: Recovery has finished
2021/08/05 17:06:41.475098 ctdbd[29790]: Set runstate to STARTUP (4)
2021/08/05 17:06:41.475139 ctdb-recovery[29915]: recovered event finished
2021/08/05 17:06:41.475517 ctdb-recoverd[29910]: Takeover run starting
2021/08/05 17:06:41.475529 ctdb-recoverd[29910]: Set takeover_helper to "/usr/lib/x86_64-linux-gnu/ctdb/ctdb_takeover_helper"
2021/08/05 17:06:41.478857 ctdb-takeover[29932]: No nodes available to host public IPs yet
2021/08/05 17:06:41.484906 ctdb-recoverd[29910]: Takeover run completed successfully
2021/08/05 17:06:41.485057 ctdb-recoverd[29910]: ../../ctdb/server/ctdb_recoverd.c:1200 Recovery complete
2021/08/05 17:06:41.485062 ctdb-recoverd[29910]: Resetting ban count to 0 for all nodes
2021/08/05 17:06:41.485065 ctdb-recoverd[29910]: Just finished a recovery. New recoveries will now be suppressed for the rerecovery timeout (10 seconds)
2021/08/05 17:06:41.485068 ctdb-recoverd[29910]: Disabling recoveries for 10 seconds
2021/08/05 17:06:42.414656 ctdbd[29790]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 17:06:42.414689 ctdbd[29790]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 17:06:42.414826 ctdb-recoverd[29910]: Initial interface fetched
2021/08/05 17:06:42.414873 ctdb-recoverd[29910]: Trigger takeoverrun
2021/08/05 17:06:42.414910 ctdb-recoverd[29910]: Takeover run starting
2021/08/05 17:06:42.418360 ctdb-takeover[29941]: No nodes available to host public IPs yet
2021/08/05 17:06:42.425012 ctdb-recoverd[29910]: Takeover run completed successfully
2021/08/05 17:06:43.415031 ctdbd[29790]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 17:06:43.415063 ctdbd[29790]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
2021/08/05 17:06:55.420154 ctdbd[29790]: CTDB_WAIT_UNTIL_RECOVERED
2021/08/05 17:06:55.420197 ctdbd[29790]: ctdb_recheck_persistent_health: OK[7] FAIL[0]
2021/08/05 17:06:55.420201 ctdbd[29790]: Running the "startup" event.
2021/08/05 17:06:57.391451 ctdbd[29790]: startup event OK - enabling monitoring
2021/08/05 17:06:57.391465 ctdbd[29790]: Set runstate to RUNNING (5)
2021/08/05 17:06:59.458483 ctdbd[29790]: monitor event OK - node re-enabled
2021/08/05 17:06:59.458690 ctdbd[29790]: Node became HEALTHY. Ask recovery master to reallocate IPs
2021/08/05 17:06:59.458871 ctdb-recoverd[29910]: Node 0 has changed flags - now 0x0  was 0x2
2021/08/05 17:07:00.438061 ctdb-recoverd[29910]: Unassigned IP 192.168.100.23 can be served by this node
2021/08/05 17:07:00.438085 ctdb-recoverd[29910]: Unassigned IP 192.168.100.22 can be served by this node
2021/08/05 17:07:00.438116 ctdb-recoverd[29910]: Trigger takeoverrun
2021/08/05 17:07:00.438152 ctdb-recoverd[29910]: Takeover run starting
2021/08/05 17:07:00.442209 ctdbd[29790]: Takeover of IP 192.168.100.23/24 on interface enp0s3
2021/08/05 17:07:00.442497 ctdbd[29790]: Takeover of IP 192.168.100.22/24 on interface enp0s3
2021/08/05 17:07:00.466856 ctdb-recoverd[29910]: Takeover run completed successfully

So this time CTDB is able to start. After I start the ctdb service on both fileservers, neither of them is able to connect to the other node, and both have assigned themselves the two public CTDB addresses. But I am unable to connect to any of these IPs via an SMB client (e.g. a Windows 10 machine).
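
Since the nodes file lists the 192.168.200.x addresses, one thing that can be checked is whether each node can open a TCP connection to the other on CTDB's port, which is 4379 by default. The following is a minimal sketch under a few assumptions: bash and `timeout` from coreutils are installed, and `tcp_port_open` is a made-up helper name, not a CTDB tool. It only tests whether a connection can be opened, not whether the peer is a healthy CTDB daemon.

```shell
# Hypothetical helper, not part of CTDB.
# Succeeds if a TCP connection to HOST:PORT can be opened within 2 seconds.
# It relies on bash's /dev/tcp redirection, so bash is called explicitly.
tcp_port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

On fs1, a call such as `tcp_port_open 192.168.200.121 4379 || echo "cannot reach fs2 on the CTDB port"` would show whether the other node is reachable; on fs2 the same call would use 192.168.200.120.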

fileserver1 IP configuration:

enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:2f:60:17 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.20/24 brd 192.168.100.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.100.23/24 brd 192.168.100.255 scope global secondary enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.100.22/24 brd 192.168.100.255 scope global secondary enp0s3
       valid_lft forever preferred_lft forever

fileserver1 ctdb status:

Number of nodes:2
pnn:0 192.168.200.120  OK (THIS NODE)
pnn:1 192.168.200.121  DISCONNECTED|UNHEALTHY|INACTIVE
Generation:256383205
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0

fileserver2 IP configuration:

enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:55:b1:7f brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.21/24 brd 192.168.100.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.100.23/24 brd 192.168.100.255 scope global secondary enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.100.22/24 brd 192.168.100.255 scope global secondary enp0s3
       valid_lft forever preferred_lft forever

fileserver2 ctdb status:

Number of nodes:2
pnn:0 192.168.200.120  DISCONNECTED|UNHEALTHY|INACTIVE
pnn:1 192.168.200.121  OK (THIS NODE)
Generation:144399470
Size:1
hash:0 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1
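
The two status outputs can also be compared mechanically. The following is a minimal sketch: `nodes_not_ok` and `recovery_master` are made-up helper names, and they only parse text in the format shown above. The sample input is abbreviated from the two outputs in this post.

```shell
# Hypothetical helpers; they only parse "ctdb status" text as shown in this post.

# Print "pnn address state" for every node whose state is not plain OK.
nodes_not_ok() {
    awk '/^pnn:/ && $3 != "OK" { sub(/^pnn:/, "", $1); print $1, $2, $3 }'
}

# Print the PNN that the node reports as recovery master.
recovery_master() {
    awk -F: '/^Recovery master:/ { print $2 }'
}

# Abbreviated copies of the two outputs above.
fs1_status='Number of nodes:2
pnn:0 192.168.200.120  OK (THIS NODE)
pnn:1 192.168.200.121  DISCONNECTED|UNHEALTHY|INACTIVE
Recovery master:0'

fs2_status='Number of nodes:2
pnn:0 192.168.200.120  DISCONNECTED|UNHEALTHY|INACTIVE
pnn:1 192.168.200.121  OK (THIS NODE)
Recovery master:1'

printf '%s\n' "$fs1_status" | nodes_not_ok
printf '%s\n' "$fs2_status" | nodes_not_ok

# Each node names itself as recovery master, so the two views differ.
m1="$(printf '%s\n' "$fs1_status" | recovery_master)"
m2="$(printf '%s\n' "$fs2_status" | recovery_master)"
if [ "$m1" != "$m2" ]; then
    echo "nodes disagree on the recovery master: $m1 vs $m2"
fi
```

On a live node the input would come from the command itself, for example `ctdb status | nodes_not_ok`.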

So my main questions at the moment are:
1. How can I successfully add these two config lines to my smb.conf on a Univention system (preferably via UCR):

    clustering = yes
    include = registry

2. Is it even possible to run CTDB on a Univention fileserver, and if so, what am I doing wrong?

I'd be happy to hear any suggestions about what I am doing wrong, and whether anyone has ever managed to get CTDB running successfully on UCS.

Thanks in advance!

Cheers, Sebastian