Keycloak multi-server redundancy/failover

lleo · August 30, 2023, 12:23pm

Hello all,
I migrated my UCS setup to Keycloak as the IdP.
SSO works just fine and all is well.
According to the Keycloak documentation for UCS, Keycloak can be installed on multiple UCS nodes and will then automatically provide increased availability and redundancy.
My UCS setup includes a primary and a backup node, and Keycloak is installed on both.
My problem is that SSO does not work when my primary node is offline. DNS for the SSO URL seems to resolve correctly even when the primary node is offline.
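
A DNS check like the one described can be scripted. The following is only a minimal sketch, assuming the SSO endpoint is served over HTTPS on port 443; the function names are made up for illustration. It lists every address the SSO name resolves to and tests whether each one accepts a TCP connection.

```python
import socket

def resolve_all(hostname, port=443):
    """Return the distinct IP addresses the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

def is_reachable(address, port=443, timeout=3.0):
    """True if a TCP connection to address:port succeeds within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def check_sso_endpoints(hostname, port=443):
    """Map each resolved address to whether it accepts TCP connections."""
    return {addr: is_reachable(addr, port) for addr in resolve_all(hostname, port)}
```

Calling `check_sso_endpoints()` with your own SSO FQDN while the primary is offline shows whether the name still resolves to the offline node and whether the remaining address answers at all. An open port only means the web server is up; it says nothing about whether Keycloak behind it can reach its database.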

In the Keycloak UCS realm, under the User Federation section, I added both the primary and the backup LDAP URIs to the “Connection URL” setting. Still, SSO does not work when the primary node is unavailable.

Does anyone have a tested and confirmed highly available, redundant Keycloak installation?

lleo · August 30, 2023, 9:50pm

I realize that I did not include the error I see. When my primary node is offline, the Keycloak app cannot access its database:

2023-08-30 17:31:38,693 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-0) Acquisition timeout while waiting for new connection
2023-08-30 17:31:38,694 ERROR [org.keycloak.services.scheduled.ScheduledTaskRunner] (Timer-0) Failed to run scheduled task ClearExpiredAdminEvents: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection

So from there I looked into the app configuration, and to my surprise I found that the Keycloak app on the backup node points to the database on the primary node, which obviously was offline.
Specifically, the administrative settings of the app on the backup node contain:

Defines the FQDN of the UCS instance used to change user password.
[primary-node.domain.net]

Database settings
Database URI (e.g. jdbc:postgresql://dbhost/keycloak?ssl=require).
[jdbc:postgresql://primary-node.domain.net:5432/keycloak?sslmode=require]
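
To see which database host a given node depends on, the Database URI above can be taken apart and its host probed. This is a rough sketch, not anything shipped with UCS or Keycloak: the helper names are hypothetical, it assumes a single-host URI of the form shown, and it falls back to the PostgreSQL default port 5432 when none is given.

```python
import socket
from urllib.parse import urlsplit, parse_qs

def parse_jdbc_postgres_uri(jdbc_uri):
    """Split a single-host jdbc:postgresql:// URI into host, port, database, params."""
    prefix = "jdbc:"
    if not jdbc_uri.startswith(prefix + "postgresql://"):
        raise ValueError("not a jdbc:postgresql:// URI: " + jdbc_uri)
    # Drop the "jdbc:" prefix so the rest parses like an ordinary URL.
    parts = urlsplit(jdbc_uri[len(prefix):])
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL default port
        "database": parts.path.lstrip("/"),
        "params": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }

def database_port_is_open(jdbc_uri, timeout=3.0):
    """True if the database host in the URI accepts a TCP connection."""
    info = parse_jdbc_postgres_uri(jdbc_uri)
    try:
        with socket.create_connection((info["host"], info["port"]), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the backup node against the URI shown above, `database_port_is_open()` would be expected to return False while the primary is down, which matches the "Unable to acquire JDBC Connection" errors in the log. It only tests the TCP port; it does not check credentials or SSL settings.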

The interesting part is that a PostgreSQL database, with the relevant user and permissions, was created on the backup node but is not used. Correcting these URIs resulted in a broken Keycloak install.

I tested a complete uninstall and reinstall and found that the node installed second has its database URI pointed at the node installed first. That is, if Keycloak is installed on the backup node first and on the primary second, the app on the primary node points to the backup node.
If I explicitly specified the URI during installation, I ended up with a broken install where the app installation would not complete.

lleo · September 9, 2023, 3:23pm

Well, just 2 days after my question here, Univention published a blog post about their vision for making Keycloak “truly” fail-safe and highly available.
You will find it here: https://www.univention.com/blog-en/2023/09/univention-mariadb-keycloak/

I will reserve my comments on the perceived intent behind the choices made there, but I remain interested in this point from the post linked above:

In principle, it is possible to run a cluster with the databases provided by UCS. However, the setup is not trivial and UCS does not provide any simple options for this. Administrators must not only set up this setup but also operate it themselves.

Perhaps Univention could share this ‘not trivial’ setup for those of us for whom running a separate SQL cluster is not worth it?