Problem: Nubus - UMS-UMC-Server Pod unresponsive - Self-Service Listener fails silently

UMS-UMC-Server Pod becomes unresponsive in openDesk - Self-Service Email Delivery Fails

Problem:

In certain openDesk environments, the ums-umc-server pod periodically becomes unresponsive. Although the pod remains in a Running state in Kubernetes and portal login continues to function, backend processes such as the Self-Service Listener fail silently. This results in undelivered password change or reset emails.


Affected Versions:

  • openDesk: Versions 1.2 and 1.3
  • Nubus for Kubernetes: Versions 1.5.1 and 1.8.0

Symptoms:

  • Pod ums-umc-server appears healthy (Running) in Kubernetes, but internal services are not responsive.
  • The self-service listener fails to send password reset/change emails.
  • Manual pod restart resolves the issue temporarily.
  • Logs contain repeated HTTP 599 errors pointing to connectivity problems.

Example Log output:

kubectl logs ums-umc-server-0

28.05.25 10:39:03.252  MAIN        ( WARN    ) : Connection was aborted by the client!
28.05.25 10:39:03.253  MAIN        ( WARN    ) : Reaching module failed: HTTP 599: Failed to connect to portal.uni.vention.de port 80 after 0 ms: Couldn't connect to server
28.05.25 10:39:03       ERROR      (        1) : Exception in callback functools.partial(<function Command.cancel_request.<locals>.cb at 0x763f809444a0>, <Future finished exception=CouldNotConnect(HTTP 599: Failed to connect to portal.uni.vention.de port 80 after 0 ms: Couldn't connect to server)>)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 163, in _handle_errors
    response = function()
               ^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 143, in reraise
    raise future.exception()
tornado.curl_httpclient.CurlError: HTTP 599: Failed to connect to portal.uni.vention.de port 80 after 0 ms: Couldn't connect to server

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/tornado/ioloop.py", line 740, in _run_callback
    ret = callback()
          ^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 664, in cb
    CORE.process('Cancel request for %s completed with %d' % (self._request_id, response.result().code))
                                                                                ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 145, in propagate_result
    response = self._handle_errors(reraise)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 166, in _handle_errors
    raise CouldNotConnect(exc)
univention.management.console.resources.CouldNotConnect: HTTP 599: Failed to connect to portal.uni.vention.de port 80 after 0 ms: Couldn't connect to server

Root Cause:

Bug #58159

The UMC module’s timeout (default: 10 minutes) is reset with each incoming request. Components such as the self-service listener poll the UMC approximately every 5 minutes, preventing timeout expiry. As a result, a broken UMC session can persist indefinitely without being reset automatically.
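
For reference, the module inactivity timeout is typically controlled by the UCR variable umc/module/timeout (in seconds, 600 by default, which matches the 10 minutes mentioned above); the variable name is not stated in the bug report, so verify it in your environment. Assuming the ucr tool is available inside the container, you can inspect the effective value with:

kubectl exec -it <ums-umc-server-pod> -n <NAMESPACE> -- ucr get umc/module/timeout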


Solution:

The described issue is fixed with Nubus for Kubernetes version 1.9.x and higher.

  • Upgrade openDesk to version 1.6.0 or newer.

  • This version includes:

    • Nubus for Kubernetes 1.11.0
    • The UCRV umc/self-service/rate-limit/trusted-hosts, which lets you specify trusted hosts that bypass the UMC self-service rate limit (Erratum 5.2x116)

Additional Notes on the UCRV:

  • After updating, ensure the UCR variable is set appropriately for trusted hosts:

ucr set umc/self-service/rate-limit/trusted-hosts='127.0.0.1,::1,<self-service-pod-ip>'
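
  • To confirm the value is active inside the running container (assuming the ucr tool is available there), you can query it directly:

ucr get umc/self-service/rate-limit/trusted-hosts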

  • Monitor with:

kubectl logs <ums-umc-server-pod> -n <NAMESPACE>
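
For example, to follow the log live and filter for the HTTP 599 connection errors shown above:

kubectl logs -f <ums-umc-server-pod> -n <NAMESPACE> | grep -i "HTTP 599"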


Investigation:

1. Check Network Connectivity From the Pod

You can verify connectivity from the ums-umc-server pod to the portal with the following commands:

kubectl exec -n <NAMESPACE> -it <ums-umc-server-pod> -- bash
wget -v http://portal.uni.vention.de

Example (Correct Usage):

kubectl exec -it ums-umc-server-0 -- wget -v http://portal.test-opendesk.univention.dev/

Expected Output:

Defaulted container "umc-server" out of: umc-server, sssd-sidecar, prepare-config (init), load-internal-plugins (init), load-portal-extension (init), load-ox-extension (init), load-opendesk-extension (init), load-opendesk-a2g-mapper-extension (init)
--2025-06-19 08:22:41--  http://portal.test-opendesk.univention.dev/
Resolving portal.test-opendesk.univention.dev (portal.test-opendesk.univention.dev)... 193.71.134.137, 2001:8c0:7903::e:2
Connecting to portal.test-opendesk.univention.dev (portal.test-opendesk.univention.dev)|193.71.134.137|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://portal.test-opendesk.univention.dev/univention/portal/ [following]
--2025-06-19 08:22:41--  http://portal.test-opendesk.univention.dev/univention/portal/
Reusing existing connection to portal.test-opendesk.univention.dev:80.
HTTP request sent, awaiting response... 308 Permanent Redirect
Location: https://portal.test-opendesk.univention.dev/univention/portal [following]
--2025-06-19 08:22:41--  https://portal.test-opendesk.univention.dev/univention/portal
Connecting to portal.test-opendesk.univention.dev (portal.test-opendesk.univention.dev)|193.71.134.137|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://portal.test-opendesk.univention.dev/univention/portal/ [following]
--2025-06-19 08:22:41--  https://portal.test-opendesk.univention.dev/univention/portal/
Reusing existing connection to portal.test-opendesk.univention.dev:443.
HTTP request sent, awaiting response... 200 OK
Length: 2864 (2.8K) [text/html]
index.html: Read-only file system

Cannot write to ‘index.html’ (Read-only file system).
command terminated with exit code 3

The final error (Read-only file system) is expected and harmless; it simply means wget cannot save index.html to the container's read-only filesystem.
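
If you prefer to avoid the read-only error altogether, discard the download instead of saving index.html (standard wget behaviour, shown with the same example host):

kubectl exec -it ums-umc-server-0 -- wget -O /dev/null http://portal.test-opendesk.univention.dev/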


2. Test Local UMC Module Responsiveness

If the issue occurs again, you can manually check the local UMC service by running:

kubectl exec -it <ums-umc-server-pod> -- wget -O- localhost:8090/get/ipaddress

Expected output/response:

[ ]

If the response is [], it means:

  • The application inside the pod has successfully processed the request.

  • The HTTP server within the pod is running.

  • A valid HTTP response was returned.

This is a generic UMC command used to check if the UMC service is available or if the pod is unresponsive. It allows you to determine whether the problem is specific to the self-service component or if it affects the entire UMC service.
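
If the UMC service is hung rather than down, the wget above may block instead of returning promptly. Wrapping the check in a timeout makes a hang visible immediately; this is a minimal sketch that assumes the coreutils timeout command is present in the container image:

kubectl exec <ums-umc-server-pod> -- timeout 10 wget -qO- http://localhost:8090/get/ipaddress && echo "UMC responded" || echo "no valid response from UMC within 10 seconds"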


Workaround:

Warning: This workaround does not fix the underlying issue and is not persistent.

How to Force a Restart of a Kubernetes Pod

Method 1: Deleting a Single Pod

This is the most common method for restarting a single pod.

  1. Identify the Pod Name:
    First, find the full and exact name of the pod you want to restart. Use the kubectl get pods command to list all pods in the current namespace.

    kubectl get pods

    Example output:

    NAME                         READY   STATUS    RESTARTS   AGE
    my-app-pod-abcdefg-1a2b3     1/1     Running   0          2d
    another-pod-name             1/1     Running   0          1d
    
  2. Delete the Pod:
    Use the kubectl delete pod command with the identified pod name. This command will terminate the pod. The Kubernetes controller managing the pod will detect its absence and immediately create a new one to maintain the desired number of replicas.

    kubectl delete pod my-app-pod-abcdefg-1a2b3

  3. Verify the New Pod:
    Confirm that the new pod has been created and is in a Running state by checking the pod list again.

    kubectl get pods
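
    You can also watch the replacement pod being created in real time with the --watch flag:

    kubectl get pods -w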

Method 2: Restarting All Pods within a Deployment (Recommended for Deployments)

If the pod is part of a Deployment, the recommended and safest method is to trigger a rollout restart for the entire Deployment. This initiates a rolling update, ensuring a graceful restart without service interruption (provided there is more than one replica).

  1. Identify the Deployment Name:
    Find the name of the deployment managing the pods.

    kubectl get deployments

  2. Trigger a Rollout Restart:
    Use the kubectl rollout restart command with the deployment name.

    kubectl rollout restart deployment <your-deployment-name>

    This command updates the pod template, causing all pods in the deployment to be terminated and recreated sequentially.
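
    Note: the pod name ums-umc-server-0 suggests that in this environment the UMC server is managed by a StatefulSet rather than a Deployment; this is an assumption based on the naming only. kubectl rollout restart works for StatefulSets as well, and kubectl rollout status lets you follow the restart (namespace and workload name are placeholders):

    kubectl rollout restart statefulset <ums-umc-server-statefulset> -n <NAMESPACE>
    kubectl rollout status statefulset <ums-umc-server-statefulset> -n <NAMESPACE>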

Conclusion:
By deleting a pod or triggering a rollout restart on its parent Deployment, you leverage Kubernetes’s built-in self-healing capabilities to ensure a reliable and consistent application state. This approach is fundamental to Kubernetes’s declarative model.