UMS-UMC-Server Pod becomes unresponsive in openDesk - Self-Service Email Delivery Fails
Problem:
In certain openDesk environments, the ums-umc-server pod periodically becomes unresponsive. Although the pod remains in a Running state in Kubernetes and portal login continues to function, backend processes such as the Self-Service Listener fail silently. This results in undelivered password change or reset emails.
Affected Versions:
- openDesk: Versions 1.2 and 1.3
- Nubus for Kubernetes: Versions 1.5.1 and 1.8.0
Symptoms:
- Pod ums-umc-server appears healthy (Running) in Kubernetes, but internal services are not responsive.
- The self-service listener fails to send password reset/change emails.
- Manual pod restart resolves the issue temporarily.
- Logs contain repeated HTTP 599 errors pointing to connectivity problems.
Example Log output:
kubectl logs ums-umc-server-0
2025-05-28 12:39:03.253
univention.management.console.resources.CouldNotConnect: HTTP 599: Failed to connect to portal.uni.vention.de port 80 after 0 ms: Couldn't connect to server
2025-05-28 12:39:03.253
raise CouldNotConnect(exc)
2025-05-28 12:39:03.253
File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 166, in _handle_errors
2025-05-28 12:39:03.253
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-28 12:39:03.253
response = self._handle_errors(reraise)
2025-05-28 12:39:03.253
File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 145, in propagate_result
2025-05-28 12:39:03.253
^^^^^^^^^^^^^^^^^
2025-05-28 12:39:03.253
CORE.process('Cancel request for %s completed with %d' % (self._request_id, response.result().code))
2025-05-28 12:39:03.253
File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 664, in cb
2025-05-28 12:39:03.253
^^^^^^^^^^
2025-05-28 12:39:03.253
ret = callback()
2025-05-28 12:39:03.253
File "/usr/lib/python3/dist-packages/tornado/ioloop.py", line 740, in _run_callback
2025-05-28 12:39:03.253
Traceback (most recent call last):
2025-05-28 12:39:03.253
2025-05-28 12:39:03.253
During handling of the above exception, another exception occurred:
2025-05-28 12:39:03.253
2025-05-28 12:39:03.253
tornado.curl_httpclient.CurlError: HTTP 599: Failed to connect to portal.uni.vention.de port 80 after 0 ms: Couldn't connect to server
2025-05-28 12:39:03.253
raise future.exception()
2025-05-28 12:39:03.253
File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 143, in reraise
2025-05-28 12:39:03.253
^^^^^^^^^^
2025-05-28 12:39:03.253
response = function()
2025-05-28 12:39:03.253
File "/usr/lib/python3/dist-packages/univention/management/console/resources.py", line 163, in _handle_errors
2025-05-28 12:39:03.253
Traceback (most recent call last):
2025-05-28 12:39:03.253
28.05.25 10:39:03 ERROR ( 1) : Exception in callback functools.partial(<function Command.cancel_request.<locals>.cb at 0x763f809444a0>, <Future finished exception=CouldNotConnect(HTTP 599: Failed to connect to portal.uni.vention.de port 80 after 0 ms: Couldn't connect to server)>)
2025-05-28 12:39:03.253
28.05.25 10:39:03.253 MAIN ( WARN ) : Reaching module failed: HTTP 599: Failed to connect to portal.uni.vention.de port 80 after 0 ms: Couldn't connect to server
2025-05-28 12:39:03.252
28.05.25 10:39:03.252 MAIN ( WARN ) : Connection was aborted by the client!
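To gauge how often the error occurs, the log can be filtered for the HTTP 599 marker shown above. The following is a minimal sketch; the function name count_http_599 is hypothetical and not part of any openDesk or Nubus tooling.

```shell
#!/bin/sh
# Count log lines that contain an HTTP 599 error (reads log text from stdin).
count_http_599() {
  # grep -c prints the number of matching lines. It exits non-zero when the
  # count is 0, so "|| true" keeps the function usable under "set -e".
  grep -c 'HTTP 599' || true
}

# Usage (assumes kubectl access to the cluster; pod name taken from the example):
#   kubectl logs ums-umc-server-0 --since=1h | count_http_599
```

A count that keeps growing between two runs suggests that the pod is still in the broken state.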
Root Cause:
The UMC module’s timeout (default: 10 minutes) is reset with each incoming request. Components such as the self-service listener poll the UMC approximately every 5 minutes, preventing timeout expiry. As a result, a broken UMC session can persist indefinitely without being reset automatically.
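The arithmetic behind this can be illustrated with a small simulation. It is only a sketch of a generic idle timeout that is reset by every request, not the actual UMC implementation; the function name and the one-minute granularity are assumptions made for illustration.

```shell
#!/bin/sh
# Simulate an idle timeout that is reset by every incoming request.
# Arguments: timeout in minutes, poll interval in minutes, minutes to simulate.
simulate_idle_timeout() {
  timeout_min="$1"; poll_min="$2"; total_min="$3"
  idle=0
  minute=1
  while [ "$minute" -le "$total_min" ]; do
    idle=$((idle + 1))
    if [ "$idle" -ge "$timeout_min" ]; then
      echo "expired after $minute min"
      return 0
    fi
    # A poll arriving at this minute resets the idle counter.
    if [ $((minute % poll_min)) -eq 0 ]; then
      idle=0
    fi
    minute=$((minute + 1))
  done
  echo "never expired within $total_min min"
}

simulate_idle_timeout 10 5 1440    # 10 min timeout, poll every 5 min, one day
simulate_idle_timeout 10 15 1440   # same timeout, poll every 15 min
```

With a 5-minute poll the idle counter never gets past 5, so the 10-minute timeout cannot fire. Only a poll interval at or above the timeout lets it expire.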
Solution:
The described issue is fixed in Nubus for Kubernetes version 1.9.x and higher.
- Upgrade openDesk to version 1.6.0 or newer.
- This version includes:
  - Nubus for Kubernetes 1.11.0
  - The UCRV umc/self-service/rate-limit/trusted-hosts, which specifies trusted hosts that bypass the UMC self-service rate limit (Erratum 5.2x116).
Additional notes on the UCRV:
- After updating, ensure the UCR variable is set appropriately for trusted hosts:
ucr set umc/self-service/rate-limit/trusted-hosts='127.0.0.1,::1,<self-service-pod-ip>'
- Monitor with:
kubectl logs <ums-umc-server-pod> -n <NAMESPACE>
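The value for the UCR variable can be assembled from the loopback addresses plus one or more pod IPs. The helper below is a hypothetical sketch (build_trusted_hosts is not an existing tool); it only builds the comma-separated string used in the ucr set command above. How the variable is applied in a given deployment is not covered here.

```shell
#!/bin/sh
# Build the value for umc/self-service/rate-limit/trusted-hosts:
# loopback addresses first, then every address passed as an argument.
build_trusted_hosts() {
  value='127.0.0.1,::1'
  for ip in "$@"; do
    value="$value,$ip"
  done
  printf '%s\n' "$value"
}

# Usage (placeholders as in the commands above; assumes kubectl access):
#   POD_IP="$(kubectl get pod <self-service-pod> -n <NAMESPACE> -o jsonpath='{.status.podIP}')"
#   ucr set "umc/self-service/rate-limit/trusted-hosts=$(build_trusted_hosts "$POD_IP")"
```

Pod IPs usually change when a pod is recreated, so a value built this way may need to be refreshed after a restart.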
Investigation:
1. Check Network Connectivity From the Pod
You can verify connectivity from the ums-umc-server pod to the portal by opening a shell in the pod and running wget inside it:
kubectl exec -n <NAMESPACE> -it <ums-umc-server-pod> -- bash
wget -v http://portal.uni.vention.de
Example (Correct Usage):
kubectl exec -it ums-umc-server-0 -- wget -v http://portal.test-opendesk.univention.dev/
Expected Output:
Defaulted container "umc-server" out of: umc-server, sssd-sidecar, prepare-config (init), load-internal-plugins (init), load-portal-extension (init), load-ox-extension (init), load-opendesk-extension (init), load-opendesk-a2g-mapper-extension (init)
--2025-06-19 08:22:41-- http://portal.test-opendesk.univention.dev/
Resolving portal.test-opendesk.univention.dev (portal.test-opendesk.univention.dev)... 193.71.134.137, 2001:8c0:7903::e:2
Connecting to portal.test-opendesk.univention.dev (portal.test-opendesk.univention.dev)|193.71.134.137|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://portal.test-opendesk.univention.dev/univention/portal/ [following]
--2025-06-19 08:22:41-- http://portal.test-opendesk.univention.dev/univention/portal/
Reusing existing connection to portal.test-opendesk.univention.dev:80.
HTTP request sent, awaiting response... 308 Permanent Redirect
Location: https://portal.test-opendesk.univention.dev/univention/portal [following]
--2025-06-19 08:22:41-- https://portal.test-opendesk.univention.dev/univention/portal
Connecting to portal.test-opendesk.univention.dev (portal.test-opendesk.univention.dev)|193.71.134.137|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://portal.test-opendesk.univention.dev/univention/portal/ [following]
--2025-06-19 08:22:41-- https://portal.test-opendesk.univention.dev/univention/portal/
Reusing existing connection to portal.test-opendesk.univention.dev:443.
HTTP request sent, awaiting response... 200 OK
Length: 2864 (2.8K) [text/html]
index.html: Read-only file system
Cannot write to ‘index.html’ (Read-only file system).
command terminated with exit code 3
The final error (Read-only file system) is expected and harmless. It simply means that wget could not save the downloaded index.html because the container's filesystem is read-only. The 200 OK response above it confirms that the connection works.
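The exit code of the check can be interpreted with a small helper. This sketch assumes GNU wget in the container (the output format above suggests it) and uses the exit status values documented in the GNU wget manual; the function name explain_wget_exit is hypothetical.

```shell
#!/bin/sh
# Translate a GNU wget exit status into a short diagnosis.
explain_wget_exit() {
  case "$1" in
    0) echo "ok: page retrieved" ;;
    3) echo "file I/O error: response received but could not be written" ;;
    4) echo "network failure: could not reach the server" ;;
    5) echo "SSL verification failure" ;;
    8) echo "server issued an error response" ;;
    *) echo "other wget error (exit code $1)" ;;
  esac
}

# Usage (assumes kubectl access; writing to /dev/null avoids the
# read-only file system error because nothing is saved in the container):
#   kubectl exec ums-umc-server-0 -- wget -q -O /dev/null http://portal.test-opendesk.univention.dev/
#   explain_wget_exit "$?"
```

Exit code 3, as in the example output, points to the local write problem, whereas exit code 4 would match the connection failure seen in the HTTP 599 log entries.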
2. Test Local UMC Module Responsiveness
If the issue occurs again, you can manually check the local UMC service by running:
kubectl exec -it <ums-umc-server-pod> -- wget -O- localhost:8090/get/ipaddress
Expected output/response:
[]
If the response is [], it means:
- The application inside the pod has successfully processed the request.
- The HTTP server within the pod is running.
- A valid HTTP response was returned.
This is a generic UMC command used to check if the UMC service is available or if the pod is unresponsive. It allows you to determine whether the problem is specific to the self-service component or if it affects the entire UMC service.
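For repeated checks, the response can be classified automatically. The sketch below treats only the documented [] response as healthy; the function name classify_umc_response and the 10-second wget timeout in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# Classify the body returned by the UMC probe.
# An empty JSON list (with or without inner whitespace) counts as healthy.
classify_umc_response() {
  # Remove all whitespace so that "[ ]" and "[]" compare equal.
  body="$(printf '%s' "$1" | tr -d '[:space:]')"
  if [ "$body" = "[]" ]; then
    echo "responsive"
  else
    echo "unresponsive"
  fi
}

# Usage (assumes kubectl access; -T 10 makes wget give up after 10 seconds
# instead of hanging on an unresponsive service):
#   body="$(kubectl exec ums-umc-server-0 -- wget -q -T 10 -O- localhost:8090/get/ipaddress)"
#   classify_umc_response "$body"
```

An empty body, for example after a timeout, is reported as unresponsive.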
Workaround:
This workaround does not fix the underlying issue and is not persistent.
How to Force a restart of a Kubernetes Pod
Method 1: Deleting a Single Pod
This is the most common method for restarting a single pod.
- Identify the Pod Name:
First, find the full and exact name of the pod you want to restart. Use the kubectl get pods command to list all pods in the current namespace.
kubectl get pods
Example output:
NAME                       READY   STATUS    RESTARTS   AGE
my-app-pod-abcdefg-1a2b3   1/1     Running   0          2d
another-pod-name           1/1     Running   0          1d
- Delete the Pod:
Use the kubectl delete pod command with the identified pod name. This command will terminate the pod. The Kubernetes controller managing the pod will detect its absence and immediately create a new one to maintain the desired number of replicas.
kubectl delete pod my-app-pod-abcdefg-1a2b3
- Verify the New Pod:
Confirm that the new pod has been created and is in a Running state by checking the pod list again.
kubectl get pods
Method 2: Restarting All Pods within a Deployment (Recommended for Deployments)
If the pod is part of a Deployment, the recommended and safest method is to trigger a rollout restart for the entire Deployment. This initiates a rolling update, ensuring a graceful restart without service interruption (provided there is more than one replica).
- Identify the Deployment Name:
Find the name of the deployment managing the pods.
kubectl get deployments
- Trigger a Rollout Restart:
Use the kubectl rollout restart command with the deployment name.
kubectl rollout restart deployment <your-deployment-name>
This command updates the pod template, causing all pods in the deployment to be terminated and recreated sequentially.
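The restart can be combined with a wait for the rollout to finish. This is a minimal sketch: the function name restart_deployment, the overridable KUBECTL variable, and the 300-second timeout are assumptions, while kubectl rollout restart and kubectl rollout status are standard kubectl subcommands.

```shell
#!/bin/sh
# Allow the kubectl binary to be overridden (for example, for dry runs).
KUBECTL="${KUBECTL:-kubectl}"

# Restart a Deployment and block until the rollout has completed.
# Arguments: namespace, deployment name.
restart_deployment() {
  namespace="$1"; deployment="$2"
  "$KUBECTL" rollout restart deployment "$deployment" -n "$namespace" &&
    "$KUBECTL" rollout status deployment "$deployment" -n "$namespace" --timeout=300s
}

# Usage (placeholders as in the commands above):
#   restart_deployment <NAMESPACE> <your-deployment-name>
```

If the rollout does not complete within the timeout, kubectl rollout status exits with a non-zero status, which makes the function suitable for use in scripts.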
Conclusion:
By deleting a pod or triggering a rollout restart on its parent Deployment, you leverage Kubernetes’s built-in self-healing capabilities to ensure a reliable and consistent application state. This approach is fundamental to Kubernetes’s declarative model.