QA: Kubernetes Scaling for Performance – Manual and Automatic Strategies

Once your application is live, the next challenge is handling traffic. Kubernetes makes scaling easy, but there’s a logic to how it works.


1. Manual Scaling (The Quick Fix)

If you know a marketing campaign is starting or a heavy task is coming up, you can scale manually via the CLI:

kubectl scale deployment web-app-deployment --replicas=10
  • Use Case: Immediate control when you anticipate a load spike.
  • Limitation: It requires a human to monitor the traffic and run the command.
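
The same change can also be made declaratively, so that the replica count is recorded in version control instead of in someone's shell history. A minimal sketch, assuming the Deployment's pods carry a label `app: web-app` (the label and the `nginx` image are placeholders, not something stated above):

```yaml
# deployment.yaml: manual scaling expressed in the manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-deployment
spec:
  replicas: 10            # same effect as: kubectl scale ... --replicas=10
  selector:
    matchLabels:
      app: web-app        # assumed label
  template:
    metadata:
      labels:
        app: web-app      # must match the selector above
    spec:
      containers:
      - name: web-app
        image: nginx      # placeholder image
```

Apply it with `kubectl apply -f deployment.yaml`. If an autoscaler later manages this Deployment, it is usually better to leave `replicas` out of the manifest so that each apply does not reset the autoscaler's choice.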

2. Horizontal Pod Autoscaler (HPA)

The HPA is the “Autopilot” mode. It scales the number of pods up or down based on actual resource usage.

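Requirement: For HPA to work, you must have the Metrics Server installed in your cluster and define CPU requests (resources.requests) for the containers in your Deployment YAML, because CPU utilization is measured as a percentage of the requested amount.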
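
As a rough sketch of what "define requests" looks like in the pod template of the Deployment (the numbers are illustrative assumptions, not tuned values):

```yaml
spec:
  containers:
  - name: web-app
    image: nginx          # placeholder image
    resources:
      requests:
        cpu: 250m         # assumed value; a 50% CPU target then means ~125m average per pod
        memory: 128Mi     # assumed value
      limits:
        memory: 256Mi     # assumed value; optional for HPA, shown for completeness
```

Without a CPU request, the HPA has no baseline to compute a percentage from and cannot scale on CPU utilization.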

kubectl autoscale deployment web-app-deployment --cpu-percent=50 --min=2 --max=10
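  • The Logic: By default, the HPA controller checks CPU usage every 15 seconds. If the average utilization across the pods exceeds 50% of their requested CPU, it adds pods (up to 10). If the load drops, it removes pods (down to 2) to save resources and costs, but only after a stabilization window (5 minutes by default), so short dips do not cause constant scaling up and down.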
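
For reference, the autoscale command above corresponds roughly to the following manifest. This is a sketch using the `autoscaling/v2` API; the name `web-app-hpa` is a made-up example (the CLI names the HPA after the Deployment unless told otherwise).

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa                 # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50      # percent of the CPU *request*
  # Scaling rule used by the controller:
  #   desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
  # Example: 4 pods at 100% average -> ceil(4 * 100 / 50) = 8 pods
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # the default, written out explicitly
```

Keeping the HPA as a manifest makes the thresholds reviewable and lets you tune `behavior` later without retyping the command.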

3. Vertical Pod Autoscaler (VPA) – Making Pods “Stronger”

Sometimes, adding more pods doesn’t help, for example with legacy apps or databases that don’t scale horizontally. This is where the VPA comes in. Instead of changing the number of pods, it adjusts the CPU and memory requests of the existing ones.

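Unlike HPA, VPA is not built into Kubernetes: its components and custom resource definition must be installed in the cluster first. It is usually defined via a YAML manifest: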

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app-deployment
  updatePolicy:
    updateMode: "Auto"
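
Update Modes:
  • Off: VPA only provides “Recommendations” for CPU/memory settings but changes nothing.
  • Initial: Assigns resource requests only when the pod is first created and never changes them afterwards.
  • Auto: When a pod’s requests drift too far from the recommendation, VPA evicts the pod so that it is recreated with updated requests (higher or lower). Expect restarts in this mode.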
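
A cautious way to start is recommendation-only mode with explicit bounds. The sketch below assumes the container is named `web-app`; the `minAllowed`/`maxAllowed` numbers are illustrative, not recommendations.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app-deployment
  updatePolicy:
    updateMode: "Off"          # only compute recommendations, never evict pods
  resourcePolicy:
    containerPolicies:
    - containerName: web-app   # assumed container name
      minAllowed:              # assumed lower bound
        cpu: 100m
        memory: 128Mi
      maxAllowed:              # assumed upper bound
        cpu: "1"
        memory: 1Gi
```

After it has observed the workload for a while, `kubectl describe vpa web-app-vpa` shows the recommended requests, which you can review before switching to a mode that applies them.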

Warning: Do not use HPA and VPA on the same metric (CPU or memory) for the same workload simultaneously, as each reacts to the other’s changes and they will fight for control.
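
If you do want both on one workload, one possible arrangement (a sketch, not something the original post prescribes) is to let HPA scale on CPU and restrict VPA to memory via `controlledResources`:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"               # apply to every container in the pod
      controlledResources: ["memory"]  # VPA leaves CPU alone, so HPA can own it
```

With this split, the HPA's CPU percentage is still computed against a CPU request that nobody else is rewriting.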


4. The Role of Health Checks

Scaling is useless if the new pods aren’t actually working. Always use Readiness Probes in your YAML:

  • Readiness Probe: Tells Kubernetes when a pod is ready to receive traffic.
  • Liveness Probe: Tells Kubernetes if a pod is still alive or if it needs to be restarted.

4.1 Readiness Probes: “Are you actually ready?”

When you scale up (manually or automatically), Kubernetes creates new pods. However, just because a container has started doesn’t mean the app is ready to handle traffic. It might still be loading a cache or connecting to a database.

Without a Readiness Probe, Kubernetes treats the pod as ready as soon as its containers have started and routes Service traffic to it right away, which can lead to 502 or 503 errors for your users.

Implementation Example:

spec:
  containers:
  - name: web-app
    image: nginx
    readinessProbe:
      httpGet:
        path: /healthz  # An endpoint your app must serve (stock nginx returns 404 here, which fails the probe)
        port: 80
      initialDelaySeconds: 5  # Wait 5s before first check
      periodSeconds: 10       # Check every 10s
      failureThreshold: 3     # Remove from traffic after 3 consecutive failures

4.2 Liveness Probes: “Are you still alive?”

The Readiness Probe decides whether a pod should receive traffic. The Liveness Probe, which also runs for the pod’s entire lifecycle, decides whether the container needs to be restarted.

The Problem: Sometimes an application crashes in a way that the container stays “Running,” but the app inside is frozen, deadlocked, or broken. To Kubernetes, the pod looks fine, but it’s actually a “zombie” that can’t do anything.

The Solution: The Liveness Probe tells Kubernetes: “Check if the heart is still beating. If not, kill the container and start a fresh one.”

Implementation Example:

spec:
  containers:
  - name: web-app
    image: nginx
    livenessProbe:
      httpGet:
        path: /healthz  # Must be served by your app (stock nginx returns 404 here)
        port: 80
      initialDelaySeconds: 15 # Give the app plenty of time to boot
      periodSeconds: 20      # Check every 20s
      failureThreshold: 3    # Restart the container after 3 failed attempts
  • What happens on failure? Unlike the Readiness Probe (which just stops traffic), a failing Liveness Probe triggers a container restart.
  • The “Death Loop” Warning: Be careful! If initialDelaySeconds is too low, Kubernetes may restart your app before it has finished starting up. Repeated restarts put the pod into “CrashLoopBackOff”, and the app never gets the chance to fully boot.
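
For slow-starting apps, guessing a large initialDelaySeconds is fragile. Kubernetes also offers a startupProbe, which holds back the liveness and readiness checks until the app has come up once. A minimal sketch, reusing the `/healthz` endpoint from above and assuming the app needs at most about five minutes to boot:

```yaml
spec:
  containers:
  - name: web-app
    image: nginx             # placeholder image
    startupProbe:
      httpGet:
        path: /healthz
        port: 80
      periodSeconds: 10
      failureThreshold: 30   # up to 30 x 10s = 300s allowed for startup
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      periodSeconds: 20      # only starts once the startupProbe has succeeded
      failureThreshold: 3
```

This keeps the liveness settings tight for the running app while still giving a slow boot enough room.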

Summary: Use HPA for stateless web apps, VPA for resource-heavy workloads that cannot scale out, and always implement Readiness Probes so that new pods receive traffic only once they can serve it. That is what keeps your users from seeing a “Service Unavailable” screen during a scale-up.
