Sunday, September 7, 2025

Kubernetes additional tips - part 2

🔹 3. Copying a Pod for Post-Mortem Debugging


Sometimes a Pod crashes immediately (e.g., due to bad config, missing secrets, startup errors). In such cases, you don’t have enough time to attach a shell or inject an ephemeral container before it dies.


👉 Solution: use kubectl debug --copy-to to clone the Pod into a stable version that won’t crash, so you can perform a post-mortem analysis.



🔹 Step 1: Clone the Pod into a Debug Version



kubectl debug pod/my-crashing-pod \
  --copy-to=postmortem-pod \
  --container=app \
  --image=busybox \
  -- sh -c "sleep 1d"



--copy-to=postmortem-pod → Creates a new Pod called postmortem-pod with the same spec (volumes, env vars, service account) as the original.

--container=app → Targets the existing container to modify. Substitute the container name from your own Pod spec.

--image=busybox → Replaces the broken image with a small, stable one so the copy can actually start (use ubuntu or netshoot if you need richer tooling).

sh -c "sleep 1d" → Overrides the command so the container simply sleeps for 1 day (adjust as needed) instead of crashing on startup. Note that busybox ships sh, not bash.
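Before moving on, it can help to confirm the copy is actually Running and picked up the debug image (a quick, optional sanity check):


kubectl get pod postmortem-pod
kubectl get pod postmortem-pod -o jsonpath='{.spec.containers[*].image}{"\n"}'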



🔹 Step 2: Exec into the Stable Copy


kubectl exec -it postmortem-pod -- sh


Now you have an interactive shell inside the cloned Pod.



🔹 What Can You Inspect?


Once inside, you can check:

Configuration files


cat /etc/config/app.conf


Mounted secrets


ls /var/run/secrets/kubernetes.io/serviceaccount


Persistent volumes (PVCs) → same mounts as original Pod.
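For example, to see what is actually mounted (the /data path below is only illustrative; check the Pod spec's volumeMounts for the real paths):


# List mounted filesystems, then peek at a data volume
mount | grep -v tmpfs
ls -lt /data | head    # adjust /data to your volume's mountPath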

Environment variables


env | grep DB_


Application logs left behind in mounted volumes.
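For instance, if the app writes its logs to a mounted volume (the path and filename below are illustrative):


ls -lt /var/log/myapp
tail -n 100 /var/log/myapp/error.log    # adjust to your app's log location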


This helps you verify if the issue was caused by:

Wrong config or env vars

Missing secrets/config maps

Corrupted volume mounts

Crash-looping due to command misconfiguration



Why This Is Powerful


✅ Gives you time to debug a Pod that would otherwise crash instantly.

✅ Preserves volumes, secrets, and environment variables for accurate debugging.

✅ Lets you swap the container image for a debug-friendly one (busybox, ubuntu, netshoot, etc.).

✅ Non-destructive — original Pod stays intact (though it may still be crash-looping).



Real-World Example: Debugging a Crashing App


Suppose my-crashing-pod is failing because of a missing DB connection string.

1. You clone it with --copy-to.

2. Exec in, run env | grep DB_, and discover the variable is not set.

3. You check the ConfigMap/Secret mount and realize it's missing.

4. Root cause: the Deployment forgot to mount the db-secret.
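You can confirm the same thing from outside the Pod. A quick sketch, assuming the workload is a Deployment named my-app and the Secret should be db-secret:


# Does the Secret exist, and does the Pod template reference it?
kubectl get secret db-secret
kubectl get deployment my-app -o jsonpath='{.spec.template.spec.volumes}{"\n"}'
kubectl get deployment my-app -o jsonpath='{.spec.template.spec.containers[*].envFrom}{"\n"}'

# One possible fix: inject the Secret's keys as environment variables
kubectl set env deployment/my-app --from=secret/db-secret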




 Visual Sequence (Mermaid)


sequenceDiagram
    participant User
    participant kubectl
    participant API as Kubernetes API Server
    participant PodCrasher as my-crashing-pod (CrashLoopBackOff)
    participant PodClone as postmortem-pod (stable copy)

    User->>kubectl: kubectl debug pod/my-crashing-pod --copy-to=postmortem-pod
    kubectl->>API: Request clone with new image (busybox)
    API->>PodClone: Create postmortem-pod with same config/volumes/env
    User->>PodClone: kubectl exec -it postmortem-pod -- sh
    PodClone->>User: Stable shell for inspection


Summary:

When Pods crash too quickly to debug, cloning them with kubectl debug --copy-to gives you a stable replica for investigation. This allows full inspection of config, volumes, secrets, and logs without modifying the original Pod.



🔹 4. Reading Logs Across Container Restarts


When Pods crash or restart, simply running kubectl logs shows logs from the current running container. This means you miss the previous attempt (which may contain the real cause of the crash).


👉 Kubernetes stores logs for both the current and the last terminated instance of a container.



Viewing the Previous Container’s Logs


# Logs from the last run before restart

kubectl logs my-app-pod -c app --previous


-c app → If your Pod has multiple containers, specify which one.

--previous → Fetch logs from the last terminated instance (before the container restarted).


This is essential for debugging CrashLoopBackOff situations, where the Pod dies quickly and restarts.
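It can also help to add --timestamps, so you can line the previous run's output up with the Pod events shown by kubectl describe:


kubectl logs my-app-pod -c app --previous --timestamps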




 Handling Multiple Restarts


If a Pod restarts many times, you'll want the logs from each failed run. Kubernetes only keeps the most recent terminated instance (see the note below), so the practical approach is to snapshot the previous logs each time the container restarts, before the next crash overwrites them:


# Snapshot the previous logs every 30s, one file per restart count
while true; do
  restarts=$(kubectl get pod my-app-pod \
    -o jsonpath='{.status.containerStatuses[?(@.name=="app")].restartCount}')
  kubectl logs my-app-pod -c app --previous --since=1h > "crash-${restarts}.log"
  sleep 30
done


--since=1h → Restricts each snapshot to the last hour (avoids giant logs).

Each crash-N.log ends up holding the run that terminated just before restart number N.


🔑 Note: Kubernetes only retains logs for the last terminated container instance, not the full restart history. For deeper history, you need a log aggregator (e.g., EFK/ELK, Loki, Datadog, Splunk).


 Trimming Large Logs


When containers generate huge logs, --since and --tail are lifesavers:


# Last 100 lines from the previous instance

kubectl logs my-app-pod -c app --previous --tail=100


# Logs from the last 30 minutes

kubectl logs my-app-pod -c app --previous --since=30m


 Debugging Workflow

1. Check Pod status & restarts


kubectl get pod my-app-pod

kubectl describe pod my-app-pod


Look at Restart Count and termination reasons.


2. Fetch logs from the last crash


kubectl logs my-app-pod -c app --previous


3. Filter for timeframe or lines if logs are too large.

4. Escalate to log aggregation if you need full history beyond one restart.
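As a shortcut for step 1, you can pull the last termination reason and exit code straight from the Pod status (assuming the container is named app, as above):


kubectl get pod my-app-pod -o jsonpath='{.status.containerStatuses[?(@.name=="app")].lastState.terminated.reason}{"\n"}'
kubectl get pod my-app-pod -o jsonpath='{.status.containerStatuses[?(@.name=="app")].lastState.terminated.exitCode}{"\n"}'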



Visual Flow (Mermaid)

sequenceDiagram

    participant User

    participant kubectl

    participant KubeAPI

    participant Pod

    participant Container


    User->>kubectl: kubectl logs my-app-pod -c app

    kubectl->>KubeAPI: Request logs (current container)

    KubeAPI->>Container: Fetch running logs

    Container->>User: Returns logs (current only)


    User->>kubectl: kubectl logs my-app-pod -c app --previous

    kubectl->>KubeAPI: Request logs (terminated container)

    KubeAPI->>Pod: Get last restart logs

    Pod->>User: Returns logs from previous instance


 Summary:

kubectl logs → current instance logs.

kubectl logs --previous → last terminated instance logs (great for crash debugging).

Use --since / --tail to keep logs manageable.

For full restart history, integrate a centralized logging solution (ELK, Loki, etc.).





