This guide provides solutions for common issues encountered with Kubernetes container health probes.
Always start your troubleshooting with these commands:
```bash
# Get basic pod status
kubectl get pod <pod-name>

# Get detailed pod information, including probe configuration and events
kubectl describe pod <pod-name>

# View pod logs
kubectl logs <pod-name>

# View pod events
kubectl get events --field-selector involvedObject.name=<pod-name>
```
Symptoms:
- Pod restarts repeatedly, with status cycling between Running and CrashLoopBackOff

Possible Causes:
- Liveness probe timing too aggressive for the application
- Application genuinely crashing or deadlocked
- Resource limits slowing the application's responses
- Application needing more startup time than the probe allows

Solutions:
Inspect the liveness probe configuration:

```bash
kubectl describe pod <pod-name> | grep -A 15 "Liveness:"
```
If the probe timing is too aggressive, relax it:

```yaml
livenessProbe:
  # Increase these values
  initialDelaySeconds: 30   # Give the app more time to start
  periodSeconds: 10         # Check less frequently
  timeoutSeconds: 5         # Allow more time for a response
  failureThreshold: 3       # Allow more failures before restart
```
Check whether the application itself crashed by viewing logs from the previous container instance:

```bash
kubectl logs <pod-name> --previous
```
Check whether resource limits are constraining the container:

```bash
kubectl describe pod <pod-name> | grep -A 10 "Limits:"
```
For slow-starting applications, add a startup probe so the liveness probe does not run until the app is up:

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```
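Putting the snippets above together, a sketch of a full container spec pairing a startup probe with a liveness probe (the pod name, image, and endpoint are illustrative assumptions, not a prescribed layout):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo                          # illustrative name
spec:
  containers:
  - name: app
    image: registry.example.com/myapp:1.0   # illustrative image
    ports:
    - containerPort: 8080
    startupProbe:          # protects slow startup; other probes wait for it
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:         # takes over once the startup probe succeeds
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
```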
Symptoms:
- Pod shows Running status but the READY column shows 0/1

Possible Causes:
- Readiness endpoint failing or misconfigured
- External dependencies (databases, APIs, other services) unavailable
- Probe timing too strict for the application

Solutions:
Check the readiness probe configuration:

```bash
kubectl describe pod <pod-name> | grep -A 15 "Readiness:"
```
Test the probe endpoint manually from inside the cluster:

```bash
# Get pod IP
POD_IP=$(kubectl get pod <pod-name> -o jsonpath='{.status.podIP}')

# For HTTP probes - create a test pod
kubectl run test --rm -it --image=curlimages/curl -- curl -v http://$POD_IP:8080/ready

# For TCP probes - create a test pod
kubectl run test --rm -it --image=busybox -- nc -zv $POD_IP 3306
```
Check application logs for readiness-related messages:

```bash
kubectl logs <pod-name> | grep -i ready
```
Verify external dependencies are available (databases, APIs, other services)
If the probe is too strict, relax its timing:

```yaml
readinessProbe:
  periodSeconds: 10    # Check less frequently
  timeoutSeconds: 5    # Allow more time for response
  failureThreshold: 3  # Allow more failures before marking not ready
```
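Note that a complete readiness probe also needs a handler; a minimal sketch, assuming the /ready endpoint on port 8080 used in the manual test above:

```yaml
readinessProbe:
  httpGet:
    path: /ready       # assumed endpoint; match your application
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```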
Symptoms:
- Probe failures with timeout errors in kubectl describe pod output

Possible Causes:
- Health endpoint responding too slowly
- Default timeoutSeconds (1s) too short for the application
- Network latency between the kubelet and the pod

Solutions:
Review the configured timeouts for all probes:

```bash
kubectl describe pod <pod-name> | grep -A 15 "Liveness\|Readiness\|Startup"
```
Time the endpoint's response from inside the container:

```bash
kubectl exec <pod-name> -- curl -v http://localhost:<port>/<path>
```
Check for network policy restrictions that might block probe requests
If the endpoint is simply slow, increase the timeout:

```yaml
livenessProbe:
  timeoutSeconds: 5   # Increase from the default 1s to 5s
```
Symptoms:
- Probes fail occasionally, then recover without intervention

Possible Causes:
- CPU or memory spikes under load
- Garbage collection pauses or periodic background work
- Load-dependent slowdowns in the health endpoint

Solutions:
Check the pod's resource usage when failures occur:

```bash
kubectl top pod <pod-name>
```
Make the probe more tolerant of transient failures:

```yaml
livenessProbe:
  # Increase these values
  periodSeconds: 15    # Check less frequently
  timeoutSeconds: 10   # Allow more time for response
  failureThreshold: 5  # Require more consecutive failures
```
Look for patterns in failures (time of day, load patterns, etc.)
Check node-level conditions and resource pressure:

```bash
kubectl describe node <node-name>
```
If failures cluster around startup, a startup probe gives the app a generous grace period:

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30  # 30 * 10s = 5 minutes to start
  periodSeconds: 10
```
Symptoms:
- HTTP probes fail even though the application appears healthy

Possible Causes:
- Wrong path or port in the probe configuration
- Endpoint returning an unexpected status code
- Endpoint requiring headers the probe does not send

Solutions:
Test the endpoint and inspect its status code and headers:

```bash
kubectl exec <pod-name> -- curl -v http://localhost:<port>/<path>
kubectl exec <pod-name> -- curl -I http://localhost:<port>/<path>
```
If the endpoint requires specific headers, add them to the probe:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: Accept
      value: application/json
```
Ensure health endpoint returns appropriate status code (200-399 for success)
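That success range can be sketched as a small local check; probe_success is a hypothetical helper for illustration, not part of kubectl:

```shell
# A kubelet httpGet probe succeeds for any status code in [200, 400)
probe_success() {
  code=$1
  [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}

probe_success 200 && echo "200 passes"   # 2xx is success
probe_success 302 && echo "302 passes"   # redirects also count as success
probe_success 500 || echo "500 fails"    # 4xx/5xx mark the probe as failed
```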
Symptoms:
- Pods killed during startup; the liveness probe fails before the application finishes initializing

Possible Causes:
- Startup time exceeding initialDelaySeconds
- Cache warming, migrations, or JIT warm-up delaying readiness

Solutions:
Use a startup probe, which holds off the other probes until it succeeds:
```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```
Alternatively, increase the initial delays:

```yaml
livenessProbe:
  initialDelaySeconds: 120   # Increase to 2 minutes
readinessProbe:
  initialDelaySeconds: 30    # Increase to 30 seconds
```
Maximum startup time = failureThreshold × periodSeconds
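That arithmetic, sketched with the values from the startup probe example above:

```shell
# Maximum startup window = failureThreshold * periodSeconds
failure_threshold=30
period_seconds=10
max_startup_seconds=$((failure_threshold * period_seconds))
echo "Maximum startup time: ${max_startup_seconds}s"   # 300s = 5 minutes
```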
Symptoms:
- Exec probes fail even though the command works when run manually

Possible Causes:
- Script missing, not executable, or at the wrong path
- Command exiting with a non-zero status
- Command not present in the container image

Solutions:
Verify the script exists and is executable:

```bash
kubectl exec <pod-name> -- ls -la /path/to/script
```
Run the command manually to check its output and exit status:

```bash
kubectl exec <pod-name> -- /path/to/command
```
Fix permissions if needed:

```bash
kubectl exec <pod-name> -- chmod +x /path/to/script
```
Keep exec probes simple and explicit about failure:

```yaml
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - "test -f /tmp/healthy || exit 1"
```
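The file-based pattern above can be exercised locally; /tmp/healthy-demo is a stand-in for whatever path your application would manage:

```shell
# The application touches the file while it is healthy...
touch /tmp/healthy-demo

# ...and the probe command checks for it (exit 0 = healthy)
test -f /tmp/healthy-demo && echo "healthy"

# When the app removes (or stops refreshing) the file, the probe fails
rm /tmp/healthy-demo
test -f /tmp/healthy-demo || echo "unhealthy"
```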
Symptoms:
- TCP probes fail even though the application is running

Possible Causes:
- Application listening on a different port than the probe targets
- Application bound to 127.0.0.1 instead of 0.0.0.0
- NetworkPolicy blocking kubelet traffic

Solutions:
Check which ports the application is actually listening on:

```bash
kubectl exec <pod-name> -- netstat -tlnp
```
Check for port binding restrictions (make sure app binds to 0.0.0.0, not just 127.0.0.1)
Test the port from inside the pod:

```bash
kubectl exec <pod-name> -- nc -zv localhost <port>
```
Check for any NetworkPolicy restrictions
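For comparison with the HTTP examples, a minimal tcpSocket liveness probe; port 3306 here mirrors the MySQL-style port used in the nc test earlier and is an illustrative assumption:

```yaml
livenessProbe:
  tcpSocket:
    port: 3306            # must match a port the app binds on 0.0.0.0
  initialDelaySeconds: 15
  periodSeconds: 10
```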
For complex cases, ephemeral debug containers can be useful:
```bash
# Start debug container (requires Kubernetes 1.18+ with feature enabled)
kubectl debug -it <pod-name> --image=busybox --target=<container-name>

# From the debug container, you can test network connections, check processes, etc.
wget -O- http://localhost:8080/healthz
netstat -tlnp
ps aux
```
For network-related issues:
```bash
# Create a privileged debug pod (netshoot already ships with tcpdump and other network tools)
kubectl run debug-pod --privileged --rm -it --image=nicolaka/netshoot -- bash

# Capture probe traffic
tcpdump -i eth0 port <probe-port> -vvv
```
For performance-related problems:
```bash
# Get probe timing details
kubectl get pod <pod-name> -o json | jq '.status.conditions[] | select(.type=="Ready")'

# Check kubelet logs for probe details (on the node)
journalctl -u kubelet | grep <pod-name> | grep -i probe
```
Test probe behavior under load before deploying to production
For the CKAD exam, remember these troubleshooting tips:
- Run kubectl get pods first to spot failing pods
- Use kubectl describe pod <pod-name> to see probe configuration and recent events
- Check application logs with kubectl logs <pod-name>
- Increase initialDelaySeconds for slow-starting applications
- Adjust periodSeconds and timeoutSeconds for slow-responding apps
- Tune failureThreshold to control tolerance for intermittent failures

Remember that solving probe issues often requires a systematic approach: check configuration → test manually → adjust parameters → verify solution.