Troubleshooting Issues with Restarting Pods on Kubernetes

The Kubernetes website has quite useful guides for troubleshooting issues with your application, service, or cluster, but sometimes those may not be very helpful, especially if your containers are constantly failing and restarting. Quite often the cause is a permissions issue. The overall goal is for containers to run with the least privileges possible; however, this doesn't always work well if you have mounted persistent volumes. In that case, you will need to run the containers in privileged mode or set runAsUser in the security context.
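As a sketch of the second option, a pod spec with runAsUser might look like the following. All names here are placeholders, and UID/GID 1000 is an assumption; use whatever user your image runs as and your volume's files are owned by:

```yaml
# Hypothetical pod spec excerpt: run the container as a non-root user
# that matches the owner of the mounted volume. The UID/GID of 1000
# is an assumption -- check what your image and volume actually expect.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 1000   # files on mounted volumes become group-accessible to this GID
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc       # placeholder claim name
```

Setting fsGroup alongside runAsUser is often what makes volume-backed directories writable, since Kubernetes applies that group to the mounted files.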

While deploying Elasticsearch on Kubernetes and trying to use Azure Files as a persistent volume for my containers, I, of course, encountered this issue yet again and started thinking of a way to figure out what was going on. It would be nice if the scheduler had an option to pause the container (in the debugging sense) for troubleshooting purposes, allowing the developer to connect to the container and look around.

Well, the solution is quite simple: you just need to "pause" the start of the container yourself. The simplest way to do that is to set sleep as the start command for your container. This is done in the containers section of your deployment YAML as follows:

command:
- "sleep"
- "300"
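In context, the relevant part of the deployment might look like the following sketch. The container name and image tag are placeholders; substitute your own:

```yaml
# Hypothetical deployment excerpt: `command` overrides the image's
# ENTRYPOINT, so the container does nothing but sleep for 300 seconds,
# giving you a window to exec in and poke around.
containers:
  - name: elasticsearch                                          # placeholder name
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2  # example tag
    command:
      - "sleep"
      - "300"
```

Note that once the sleep expires, the process exits and Kubernetes restarts the container, so extend the duration if you need a longer debugging window.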

This way, the container will simply sleep for 300 seconds (5 minutes) after starting, and you can easily attach to it or connect to it by executing a supported shell. For example:

$ kubectl exec -it [your-pod-id] -- bash

From there, you can manually execute the command that starts your service, as set in the image's Dockerfile, and watch what goes wrong.

A simple trick, but it can save you some time.