Rescuing PVCs from CrashLoopBackOff - Dealing with misbehaving PVCs

Sometimes a pod in your cluster misbehaves, restarting over and over in an error state known as CrashLoopBackOff. When that pod has persistent storage attached, and the crash loop is caused by the state of the files on that same storage, what can you do to free up or clean the disk so the pod can run again?

If you try to rsh/exec into the container, you will probably be greeted with the following message:

$ kubectl -n mateus exec -it blog-2-kps5s -- /bin/bash
error: unable to upgrade connection: container not found ("php")

This happens because no container is running. It is up only for a brief moment before it dies.
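Before changing anything, it can help to confirm that the crash really comes from the files on the volume. The following is a minimal sketch, not a definitive procedure: the function name `crashloop_info` is hypothetical, and it only wraps `kubectl get pod` and `kubectl logs --previous`, which prints the logs of the last terminated instance of a container.

```shell
# Sketch: collect basic diagnostics for a crash-looping pod.
# crashloop_info is a hypothetical helper name, not part of kubectl.
# Arguments: namespace, pod name, container name.
crashloop_info() {
  ci_ns="$1"; ci_pod="$2"; ci_container="$3"
  # Show the pod's current status and restart count.
  kubectl -n "$ci_ns" get pod "$ci_pod"
  # Show the logs of the previous (crashed) instance of the container.
  kubectl -n "$ci_ns" logs "$ci_pod" -c "$ci_container" --previous
}

# Example with the names used in this post:
# crashloop_info mateus blog-2-kps5s php
```

If the logs point at something other than the storage, the approach described next may not be the right fix.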

One possible approach to this problem is to change the pod template of the ReplicationController/Deployment/DeploymentConfig/StatefulSet so that it starts with a blocking, never-failing command like `/bin/sh` or `/bin/cat`. For the command to block, you must also give it a tty and stdin.

Make sure you save the original command entry so you don't lose it. If nothing is printed, the container is starting from its image's default ENTRYPOINT/CMD.

$ kubectl -n mateus get deployment/blog \
    -o jsonpath='{.spec.template.spec.containers[0].command}'
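To keep the original value somewhere safer than the terminal scrollback, the same query can be redirected to a file. This is a small sketch under assumptions: `save_command` and the output file path are hypothetical names chosen for illustration, and the exact format of the saved text depends on your kubectl version.

```shell
# Sketch: save the first container's command to a file before patching.
# save_command is a hypothetical helper name, not part of kubectl.
# Arguments: namespace, deployment name, output file.
save_command() {
  sc_ns="$1"; sc_deploy="$2"; sc_out="$3"
  kubectl -n "$sc_ns" get "deployment/$sc_deploy" \
    -o jsonpath='{.spec.template.spec.containers[0].command}' > "$sc_out"
  # An empty file means no command was set in the pod template.
  if [ -s "$sc_out" ]; then
    echo "saved command to $sc_out"
  else
    echo "no explicit command set; the image default is used"
  fi
}

# Example with the names used in this post:
# save_command mateus blog ./blog-command.txt
```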

Now let's update the deployment template:

$ kubectl -n mateus patch deployment/blog -p '
spec:
  template:
    spec:
      containers:
      - name: php
        command: [ "/bin/cat" ]
        tty: true
        stdin: true
'
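Rather than polling by hand, the rollout triggered by the patch can be followed with `kubectl rollout status`. The sketch below assumes a Deployment; the helper name `wait_for_rollout` and the 120-second timeout are arbitrary choices for illustration.

```shell
# Sketch: wait for the patched deployment to roll out, then list pods.
# wait_for_rollout is a hypothetical helper name; 120s is an arbitrary timeout.
# Arguments: namespace, deployment name.
wait_for_rollout() {
  wr_ns="$1"; wr_deploy="$2"
  # Blocks until the rollout finishes or the timeout expires;
  # pods are listed only if the rollout succeeded.
  kubectl -n "$wr_ns" rollout status "deployment/$wr_deploy" --timeout=120s &&
    kubectl -n "$wr_ns" get pods
}

# Example with the names used in this post:
# wait_for_rollout mateus blog
```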

A new pod should be created. If it's not, simply delete the old one and wait for a new one to become ready. Now you can enter the container, fix the disk issue, and then restore the previous command if one was specified, or remove the command override otherwise.

$ kubectl -n mateus exec -it blog-1-t8lx4 -- /bin/bash
bash-4.2$ ps afx
  PID TTY      STAT   TIME COMMAND
    7 ?        Ss     0:00 /bin/bash     <---- here we are
   22 ?        R+     0:00  \_ ps afx
    1 ?        Ss+    0:00 /bin/cat      <---- container's command

Remove the fields added previously:

$ kubectl -n mateus patch deployment/blog --type=json -p '
[
  {
    "op": "remove",
    "path": "/spec/template/spec/containers/0/command"
  },
  {
    "op": "remove",
    "path": "/spec/template/spec/containers/0/tty"
  },
  {
    "op": "remove",
    "path": "/spec/template/spec/containers/0/stdin"
  }
]'

If a command already existed before you patched the controller, then use this operation instead of the first one:

  {
    "op": "replace",
    "path": "/spec/template/spec/containers/0/command",
    "value": [ "old", "command", "and", "parameters" ]
  }
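As an alternative to hand-written JSON patches, a Deployment can also be rolled back to its previous revision with `kubectl rollout undo`. This is only a sketch and rests on an assumption: the debugging patch must be the only change made since the last working revision, otherwise the rollback restores something else. The helper name `undo_debug_patch` is hypothetical.

```shell
# Sketch: roll a deployment back to its previous revision and wait for it.
# undo_debug_patch is a hypothetical helper name, not part of kubectl.
# Assumes the debugging patch is the most recent change to the deployment.
# Arguments: namespace, deployment name.
undo_debug_patch() {
  ud_ns="$1"; ud_deploy="$2"
  # Revert the pod template to the revision before the last change,
  # then follow the resulting rollout only if the undo succeeded.
  kubectl -n "$ud_ns" rollout undo "deployment/$ud_deploy" &&
    kubectl -n "$ud_ns" rollout status "deployment/$ud_deploy"
}

# Example with the names used in this post:
# undo_debug_patch mateus blog
```

Running `kubectl -n mateus rollout history deployment/blog` first shows which revisions are available to roll back to.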

That's all folks!
