[ClusterLabs Developers] bundle/docker: zombie process on resource stop

Mon Jul 24 10:00:04 UTC 2017

On 07/23/2017 10:19 AM, Valentin Vidic wrote:
> I'm seeing this state when trying to stop the docker bundle
> with pacemaker 1.1.17:
>
> 15738 ?        Ssl    7:23 /usr/sbin/dockerd -H fd://
> 15742 ?        Ssl    5:31  \_ containerd -l unix:///var/run/docker/libcontainerd/containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim containerd-shim --runtime runc
> 11221 ?        Sl     0:00  |   \_ containerd-shim 4cb4a14a49d5d54c2aac5a836255ca10785fb0a989c19539fc253afbbacea33d /var/run/docker/libcontainerd/4cb4a14a49d5d54c2aac5a836255ca10785fb0a989c19539fc253afbbacea33d runc
> 11238 ?        Ss     0:59  |       \_ /usr/sbin/pacemaker_remoted
> 11579 ?        Zs     0:11  |           \_ [httpd] <defunct>
>   867 ?        S      0:00  |           \_ /bin/sh /usr/lib/ocf/resource.d/heartbeat/apache stop
>  1165 ?        S      0:00  |               \_ sleep 1
> 11186 ?        Sl     0:00  \_ /usr/sbin/docker-proxy -proto tcp -host-ip 192.168.122.131 -host-port 3121 -container-ip 172.17.0.2 -container-port 3121
> 11197 ?        Sl     0:00  \_ /usr/sbin/docker-proxy -proto tcp -host-ip 192.168.122.131 -host-port 80 -container-ip 172.17.0.2 -container-port 80
>
> It seems the apache process thinks httpd is still running
> because pacemaker_remoted did not handle the zombie properly?
>

lrmd and pacemaker_remoted (which is actually a special incarnation of lrmd)
have no knowledge of processes spawned by resource agents (RAs), so I would
guess that taking care of zombies is rather something the RA itself should be
responsible for.
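For context, a zombie (state `Z` in the `ps` output above) is a child that has already exited but whose exit status has not yet been collected with `wait()`/`waitpid()`. Only the process the child currently belongs to can collect it. That is either its original parent or, if that parent has already exited, the process it was reparented to (PID 1 of the PID namespace, or a child subreaper). The minimal Python sketch below assumes Linux, because it reads `/proc/<pid>/stat`. It shows how such an entry appears and how a non-blocking `waitpid(-1, WNOHANG)` loop reaps it. The helper names `proc_state`, `reap_children` and `demo` are hypothetical. This is not how lrmd or pacemaker_remoted are implemented.

```python
import os
import time


def proc_state(pid: int) -> str:
    """Return the one-letter state field from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is in parentheses and may contain spaces,
    # so split after the last ')' to reach the state field.
    return data.rsplit(")", 1)[1].split()[0]


def reap_children() -> list:
    """Collect the exit status of every already-exited child without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def demo():
    """Illustration only: create a zombie, observe it, then reap it."""
    child = os.fork()
    if child == 0:
        os._exit(0)  # the child exits immediately

    # Until the parent calls waitpid(), the exited child stays a zombie ("Z").
    for _ in range(100):
        if proc_state(child) == "Z":
            break
        time.sleep(0.01)
    state_before = proc_state(child)

    reaped = reap_children()
    # Once reaped, the kernel releases the process and its /proc entry vanishes.
    gone = not os.path.exists(f"/proc/{child}")
    return child, state_before, reaped, gone


if __name__ == "__main__":
    child, state, reaped, gone = demo()
    print(f"child {child}: state before reaping = {state}")
    print(f"reaped: {reaped}, /proc entry gone: {gone}")
```

In the tree above, whichever process owns the exited `httpd` would need to run a loop like `reap_children()`, typically from a `SIGCHLD` handler or a periodic check. Otherwise the `<defunct>` entry stays in the process table.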

Regards,
Klaus