[ClusterLabs Developers] bundle/docker: zombie process on resource stop

Ken Gaillot kgaillot at redhat.com
Fri Jul 28 11:04:59 EDT 2017

On Fri, 2017-07-28 at 09:04 +0200, Jan Pokorný wrote:
> On 27/07/17 17:40 -0500, Ken Gaillot wrote:
> > On Thu, 2017-07-27 at 23:26 +0200, Jan Pokorný wrote:
> >> On 24/07/17 17:59 +0200, Valentin Vidic wrote:
> >>> On Mon, Jul 24, 2017 at 09:57:01AM -0500, Ken Gaillot wrote:
> >>>> Are you sure you have pacemaker 1.1.17 inside the container as well? The
> >>>> pid-1 reaping stuff was added then.
> >>> 
> >>> Yep, the docker container from the bundle example got an older
> >>> version installed, so mystery solved :)
> >>> 
> >>>   pacemaker-remote-1.1.15-11.el7_3.5.x86_64
> >> 
> >> With docker/moby bundles, pacemaker on the host knows whether it
> >> has set pacemaker_remoted as the command to run within the
> >> container. In that case it could check whether the remote peer is
> >> recent enough to cope with zombie reaping, and prevent it from
> >> running any resources if not.
> > 
> > Leaving zombies behind is preferable to being unable to use containers
> > with an older pacemaker_remoted installed. A common use case of
> > containers is to run some legacy application that requires an old OS
> > environment. The ideal usage there would be to compile a newer pacemaker
> > for it, but many users won't have that option.
> 
> I was talking about the in-bundle use case in particular (as opposed
> to the generic pacemaker-remote one), where it might be preferable

Right, bundles talk to pacemaker_remoted inside the container. The
cluster nodes have to be 1.1.17+ to support bundles, but the container
OS can have any pacemaker version that supports pacemaker remote (though
of course, 1.1.17+ is preferred for the zombie reaping as well as other
generic remote bugfixes).
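
For readers unfamiliar with the term, "zombie reaping" here means that
the process running as PID 1 in the container collects the exit status
of any child that has terminated, including orphans re-parented to it.
Below is a minimal sketch of that general technique in Python. It is
not pacemaker_remoted's actual code, and the function name
reap_children is made up for illustration.

```python
import os


def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns the list of PIDs that were reaped. An init-like process
    would typically call this whenever it receives SIGCHLD.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns at once if none has exited.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

A process that never makes such a wait call leaves each exited child
in the process table as a zombie, which is the behavior reported in
this thread with the older pacemaker-remote package.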

> to have such a sanity check in place as opposed to hard-to-predict
> consequences, such as when the resource cannot be stopped due to
> interference with zombies (well, there is a whole lot of other issues
> with this weak grip on processes, such as that the resource agents on
> the host can get seriously confused by the processes running in the
> local containers!).
> For the specific use case at hand, it might be reasonable
> to require a pacemaker-remote version that is actually bundle-ready,
> >> The catch -- pacemaker on the host likely cannot evaluate this
> >> "recent enough" part of the equation properly, as there was no LRMD
> >> protocol version bump for 1.1.17.  Correct?  Any other hints it
> >> could use?

Ken Gaillot <kgaillot at redhat.com>
