[ClusterLabs] epic fail

Mon Jul 24 12:24:39 EDT 2017

On Mon, Jul 24, 2017 at 10:38:40AM -0500, Ken Gaillot wrote:
> Standby is not necessary, it's just a cautious step that allows the
> admin to verify that all resources moved off correctly. The restart that
> yum does should be sufficient for pacemaker to move everything.
> 
> A restart shouldn't lead to fencing in any case where something's not
> going seriously wrong. I'm not familiar with the "kernel is using it"
> message, I haven't run into that before.

Right, the pacemaker upgrade itself might not be the biggest problem.  I've
seen upgrades of other packages cause resource agent (RA) monitors to return
results such as $OCF_NOT_RUNNING or $OCF_ERR_INSTALLED.  This of course
causes the cluster to react, so I prefer the node standby option :)

In this case, pacemaker was trying to stop the resources, the stop action
failed, and the upgrading node was killed off (fenced) by the second node
as it tried to clean up the mess.  The resources should have come up on
the second node after that.

-- 
Valentin