[ClusterLabs] Pacemaker auto restarts disabled groups

Ken Gaillot kgaillot at redhat.com
Thu Nov 8 11:58:52 EST 2018


On Thu, 2018-11-08 at 12:14 +0000, Ian Underhill wrote:
> seems this issue has been raised before, but has gone quiet, with no
> solution
> 
> https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html

In that case, something appeared to be explicitly re-enabling the
disabled resources. You can search your logs for "target-role" to see
whether that's happening.
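
A quick way to run that search is sketched below. The default log path
is only an assumption (it varies by distribution and configuration), so
pass whatever file your cluster actually logs to.

```shell
# Print every log line that mentions target-role, with line numbers.
# The default path is an assumption; pass your real log file instead.
find_target_role_changes() {
    logfile="${1:-/var/log/pacemaker.log}"
    grep -n 'target-role' "$logfile"
}
```

Any hit that sets target-role back to Started shortly after your
disable would point to something re-enabling the resource.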

> I know my resource agents successfully return the correct status to
> the start\stop\monitor requests
> 
> On Thu, Nov 8, 2018 at 11:40 AM Ian Underhill <ianpunderhill at gmail.com>
> wrote:
> > Sometimes I'm seeing that a resource group that is in the process of
> > being disabled is auto-restarted by pacemaker.
> > 
> > When issuing pcs disable commands to disable different resource
> > groups at the same time (on different nodes, at the group level)
> > the result is that sometimes a resource is stopped and restarted
> > straight away. I'm using a balanced placement strategy.

The first thing that comes to mind is that if you're running pcs on the
live cluster, the disables won't actually happen at the same time; there
will be a small amount of time between each one. The cluster could well
decide to rebalance in between, and thus restart other resource groups
that haven't yet been disabled.

A way around that would be to run pcs on a file instead and push that
to the live cluster:

 pcs cluster cib whatever.xml
 pcs -f whatever.xml ...whatever command you want...
 ...
 pcs cluster cib-push whatever.xml --config

That would make all the disabling happen at the same time.
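
As a rough sketch, the steps above could be wrapped in a small shell
function. The function name and the group IDs in the usage note are
hypothetical; the pcs subcommands are the same ones shown above, plus
"pcs resource disable".

```shell
# Disable several resources or groups in a single CIB update.
# Arguments are resource/group IDs (hypothetical in this sketch).
disable_groups_together() {
    cib_file=$(mktemp) || return 1

    # Snapshot the live CIB into a scratch file.
    pcs cluster cib "$cib_file" || { rm -f "$cib_file"; return 1; }

    # Apply each disable to the file only; the live cluster is untouched.
    for id in "$@"; do
        pcs -f "$cib_file" resource disable "$id" || { rm -f "$cib_file"; return 1; }
    done

    # Push the configuration section back as one change.
    pcs cluster cib-push "$cib_file" --config
    status=$?
    rm -f "$cib_file"
    return "$status"
}
```

For example, "disable_groups_together groupA groupB" would disable both
groups in one push, so the cluster only sees a single configuration
change to react to.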

> > 
> > looking into the daemon log, pacemaker is aborting transitions due
> > to a config change of the target-role meta attribute?
> > 
> > Transition 2838 (Complete=25, Pending=0, Fired=0, Skipped=3,
> > Incomplete=10, Source=/var/lib/pacemaker/pengine/pe-input-704.bz2): 
> > Stopped
> > 
> > could somebody explain Complete/Pending/Fired/Skipped/Incomplete
> > and is there a way of displaying Skipped actions?

It's almost never useful to end users, and barely more useful even to
developers. If you pass -VVVVVV to crm_simulate, you could get more
info, but trust me you don't want to do that. ;-)

Each transition is a set of actions needed to get to the desired state.
"Complete" are actions that were initiated and a result was received.
"Pending" are actions that were initiated but the result hasn't come
back yet. "Skipped" is for certain failure situations, and for when a
transition is aborted and an action that would be scheduled is a lower
priority than the abort (which is probably what happened here, nothing
significant). "Incomplete" is for actions that haven't been initiated
yet.
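
If you just want those counters pulled out of the log for a quick look,
something like the following sketch would do. It assumes the summary
line has the same format as the one quoted above, and it relies on
grep -o, which GNU and BSD grep provide but POSIX does not require.

```shell
# Read log text on stdin and print each transition counter as "name value".
summarize_transition() {
    grep -oE '(Complete|Pending|Fired|Skipped|Incomplete)=[0-9]+' | tr '=' ' '
}
```

You would feed it the relevant lines, e.g. by piping the output of a
grep for "Transition" on your log file into summarize_transition.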


> > I've used crm_simulate --xml-file XXXX -run to see the actions, and
> > I see this extra start request
> > 
> > regards
> > 
> > /Ian.
-- 
Ken Gaillot <kgaillot at redhat.com>


