[ClusterLabs] Users Digest, Vol 46, Issue 8

Fri Nov 9 08:24:09 EST 2018

Yep, all my pcs commands run on the live cluster. The design needs resources
to respond in specific ways before moving on to other shutdown requests.

So it seems that running these pcs commands on different nodes at the same
time is the root cause of this issue. Anything that changes the live CIB at
the same time seems to cause Pacemaker to skip or throw away actions that
have been requested.

I have to admit this behaviour is very hard to work with. In a simple
system, using a shadow CIB would avoid these issues, though that would
suggest a central point of control anyway.

Luckily, I have been able to redesign my approach so that all the commands
that affect the live CIB (on cluster shutdown/startup) are run from a
single node within the cluster (and I added --wait to commands where
possible).

This approach removes all these issues, and things behave as expected.
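
A minimal sketch of that single-node approach, assuming the groups can be
stopped strictly one after another. The function name, the PCS override
(handy for a dry run with PCS=echo), the 60-second default timeout, and the
group names in the usage note are all made up for illustration; only
"pcs resource disable" and its --wait option come from pcs itself.

```shell
#!/bin/sh
# Sketch: stop resource groups one at a time from a single node.
# PCS and WAIT_SECS are hypothetical knobs for this example;
# set PCS=echo to print the commands instead of running them.
PCS="${PCS:-pcs}"
WAIT_SECS="${WAIT_SECS:-60}"

# Disable each group in the order given, waiting for it to stop
# (or for the timeout to expire) before moving on to the next one.
disable_in_order() {
    for group in "$@"; do
        "$PCS" resource disable "$group" "--wait=$WAIT_SECS" || {
            echo "failed to disable $group" >&2
            return 1
        }
    done
}
```

For example, "disable_in_order group_a group_b" would stop group_a fully
before touching group_b, so no two CIB changes are in flight at once.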

On Fri, Nov 9, 2018 at 12:00 PM <users-request at clusterlabs.org> wrote:

> Send Users mailing list submissions to
>         users at clusterlabs.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.clusterlabs.org/mailman/listinfo/users
> or, via email, send a message with subject or body 'help' to
>         users-request at clusterlabs.org
>
> You can reach the person managing the list at
>         users-owner at clusterlabs.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Users digest..."
>
>
> Today's Topics:
>
>    1. Re: Pacemaker auto restarts disabled groups (Ian Underhill)
>    2. Re: Pacemaker auto restarts disabled groups (Ken Gaillot)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 8 Nov 2018 12:14:33 +0000
> From: Ian Underhill <ianpunderhill at gmail.com>
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] Pacemaker auto restarts disabled groups
> Message-ID:
>         <CAGu+cYgDdMThbV23+55eC40tjOGeYZzUKbL9o_ydjkqp+JOjjA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> seems this issue has been raised before, but has gone quiet, with no
> solution
>
> https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html
>
> I know my resource agents successfully return the correct status to the
> start/stop/monitor requests
>
> On Thu, Nov 8, 2018 at 11:40 AM Ian Underhill <ianpunderhill at gmail.com>
> wrote:
>
> > Sometimes I'm seeing that a resource group that is in the process of being
> > disabled is automatically restarted by Pacemaker.
> >
> > When issuing pcs disable commands to disable different resource groups at
> > the same time (on different nodes, at the group level), the result is that
> > sometimes the resource is stopped and restarted straight away. I'm using a
> > balanced placement strategy.
> >
> > Looking into the daemon log, Pacemaker is aborting transitions due to a
> > config change: the target-role meta attribute changing?
> >
> > Transition 2838 (Complete=25, Pending=0, Fired=0, Skipped=3,
> > Incomplete=10, Source=/var/lib/pacemaker/pengine/pe-input-704.bz2): Stopped
> >
> > Could somebody explain Complete/Pending/Fired/Skipped/Incomplete, and is
> > there a way of displaying Skipped actions?
> >
> > I've used crm_simulate --xml-file XXXX --run to see the actions, and I see
> > this extra start request.
> >
> > regards
> >
> > /Ian.
> >
>
> ------------------------------
>
> Message: 2
> Date: Thu, 08 Nov 2018 10:58:52 -0600
> From: Ken Gaillot <kgaillot at redhat.com>
> To: Cluster Labs - All topics related to open-source clustering
>         welcomed <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] Pacemaker auto restarts disabled groups
> Message-ID: <1541696332.5197.3.camel at redhat.com>
> Content-Type: text/plain; charset="UTF-8"
>
> On Thu, 2018-11-08 at 12:14 +0000, Ian Underhill wrote:
> > seems this issue has been raised before, but has gone quiet, with no
> > solution
> >
> > https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html
>
> In that case, something appeared to be explicitly re-enabling the
> disabled resources. You can search your logs for "target-role" to see
> whether that's happening.
>
> > I know my resource agents successfully return the correct status to
> > the start/stop/monitor requests
> >
> > On Thu, Nov 8, 2018 at 11:40 AM Ian Underhill <ianpunderhill at gmail.com>
> > wrote:
> > > Sometimes I'm seeing that a resource group that is in the process of
> > > being disabled is automatically restarted by Pacemaker.
> > >
> > > When issuing pcs disable commands to disable different resource
> > > groups at the same time (on different nodes, at the group level),
> > > the result is that sometimes the resource is stopped and restarted
> > > straight away. I'm using a balanced placement strategy.
>
> The first thing that comes to mind is that if you're running pcs on the
> live cluster, the commands won't actually take effect at the same time;
> there will be a small amount of time between each disable. The cluster
> could well decide to rebalance and thus restart other resource groups
> that haven't yet been disabled.
>
> A way around that would be to run pcs on a file instead and push that
> to the live cluster:
>
>  pcs cluster cib whatever.xml
>  pcs -f whatever.xml ...whatever command you want...
>  ...
>  pcs cluster cib-push whatever.xml --config
>
> That would make all the disabling happen at the same time.
>
> > >
> > > Looking into the daemon log, Pacemaker is aborting transitions due
> > > to a config change: the target-role meta attribute changing?
> > >
> > > Transition 2838 (Complete=25, Pending=0, Fired=0, Skipped=3,
> > > Incomplete=10, Source=/var/lib/pacemaker/pengine/pe-input-704.bz2):
> > > Stopped
> > >
> > > could somebody explain Complete/Pending/Fired/Skipped/Incomplete
> > > and is there a way of displaying Skipped actions?
>
> It's almost never useful to end users, and barely more useful even to
> developers. If you pass -VVVVVV to crm_simulate, you could get more
> info, but trust me you don't want to do that. ;-)
>
> Each transition is a set of actions needed to get to the desired state.
> "Complete" are actions that were initiated and a result was received.
> "Pending" are actions that were initiated but the result hasn't come
> back yet. "Skipped" is for certain failure situations, and for when a
> transition is aborted and an action that would be scheduled is a lower
> priority than the abort (which is probably what happened here, nothing
> significant). "Incomplete" is for actions that haven't been initiated
> yet.
>
>
> > > I've used crm_simulate --xml-file XXXX --run to see the actions, and
> > > I see this extra start request.
> > >
> > > regards
> > >
> > > /Ian.
> --
> Ken Gaillot <kgaillot at redhat.com>
>
>