<div dir="ltr"><div>Yep, all my pcs commands run on a live cluster. The design needs resources to respond in specific ways before </div><div>moving on to other shutdown requests.</div><div><br></div>So it seems that running these pcs commands on different nodes at the same time is the root cause of this issue:<div>anything that changes the live CIB at the same time seems to cause Pacemaker to just skip or throw away actions </div><div>that have been requested.</div><div><br></div><div>I have to admit this behaviour is very hard to work with. In a simple system, using a shadow CIB would avoid these issues, though </div><div>that would suggest a central point of control anyway.</div><div><br></div><div>Luckily, I have been able to redesign my approach so that all the commands that affect the live CIB (on cluster shutdown/startup) are run from a </div><div>single node within the cluster (and I added --wait to commands where possible).</div><div><br></div><div>This approach removes all these issues, and things behave as expected.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, Nov 9, 2018 at 12:00 PM <<a href="mailto:users-request@clusterlabs.org">users-request@clusterlabs.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send Users mailing list submissions to<br>

        <a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a><br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

        <a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>

or, via email, send a message with subject or body 'help' to<br>

        <a href="mailto:users-request@clusterlabs.org" target="_blank">users-request@clusterlabs.org</a><br>

<br>

You can reach the person managing the list at<br>

        <a href="mailto:users-owner@clusterlabs.org" target="_blank">users-owner@clusterlabs.org</a><br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than "Re: Contents of Users digest..."<br>

<br>

<br>

Today's Topics:<br>

<br>

   1. Re: Pacemaker auto restarts disabled groups (Ian Underhill)<br>

   2. Re: Pacemaker auto restarts disabled groups (Ken Gaillot)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Thu, 8 Nov 2018 12:14:33 +0000<br>

From: Ian Underhill <<a href="mailto:ianpunderhill@gmail.com" target="_blank">ianpunderhill@gmail.com</a>><br>

To: <a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a><br>

Subject: Re: [ClusterLabs] Pacemaker auto restarts disabled groups<br>

Message-ID:<br>

        <<a href="mailto:CAGu%2BcYgDdMThbV23%2B55eC40tjOGeYZzUKbL9o_ydjkqp%2BJOjjA@mail.gmail.com" target="_blank">CAGu+cYgDdMThbV23+55eC40tjOGeYZzUKbL9o_ydjkqp+JOjjA@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="utf-8"<br>

<br>

seems this issue has been raised before, but has gone quiet, with no<br>

solution<br>

<br>

<a href="https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html</a><br>

<br>

I know my resource agents successfully return the correct status to the<br>

start\stop\monitor requests<br>

<br>

On Thu, Nov 8, 2018 at 11:40 AM Ian Underhill <<a href="mailto:ianpunderhill@gmail.com" target="_blank">ianpunderhill@gmail.com</a>><br>

wrote:<br>

<br>

> Sometimes I'm seeing that a resource group that is in the process of being<br>

> disabled is auto-restarted by Pacemaker.<br>

><br>

> When issuing pcs disable commands to disable different resource groups at<br>

> the same time (on different nodes, at the group level), the result is that<br>

> sometimes the resource is stopped and restarted straight away. I'm using a<br>

> balanced placement strategy.<br>

><br>

> Looking into the daemon log, Pacemaker is aborting transitions due to<br>

> a config change in the target-role meta attribute?<br>

><br>

> Transition 2838 (Complete=25, Pending=0, Fired=0, Skipped=3,<br>

> Incomplete=10, Source=/var/lib/pacemaker/pengine/pe-input-704.bz2): Stopped<br>

><br>

> Could somebody explain Complete/Pending/Fired/Skipped/Incomplete, and is<br>

> there a way of displaying Skipped actions?<br>

><br>

> I've used crm_simulate --xml-file XXXX --run to see the actions, and I see<br>

> this extra start request<br>

><br>

> regards<br>

><br>

> /Ian.<br>

><br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Thu, 08 Nov 2018 10:58:52 -0600<br>

From: Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>><br>

To: Cluster Labs - All topics related to open-source clustering<br>

        welcomed <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>><br>

Subject: Re: [ClusterLabs] Pacemaker auto restarts disabled groups<br>

Message-ID: <<a href="mailto:1541696332.5197.3.camel@redhat.com" target="_blank">1541696332.5197.3.camel@redhat.com</a>><br>

Content-Type: text/plain; charset="UTF-8"<br>

<br>

On Thu, 2018-11-08 at 12:14 +0000, Ian Underhill wrote:<br>

> seems this issue has been raised before, but has gone quiet, with no<br>

> solution<br>

> <br>

> <a href="https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/pipermail/users/2017-October/006544.html</a><br>

<br>

In that case, something appeared to be explicitly re-enabling the<br>

disabled resources. You can search your logs for "target-role" to see<br>

whether that's happening.<br>

<br>

> I know my resource agents successfully return the correct status to<br>

> the start\stop\monitor requests<br>

> <br>

> On Thu, Nov 8, 2018 at 11:40 AM Ian Underhill <<a href="mailto:ianpunderhill@gmail.com" target="_blank">ianpunderhill@gmail.com</a>><br>

> wrote:<br>

> > Sometimes I'm seeing that a resource group that is in the process of<br>

> > being disabled is auto-restarted by Pacemaker.<br>

> > <br>

> > When issuing pcs disable commands to disable different resource<br>

> > groups at the same time (on different nodes, at the group level),<br>

> > the result is that sometimes the resource is stopped and restarted<br>

> > straight away. I'm using a balanced placement strategy.<br>

<br>

The first thing that comes to mind is that if you're running pcs on the<br>

live cluster, it won't actually be at the same time; there will be a<br>

small amount of time between each disable. The cluster could well<br>

decide to rebalance and thus restart other resource groups that haven't<br>

yet been disabled.<br>

<br>

A way around that would be to run pcs on a file instead and push that<br>

to the live cluster:<br>

<br>

 pcs cluster cib whatever.xml<br>

 pcs -f whatever.xml ...whatever command you want...<br>

 ...<br>

 pcs cluster cib-push whatever.xml --config<br>

<br>

That would make all the disabling happen at the same time.<br>
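<br>
A rough sketch of the recipe above, wrapped in a shell function so that several groups are disabled in one CIB push. The scratch file path, the group names, and the PCS override variable are assumptions made for illustration; only the pcs subcommands themselves come from the steps above.<br>

```shell
#!/bin/sh
# Sketch only: batch several "resource disable" edits into one CIB update.
# PCS may be overridden (for example PCS=echo) to preview the commands
# without touching a cluster; that override is an assumption of this sketch.
PCS="${PCS:-pcs}"

disable_groups_in_one_push() {
    # $1 = scratch CIB file; remaining arguments = group names (hypothetical)
    cib_file="$1"
    shift

    # Save a copy of the live CIB into the scratch file.
    $PCS cluster cib "$cib_file" || return 1

    # Apply each disable to the file only; the live cluster is not changed yet.
    for group in "$@"; do
        $PCS -f "$cib_file" resource disable "$group" || return 1
    done

    # Push the edited configuration back as a single update.
    $PCS cluster cib-push "$cib_file" --config
}
```

It could be called as disable_groups_in_one_push /tmp/shutdown.xml group_a group_b, where both the path and the group names are placeholders.<br>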

<br>

> > <br>

> > Looking into the daemon log, Pacemaker is aborting transitions due<br>

> > to a config change in the target-role meta attribute?<br>

> > <br>

> > Transition 2838 (Complete=25, Pending=0, Fired=0, Skipped=3,<br>

> > Incomplete=10, Source=/var/lib/pacemaker/pengine/pe-input-704.bz2): <br>

> > Stopped<br>

> > <br>

> > Could somebody explain Complete/Pending/Fired/Skipped/Incomplete,<br>

> > and is there a way of displaying Skipped actions?<br>

<br>

It's almost never useful to end users, and barely more useful even to<br>

developers. If you pass -VVVVVV to crm_simulate, you could get more<br>

info, but trust me you don't want to do that. ;-)<br>

<br>

Each transition is a set of actions needed to get to the desired state.<br>

"Complete" are actions that were initiated and a result was received.<br>

"Pending" are actions that were initiated but the result hasn't come<br>

back yet. "Skipped" is for certain failure situations, and for when a<br>

transition is aborted and an action that would be scheduled is a lower<br>

priority than the abort (which is probably what happened here, nothing<br>

significant). "Incomplete" is for actions that haven't been initiated<br>

yet.<br>
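<br>
To make those counters easier to scan, here is a small sketch that pulls them out of transition summary lines like the one quoted above. It assumes the log text is fed on standard input with each summary on a single line, and that grep supports -E and -o (GNU, BSD, and BusyBox grep do). The function names are made up for this example. It only locates transitions that skipped something; it does not show which actions were skipped.<br>

```shell
#!/bin/sh
# Sketch only: extract the per-transition counters from log text on stdin.

transition_counts() {
    # Print each counter found, one "Name=value" per line.
    grep -Eo '(Complete|Pending|Fired|Skipped|Incomplete)=[0-9]+'
}

transitions_with_skips() {
    # Print only the summary lines that report at least one skipped action.
    grep -E 'Skipped=[1-9][0-9]*'
}
```

For example, piping the daemon log through transitions_with_skips narrows it down to the transitions worth replaying with crm_simulate.<br>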

<br>

<br>

> > I've used crm_simulate --xml-file XXXX --run to see the actions, and<br>

> > I see this extra start request<br>

> > <br>

> > regards<br>

> > <br>

> > /Ian.<br>

-- <br>

Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>><br>

<br>

<br>

------------------------------<br>

<br>

<br>

End of Users Digest, Vol 46, Issue 8<br>

************************************<br>

</blockquote></div>
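<div dir="ltr"><div><br></div><div>For reference, the single-node approach described at the top of this message could look roughly like the sketch below: one node disables the groups one after another and uses --wait so that each stop finishes before the next request is sent. The group names, the timeout value, and the PCS override variable are placeholders for illustration, not the actual script.</div></div>

```shell
#!/bin/sh
# Sketch only: stop resource groups in sequence from a single node.
# PCS may be overridden (for example PCS=echo) to preview the commands;
# that override is an assumption of this sketch.
PCS="${PCS:-pcs}"

stop_groups_in_order() {
    # $1 = seconds to wait per group; remaining arguments = group names
    # in the order they should be stopped (hypothetical names).
    wait_secs="$1"
    shift

    for group in "$@"; do
        # --wait blocks until the resource has stopped or the timeout expires,
        # so the next disable is only issued after this one has settled.
        $PCS resource disable "$group" "--wait=$wait_secs" || {
            echo "could not stop $group within ${wait_secs}s" >&2
            return 1
        }
    done
}
```

<div dir="ltr"><div>A call such as stop_groups_in_order 60 group_a group_b would then stop the two placeholder groups in that order and abort at the first group that does not stop in time.</div></div>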