[ClusterLabs] crm_resource --wait

Wed Oct 11 04:01:29 UTC 2017

I've attached two pe-input files:
pe-input-314 = after the standby step
pe-input-315 = after the resource update
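
For anyone who wants to look at the attached inputs, a minimal sketch of
replaying one with crm_simulate follows. The helper name replay_pe_input
and the DRY_RUN switch are made up for illustration; only the
crm_simulate options --simulate and --xml-file are real. As far as I
know crm_simulate reads the .bz2 file directly, but treat that as an
assumption.

```shell
# Sketch only: replay a saved policy-engine input with crm_simulate.
# replay_pe_input is a hypothetical helper; DRY_RUN=1 is an assumed
# convenience switch that prints the command instead of running it.
replay_pe_input() {
    file=$1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "crm_simulate --simulate --xml-file $file"
    else
        # Shows the actions the policy engine would schedule for this input.
        crm_simulate --simulate --xml-file "$file"
    fi
}
```

Usage would be something like "replay_pe_input pe-input-314.bz2", then
the same for pe-input-315.bz2, to compare which actions each input
schedules.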

On Wed, Oct 11, 2017 at 12:22 AM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On Tue, 2017-10-10 at 15:19 +1000, Leon Steffens wrote:
> > Hi Ken,
> >
> > I managed to reproduce this on a simplified version of the cluster,
> > and on Pacemaker 1.1.15, 1.1.16, as well as 1.1.18-rc1.
>
> > The steps to create the cluster are:
> >
> > pcs property set stonith-enabled=false
> > pcs property set placement-strategy=balanced
> >
> > pcs node utilization vm1 cpu=100
> > pcs node utilization vm2 cpu=100
> > pcs node utilization vm3 cpu=100
> >
> > pcs property set maintenance-mode=true
> >
> > pcs resource create sv-fencer ocf:pacemaker:Dummy
> >
> > pcs resource create sv ocf:pacemaker:Dummy clone notify=false
> > pcs resource create std ocf:pacemaker:Dummy meta resource-stickiness=100
> >
> > pcs resource create partition1 ocf:pacemaker:Dummy meta resource-stickiness=100
> > pcs resource create partition2 ocf:pacemaker:Dummy meta resource-stickiness=100
> > pcs resource create partition3 ocf:pacemaker:Dummy meta resource-stickiness=100
> >
> > pcs resource utilization partition1 cpu=5
> > pcs resource utilization partition2 cpu=5
> > pcs resource utilization partition3 cpu=5
> >
> > pcs constraint colocation add std with sv-clone INFINITY
> > pcs constraint colocation add partition1 with sv-clone INFINITY
> > pcs constraint colocation add partition2 with sv-clone INFINITY
> > pcs constraint colocation add partition3 with sv-clone INFINITY
> >
> > pcs property set maintenance-mode=false
> >
> >
> > I can then reproduce the issues in the following way:
> >
> > $ pcs resource
> >  sv-fencer      (ocf::pacemaker:Dummy): Started vm1
> >  Clone Set: sv-clone [sv]
> >      Started: [ vm1 vm2 vm3 ]
> >  std    (ocf::pacemaker:Dummy): Started vm2
> >  partition1     (ocf::pacemaker:Dummy): Started vm3
> >  partition2     (ocf::pacemaker:Dummy): Started vm1
> >  partition3     (ocf::pacemaker:Dummy): Started vm2
> >
> > $ pcs cluster standby vm3
> >
> > # Check that all resources have moved off vm3
> > $ pcs resource
> >  sv-fencer      (ocf::pacemaker:Dummy): Started vm1
> >  Clone Set: sv-clone [sv]
> >      Started: [ vm1 vm2 ]
> >      Stopped: [ vm3 ]
> >  std    (ocf::pacemaker:Dummy): Started vm2
> >  partition1     (ocf::pacemaker:Dummy): Started vm1
> >  partition2     (ocf::pacemaker:Dummy): Started vm1
> >  partition3     (ocf::pacemaker:Dummy): Started vm2
>
> Thanks for the detailed information; this should help me get to the
> bottom of it. From this description, it sounds like a new transition
> isn't being triggered when it should be.
>
> Could you please attach the DC's pe-input file that is listed in the
> logs after the standby step above? That would simplify analysis.
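
One way to find the file names is to pull every pe-input reference out
of the DC's log. This is only a sketch: list_pe_inputs is a made-up
helper, and the log file location depends on how logging is configured
on the node, so it is passed in as an argument rather than assumed.

```shell
# Sketch only: list_pe_inputs LOGFILE
# Prints each distinct pe-input file name mentioned in LOGFILE, in order
# of first appearance. It matches on the file name pattern alone, so it
# does not depend on the exact wording of the log message.
list_pe_inputs() {
    grep -oE 'pe-input-[0-9]+\.bz2' "$1" | awk '!seen[$0]++'
}
```

Run it against the DC's log right after the standby step; the last name
printed should be the most recent transition input written so far.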
>
> > # Wait for any outstanding actions to complete.
> > $ crm_resource --wait --timeout 300
> > Pending actions:
> >         Action 22: sv-fencer_monitor_10000      on vm2
> >         Action 21: sv-fencer_start_0    on vm2
> >         Action 20: sv-fencer_stop_0     on vm1
> > Error performing operation: Timer expired
> >
> > # Check the resources again - sv-fencer is still on vm1
> > $ pcs resource
> >  sv-fencer      (ocf::pacemaker:Dummy): Started vm1
> >  Clone Set: sv-clone [sv]
> >      Started: [ vm1 vm2 ]
> >      Stopped: [ vm3 ]
> >  std    (ocf::pacemaker:Dummy): Started vm2
> >  partition1     (ocf::pacemaker:Dummy): Started vm1
> >  partition2     (ocf::pacemaker:Dummy): Started vm1
> >  partition3     (ocf::pacemaker:Dummy): Started vm2
> >
> > # Perform a random update to the CIB.
> > $ pcs resource update std op monitor interval=20 timeout=20
> >
> > # Check resource status again - sv-fencer has now moved to vm2 (the
> > # action crm_resource was waiting for)
> > $ pcs resource
> >  sv-fencer      (ocf::pacemaker:Dummy): Started vm2  <<<============
> >  Clone Set: sv-clone [sv]
> >      Started: [ vm1 vm2 ]
> >      Stopped: [ vm3 ]
> >  std    (ocf::pacemaker:Dummy): Started vm2
> >  partition1     (ocf::pacemaker:Dummy): Started vm1
> >  partition2     (ocf::pacemaker:Dummy): Started vm1
> >  partition3     (ocf::pacemaker:Dummy): Started vm2
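
To script the before/after comparison instead of reading the listings by
eye, something like the following could be used. It is a sketch that
assumes the "pcs resource" output format shown in this thread;
resource_node is a made-up helper name.

```shell
# Sketch only: resource_node NAME
# Reads "pcs resource" output on stdin and prints the node on which the
# primitive NAME is reported as Started. Assumes lines of the form
#   NAME  (ocf::pacemaker:Dummy): Started NODE
# as in the listings in this thread; clone sets are not handled.
resource_node() {
    awk -v r="$1" '$1 == r && $(NF-1) == "Started" { print $NF }'
}
```

For example, "pcs resource | resource_node sv-fencer" run before and
after the CIB update should print the two different nodes if the move
only happens after the update.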
> >
> > I do not get the problem if I:
> > 1) remove the "std" resource; or
> > 2) remove the colocation constraints; or
> > 3) remove the utilization attributes for the partition resources.
> >
> > In these cases the sv-fencer resource is happy to stay on vm1, and
> > crm_resource --wait returns immediately.
> >
> > It looks like the pcs cluster standby call creates/registers the
> > actions to move the sv-fencer resource to vm2, but does not include
> > them in the cluster transition.  When the CIB is later updated by
> > something else, the actions are included in that transition.
> >
> >
> > Regards,
> > Leon
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pe-input-314.bz2
Type: application/x-bzip2
Size: 2337 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20171011/a4a6a4fb/attachment-0004.bz2>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pe-input-315.bz2
Type: application/x-bzip2
Size: 2413 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20171011/a4a6a4fb/attachment-0005.bz2>