[ClusterLabs] crm_resource --wait
Ken Gaillot
kgaillot at redhat.com
Fri Oct 20 17:05:11 EDT 2017
I've narrowed down the cause.
When the "standby" transition completes, vm2 has more remaining
utilization capacity than vm1, so the cluster wants to run sv-fencer
there. That should be taken into account in the same transition, but it
isn't, so a second transition is needed to make it happen.
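(To put numbers on it, using the utilization from your reproduction: each
node advertises cpu=100 and each partition resource consumes cpu=5. After
vm3 goes into standby, vm1 carries partition1 and partition2 (10 used, 90
remaining) while vm2 carries only partition3 (5 used, 95 remaining), so
vm2 comes out ahead when placing sv-fencer.)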
Still investigating a fix. A workaround is to assign some stickiness or
utilization to sv-fencer.
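For example, with your configuration, something like either of these
should be enough to keep it in place (the values are just examples):

pcs resource meta sv-fencer resource-stickiness=100
pcs resource utilization sv-fencer cpu=1
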
On Wed, 2017-10-11 at 14:01 +1000, Leon Steffens wrote:
> I've attached two files:
> 314 = after standby step
> 315 = after resource update
>
> On Wed, Oct 11, 2017 at 12:22 AM, Ken Gaillot <kgaillot at redhat.com>
> wrote:
> > On Tue, 2017-10-10 at 15:19 +1000, Leon Steffens wrote:
> > > Hi Ken,
> > >
> > > I managed to reproduce this on a simplified version of the
> > > cluster, and on Pacemaker 1.1.15, 1.1.16, as well as 1.1.18-rc1.
> >
> > > The steps to create the cluster are:
> > >
> > > pcs property set stonith-enabled=false
> > > pcs property set placement-strategy=balanced
> > >
> > > pcs node utilization vm1 cpu=100
> > > pcs node utilization vm2 cpu=100
> > > pcs node utilization vm3 cpu=100
> > >
> > > pcs property set maintenance-mode=true
> > >
> > > pcs resource create sv-fencer ocf:pacemaker:Dummy
> > >
> > > pcs resource create sv ocf:pacemaker:Dummy clone notify=false
> > > pcs resource create std ocf:pacemaker:Dummy meta resource-stickiness=100
> > >
> > > pcs resource create partition1 ocf:pacemaker:Dummy meta resource-stickiness=100
> > > pcs resource create partition2 ocf:pacemaker:Dummy meta resource-stickiness=100
> > > pcs resource create partition3 ocf:pacemaker:Dummy meta resource-stickiness=100
> > >
> > > pcs resource utilization partition1 cpu=5
> > > pcs resource utilization partition2 cpu=5
> > > pcs resource utilization partition3 cpu=5
> > >
> > > pcs constraint colocation add std with sv-clone INFINITY
> > > pcs constraint colocation add partition1 with sv-clone INFINITY
> > > pcs constraint colocation add partition2 with sv-clone INFINITY
> > > pcs constraint colocation add partition3 with sv-clone INFINITY
> > >
> > > pcs property set maintenance-mode=false
> > >
> > >
> > > I can then reproduce the issues in the following way:
> > >
> > > $ pcs resource
> > > sv-fencer (ocf::pacemaker:Dummy): Started vm1
> > > Clone Set: sv-clone [sv]
> > > Started: [ vm1 vm2 vm3 ]
> > > std (ocf::pacemaker:Dummy): Started vm2
> > > partition1 (ocf::pacemaker:Dummy): Started vm3
> > > partition2 (ocf::pacemaker:Dummy): Started vm1
> > > partition3 (ocf::pacemaker:Dummy): Started vm2
> > >
> > > $ pcs cluster standby vm3
> > >
> > > # Check that all resources have moved off vm3
> > > $ pcs resource
> > > sv-fencer (ocf::pacemaker:Dummy): Started vm1
> > > Clone Set: sv-clone [sv]
> > > Started: [ vm1 vm2 ]
> > > Stopped: [ vm3 ]
> > > std (ocf::pacemaker:Dummy): Started vm2
> > > partition1 (ocf::pacemaker:Dummy): Started vm1
> > > partition2 (ocf::pacemaker:Dummy): Started vm1
> > > partition3 (ocf::pacemaker:Dummy): Started vm2
> >
> > Thanks for the detailed information; this should help me get to the
> > bottom of it. From this description, it sounds like a new transition
> > isn't being triggered when it should.
> >
> > Could you please attach the DC's pe-input file that is listed in the
> > logs after the standby step above? That would simplify analysis.
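
(For reference, the pe-input files live under /var/lib/pacemaker/pengine/
on whichever node was DC at the time; the logs name the exact file saved
for each transition.)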
> >
> > > # Wait for any outstanding actions to complete.
> > > $ crm_resource --wait --timeout 300
> > > Pending actions:
> > > Action 22: sv-fencer_monitor_10000 on vm2
> > > Action 21: sv-fencer_start_0 on vm2
> > > Action 20: sv-fencer_stop_0 on vm1
> > > Error performing operation: Timer expired
> > >
> > > # Check the resources again - sv-fencer is still on vm1
> > > $ pcs resource
> > > sv-fencer (ocf::pacemaker:Dummy): Started vm1
> > > Clone Set: sv-clone [sv]
> > > Started: [ vm1 vm2 ]
> > > Stopped: [ vm3 ]
> > > std (ocf::pacemaker:Dummy): Started vm2
> > > partition1 (ocf::pacemaker:Dummy): Started vm1
> > > partition2 (ocf::pacemaker:Dummy): Started vm1
> > > partition3 (ocf::pacemaker:Dummy): Started vm2
> > >
> > > # Perform a random update to the CIB.
> > > $ pcs resource update std op monitor interval=20 timeout=20
> > >
> > > # Check resource status again - sv-fencer has now moved to vm2
> > > # (the action crm_resource was waiting for)
> > > $ pcs resource
> > > sv-fencer (ocf::pacemaker:Dummy): Started vm2    <<<============
> > > Clone Set: sv-clone [sv]
> > > Started: [ vm1 vm2 ]
> > > Stopped: [ vm3 ]
> > > std (ocf::pacemaker:Dummy): Started vm2
> > > partition1 (ocf::pacemaker:Dummy): Started vm1
> > > partition2 (ocf::pacemaker:Dummy): Started vm1
> > > partition3 (ocf::pacemaker:Dummy): Started vm2
> > >
> > > I do not get the problem if I:
> > > 1) remove the "std" resource; or
> > > 2) remove the co-location constraints; or
> > > 3) remove the utilization attributes for the partition resources.
> > >
> > > In these cases the sv-fencer resource is happy to stay on vm1, and
> > > crm_resource --wait returns immediately.
> > >
> > > It looks like the pcs cluster standby call is creating/registering
> > > the actions to move the sv-fencer resource to vm2, but it doesn't
> > > include them in the cluster transition. When the CIB is later
> > > updated by something else, the actions are included in that
> > > transition.
> > >
> > >
> > > Regards,
> > > Leon
> >
--
Ken Gaillot <kgaillot at redhat.com>