[ClusterLabs] crm_resource --wait
Leon Steffens
leon at steffensonline.com
Tue Oct 10 01:19:58 EDT 2017
Hi Ken,
I managed to reproduce this on a simplified version of the cluster, on
Pacemaker 1.1.15, 1.1.16 and 1.1.18-rc1.
The steps to create the cluster are:
pcs property set stonith-enabled=false
pcs property set placement-strategy=balanced
pcs node utilization vm1 cpu=100
pcs node utilization vm2 cpu=100
pcs node utilization vm3 cpu=100
pcs property set maintenance-mode=true
pcs resource create sv-fencer ocf:pacemaker:Dummy
pcs resource create sv ocf:pacemaker:Dummy clone notify=false
pcs resource create std ocf:pacemaker:Dummy meta resource-stickiness=100
pcs resource create partition1 ocf:pacemaker:Dummy meta resource-stickiness=100
pcs resource create partition2 ocf:pacemaker:Dummy meta resource-stickiness=100
pcs resource create partition3 ocf:pacemaker:Dummy meta resource-stickiness=100
pcs resource utilization partition1 cpu=5
pcs resource utilization partition2 cpu=5
pcs resource utilization partition3 cpu=5
pcs constraint colocation add std with sv-clone INFINITY
pcs constraint colocation add partition1 with sv-clone INFINITY
pcs constraint colocation add partition2 with sv-clone INFINITY
pcs constraint colocation add partition3 with sv-clone INFINITY
pcs property set maintenance-mode=false
I can then reproduce the issues in the following way:
$ pcs resource
sv-fencer (ocf::pacemaker:Dummy): Started vm1
Clone Set: sv-clone [sv]
    Started: [ vm1 vm2 vm3 ]
std (ocf::pacemaker:Dummy): Started vm2
partition1 (ocf::pacemaker:Dummy): Started vm3
partition2 (ocf::pacemaker:Dummy): Started vm1
partition3 (ocf::pacemaker:Dummy): Started vm2
$ pcs cluster standby vm3
# Check that all resources have moved off vm3
$ pcs resource
sv-fencer (ocf::pacemaker:Dummy): Started vm1
Clone Set: sv-clone [sv]
    Started: [ vm1 vm2 ]
    Stopped: [ vm3 ]
std (ocf::pacemaker:Dummy): Started vm2
partition1 (ocf::pacemaker:Dummy): Started vm1
partition2 (ocf::pacemaker:Dummy): Started vm1
partition3 (ocf::pacemaker:Dummy): Started vm2
# Wait for any outstanding actions to complete.
$ crm_resource --wait --timeout 300
Pending actions:
Action 22: sv-fencer_monitor_10000 on vm2
Action 21: sv-fencer_start_0 on vm2
Action 20: sv-fencer_stop_0 on vm1
Error performing operation: Timer expired
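(For what it's worth, the failure is also visible in the exit status, so it
can be checked from a script - I haven't looked up the exact code, so this
just prints it:)

$ crm_resource --wait --timeout 300; echo "exit status: $?"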
# Check the resources again - sv-fencer is still on vm1
$ pcs resource
sv-fencer (ocf::pacemaker:Dummy): Started vm1
Clone Set: sv-clone [sv]
    Started: [ vm1 vm2 ]
    Stopped: [ vm3 ]
std (ocf::pacemaker:Dummy): Started vm2
partition1 (ocf::pacemaker:Dummy): Started vm1
partition2 (ocf::pacemaker:Dummy): Started vm1
partition3 (ocf::pacemaker:Dummy): Started vm2
# Perform a random update to the CIB.
$ pcs resource update std op monitor interval=20 timeout=20
# Check resource status again - sv-fencer has now moved to vm2 (the action
# crm_resource was waiting for).
$ pcs resource
sv-fencer (ocf::pacemaker:Dummy): Started vm2 <<<============
Clone Set: sv-clone [sv]
    Started: [ vm1 vm2 ]
    Stopped: [ vm3 ]
std (ocf::pacemaker:Dummy): Started vm2
partition1 (ocf::pacemaker:Dummy): Started vm1
partition2 (ocf::pacemaker:Dummy): Started vm1
partition3 (ocf::pacemaker:Dummy): Started vm2
I do not get the problem if I:
1) remove the "std" resource; or
2) remove the co-location constraints; or
3) remove the utilization attributes for the partition resources.
In these cases the sv-fencer resource is happy to stay on vm1, and
crm_resource --wait returns immediately (the changes are sketched below).
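For reference, this is roughly what I mean by those three changes (resource
and constraint IDs as in the setup above; exact pcs syntax may differ a
little between versions):

# 1) remove the "std" resource
pcs resource delete std

# 2) remove the colocation constraints
pcs constraint colocation remove std sv-clone
pcs constraint colocation remove partition1 sv-clone
pcs constraint colocation remove partition2 sv-clone
pcs constraint colocation remove partition3 sv-clone

# 3) clear the utilization attributes (an empty value removes the attribute)
pcs resource utilization partition1 cpu=
pcs resource utilization partition2 cpu=
pcs resource utilization partition3 cpu=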
It looks like the pcs cluster standby call is creating/registering the
actions to move the sv-fencer resource to vm2, but it doesn't include them in
the cluster transition. When the CIB is later updated by something else, the
actions are included in that transition.
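If it helps, I can also run crm_simulate against the live CIB to see what the
policy engine currently intends to do - something along these lines (I
haven't captured that output yet):

# Ask the policy engine what it would do with the current live CIB; the
# "Transition Summary" section should show whether the sv-fencer move is
# part of the transition the cluster actually intends to run.
$ crm_simulate --live-check --simulate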
Regards,
Leon