[ClusterLabs] All clones are stopped when one of them fails
Pavel Levshin
lpk at 581.spb.su
Tue Dec 8 01:40:15 EST 2020
Hello.
Despite many years of Pacemaker use, it never stops fooling me...
This time, I have faced a trivial problem. In my new setup, the cluster
consists of several identical nodes. A clone resource (vg.sanlock) is
started on every node, ensuring it has access to SAN storage. Almost all
other resources are colocated and ordered after vg.sanlock.
Today I started a node, and vg.sanlock failed to start on it. The
cluster then decided to stop all the clone instances "due to node
availability", taking down all the other resources through their
dependencies. This seems illogical to me. When a clone instance fails,
I would prefer to see it stopped on that one node only. How do I do
this properly?
I've tried this config with Pacemaker 2.0.3 and 1.1.16; the behaviour
is the same in both.
Reduced test config here:
pcs cluster auth test-pcmk0 test-pcmk1 <>/dev/tty
pcs cluster setup --name test-pcmk test-pcmk0 test-pcmk1 --transport udpu \
    --auto_tie_breaker 1
pcs cluster start --all --wait=60
pcs cluster cib tmp-cib.xml
cp tmp-cib.xml tmp-cib.xml.deltasrc
pcs -f tmp-cib.xml property set stonith-enabled=false
pcs -f tmp-cib.xml resource defaults resource-stickiness=100
pcs -f tmp-cib.xml resource create vg.sanlock ocf:pacemaker:Dummy \
    op monitor interval=10 timeout=20 start interval=0s stop interval=0s \
    timeout=20
pcs -f tmp-cib.xml resource clone vg.sanlock interleave=true
pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.xml.deltasrc
And here is the cluster's reaction to the failure:
# crm_simulate -x state4.xml -S
Current cluster status:
Online: [ test-pcmk0 test-pcmk1 ]
 Clone Set: vg.sanlock-clone [vg.sanlock]
     vg.sanlock (ocf::pacemaker:Dummy): FAILED test-pcmk0
     Started: [ test-pcmk1 ]
Transition Summary:
 * Stop vg.sanlock:0 ( test-pcmk1 ) due to node availability
 * Stop vg.sanlock:1 ( test-pcmk0 ) due to node availability
Executing cluster transition:
 * Pseudo action: vg.sanlock-clone_stop_0
 * Resource action: vg.sanlock stop on test-pcmk1
 * Resource action: vg.sanlock stop on test-pcmk0
 * Pseudo action: vg.sanlock-clone_stopped_0
 * Pseudo action: all_stopped
Revised cluster status:
Online: [ test-pcmk0 test-pcmk1 ]
 Clone Set: vg.sanlock-clone [vg.sanlock]
     Stopped: [ test-pcmk0 test-pcmk1 ]
As a side note, if I make those clones globally-unique, they seem to
behave properly. But I have not found a reference to this solution
anywhere. In general, globally-unique clones are mentioned only where
resource agents distinguish between clone instances. That is not the
case here.
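For illustration, one way the globally-unique setting could be applied in
the reduced test config is sketched below. It assumes the clone ID
vg.sanlock-clone shown in the crm_simulate output and the same offline
tmp-cib.xml workflow as above. The PCS variable is only a dry-run device
added for this sketch: by default the command is printed, not executed.
This is an observed workaround, not a documented fix.

```shell
#!/bin/sh
# Sketch only. PCS defaults to "echo pcs", so the command is printed
# (dry run); run with PCS=pcs to really modify tmp-cib.xml.
PCS="${PCS:-echo pcs}"

# Assumption: the clone created earlier is named vg.sanlock-clone.
# Set the globally-unique meta attribute on the clone in the offline CIB.
$PCS -f tmp-cib.xml resource meta vg.sanlock-clone globally-unique=true
```

After pushing the modified CIB, the result can be checked again with
crm_simulate against a fresh copy of the cluster state.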
--
Thanks,
Pavel