[ClusterLabs] Misunderstanding or bug in crm_simulate output
Jehan-Guillaume de Rorthais
jgdr at dalibo.com
Thu Jan 18 10:15:38 EST 2018
Hi list,
I was explaining how to use crm_simulate to a colleague when he pointed out
an unexpected, apparently buggy output.
Here are some simple steps to reproduce:
$ pcs cluster setup --name usecase srv1 srv2 srv3
$ pcs cluster start --all
$ pcs property set stonith-enabled=false
$ pcs resource create dummy1 ocf:heartbeat:Dummy \
state=/tmp/dummy1.state \
op monitor interval=10s \
meta migration-threshold=3 resource-stickiness=1
Now we inject two soft monitor errors, each triggering a local recovery
(stop/start):
$ crm_simulate -S -L -i dummy1_monitor_10@srv1=1 -O /tmp/step1.xml
$ crm_simulate -S -x /tmp/step1.xml -i dummy1_monitor_10@srv1=1 \
-O /tmp/step2.xml
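The chaining above, where each run reads the file written by the previous one, can be sketched as a small loop. This is only an illustration: `chain_failures` and the `DRY_RUN` switch are hypothetical names, and the /tmp/stepN.xml naming simply follows the commands above.

```shell
#!/bin/sh
# Sketch: inject the same failure N times, feeding each output file
# into the next run. chain_failures is a hypothetical helper name.
# Set DRY_RUN=1 to print the commands instead of running them.
chain_failures() {
    op=$1      # operation spec passed to -i, as in the commands above
    count=$2   # number of failures to inject
    src="-L"   # the first run reads the live cluster
    i=1
    while [ "$i" -le "$count" ]; do
        out="/tmp/step$i.xml"
        # $src is left unquoted on purpose: it is either "-L" or "-x FILE"
        ${DRY_RUN:+echo} crm_simulate -S $src -i "$op" -O "$out"
        src="-x $out"   # later runs read the previous output
        i=$((i + 1))
    done
}
```

For instance, `DRY_RUN=1; chain_failures 'dummy1_monitor_10@srv1=1' 2` prints the two commands for step1.xml and step2.xml without touching the cluster.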
So far so good. A third soft monitor error pushes dummy1 off srv1, as
expected. However, the final status of the cluster shows dummy1 as
started on both srv1 and srv2!
$ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 \
-O /tmp/step3.xml
Current cluster status:
Online: [ srv1 srv2 srv3 ]
dummy1 (ocf::heartbeat:Dummy): Started srv1
Performing requested modifications
+ Injecting dummy1_monitor_10@srv1=1 into the configuration
+ Injecting attribute fail-count-dummy1=value++ into /node_state '1'
+ Injecting attribute last-failure-dummy1=1516287891 into /node_state '1'
Transition Summary:
* Recover dummy1 ( srv1 -> srv2 )
Executing cluster transition:
* Cluster action: clear_failcount for dummy1 on srv1
* Resource action: dummy1 stop on srv1
* Resource action: dummy1 cancel=10 on srv1
* Pseudo action: all_stopped
* Resource action: dummy1 start on srv2
* Resource action: dummy1 monitor=10000 on srv2
Revised cluster status:
Online: [ srv1 srv2 srv3 ]
dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ]
I suppose this is a bug in crm_simulate? Why does it consider dummy1
started on srv1 when the transition it just executed stopped it there?
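To spot this kind of result without reading the whole output, the revised status can be filtered for resources reported as active on several nodes. A minimal sketch, assuming the status lines keep the "Started[ node node ]" form shown above; `multi_active` is a hypothetical helper, not part of Pacemaker:

```shell
#!/bin/sh
# Sketch: read crm_simulate output on stdin and print the name of every
# resource that the "Revised cluster status" section lists as started
# on more than one node (the "Started[ ... ]" form).
multi_active() {
    awk '
        /^Revised cluster status:/ { revised = 1; next }  # only look past this header
        revised && /Started\[/     { print $1 }           # first field is the resource name
    '
}
```

It is meant to be used at the end of a pipeline, for example by piping the output of a `crm_simulate -S -x FILE` run into `multi_active`; no output means no resource was reported on several nodes.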
Feeding the step3.xml output of this odd result back into crm_simulate
forces the cluster to stop dummy1 everywhere and start it on srv2 only:
$ crm_simulate -S -x /tmp/step3.xml
Current cluster status:
Online: [ srv1 srv2 srv3 ]
dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ]
Transition Summary:
* Move dummy1 ( srv1 -> srv2 )
Executing cluster transition:
* Resource action: dummy1 stop on srv2
* Resource action: dummy1 stop on srv1
* Pseudo action: all_stopped
* Resource action: dummy1 start on srv2
* Resource action: dummy1 monitor=10000 on srv2
Revised cluster status:
Online: [ srv1 srv2 srv3 ]
dummy1 (ocf::heartbeat:Dummy): Started srv2
Thoughts?