[ClusterLabs] Misunderstanding or bug in crm_simulate output
Jehan-Guillaume de Rorthais
jgdr at dalibo.com
Thu Jan 18 18:37:34 EST 2018
On Thu, 18 Jan 2018 10:54:33 -0600
Ken Gaillot <kgaillot at redhat.com> wrote:
> On Thu, 2018-01-18 at 16:15 +0100, Jehan-Guillaume de Rorthais wrote:
> > Hi list,
> >
> > I was explaining how to use crm_simulate to a colleague when he
> > pointed out some unexpected, buggy output.
> >
> > Here are some simple steps to reproduce:
> >
> > $ pcs cluster setup --name usecase srv1 srv2 srv3
> > $ pcs cluster start --all
> > $ pcs property set stonith-enabled=false
> > $ pcs resource create dummy1 ocf:heartbeat:Dummy \
> > state=/tmp/dummy1.state \
> > op monitor interval=10s \
> > meta migration-threshold=3 resource-stickiness=1
> >
> > Now, we inject 2 monitor soft errors, triggering 2 local
> > recoveries (stop/start):
> >
> > $ crm_simulate -S -L -i dummy1_monitor_10@srv1=1 -O /tmp/step1.xml
> > $ crm_simulate -S -x /tmp/step1.xml -i dummy1_monitor_10@srv1=1 \
> > -O /tmp/step2.xml
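The same pattern extends to any number of steps: the first run reads the live CIB with `-L`, and each later run reads the previous run's saved CIB with `-x`. A minimal sketch, assuming a POSIX sh; the function name `inject_steps` and the `RUN` dry-run variable are hypothetical, and the resource, operation, node and `/tmp/stepN.xml` names are hard-coded from the example in this thread:

```shell
# inject_steps N (hypothetical helper): chain N crm_simulate runs, each
# injecting one failure (rc=1) of dummy1's 10s monitor on srv1 and saving
# the resulting CIB to /tmp/stepN.xml for the next run.
# Set RUN=echo to only print the commands instead of executing them.
# Note: POSIX sh has no "local", so n, i and input are global variables.
inject_steps() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    if [ "$i" -eq 1 ]; then
      input="-L"                          # first run: live cluster CIB
    else
      input="-x /tmp/step$((i - 1)).xml"  # later runs: previous output
    fi
    # $input is left unquoted on purpose so "-x FILE" splits into two args
    ${RUN:-} crm_simulate -S $input -i dummy1_monitor_10@srv1=1 -O "/tmp/step$i.xml"
    i=$((i + 1))
  done
}
```

`RUN=echo inject_steps 3` only prints the three commands; `inject_steps 3` actually runs them, which requires a running cluster because the first step uses `-L`.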
> >
> >
> > So far so good. A third monitor soft error pushes dummy1 out of
> > srv1, as expected. However, the final status of the cluster shows
> > dummy1 as started on both srv1 and srv2!
> >
> > $ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 \
> > -O /tmp/step3.xml
> >
> > Current cluster status:
> > Online: [ srv1 srv2 srv3 ]
> >
> > dummy1 (ocf::heartbeat:Dummy): Started srv1
> >
> > Performing requested modifications
> > + Injecting dummy1_monitor_10@srv1=1 into the configuration
> > + Injecting attribute fail-count-dummy1=value++ into /node_state '1'
> > + Injecting attribute last-failure-dummy1=1516287891 into /node_state '1'
> >
> > Transition Summary:
> > * Recover dummy1 ( srv1 -> srv2 )
> >
> > Executing cluster transition:
> > * Cluster action: clear_failcount for dummy1 on srv1
> > * Resource action: dummy1 stop on srv1
> > * Resource action: dummy1 cancel=10 on srv1
> > * Pseudo action: all_stopped
> > * Resource action: dummy1 start on srv2
> > * Resource action: dummy1 monitor=10000 on srv2
> >
> > Revised cluster status:
> > Online: [ srv1 srv2 srv3 ]
> >
> > dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ]
> >
> > I suppose this is a bug in crm_simulate? Why does it consider dummy1
> > started on srv1 when the executed transition stopped it there?
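To spot this symptom in longer simulation runs, one can scan the output for a resource reported with a bracketed node list. A hypothetical helper sketch, assuming the `Started[ srv1 srv2 ]` format shown in the revised status above; clone summaries, which this sketch assumes print `Started: [ ... ]` with a colon, are deliberately not matched:

```shell
# find_multi_started (hypothetical helper): read crm_simulate output on
# stdin and print the name of each resource whose status line lists more
# than one node, e.g. "dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ]".
find_multi_started() {
  awk '/Started\[/ {
    list = $0
    sub(/.*Started\[/, "", list)   # drop everything up to the opening bracket
    sub(/].*/, "", list)           # drop the closing bracket and what follows
    # more than one node name left means the resource is shown active twice
    if (split(list, nodes, " ") > 1) print $1
  }'
}
```

Piping output like the step 3 run quoted above (without the mail quoting) into `find_multi_started` prints `dummy1`; a single-node `Started srv1` line is ignored.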
>
> It's definitely a bug, either in crm_simulate or the policy engine
> itself. Can you attach step2.xml?
Sure, please find step2.xml attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: step2.xml
Type: application/xml
Size: 6800 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180119/646a3da5/attachment-0003.wsdl>