[ClusterLabs] Misunderstanding or bug in crm_simulate output

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Thu Jan 25 16:45:30 UTC 2018


On Wed, 24 Jan 2018 17:42:56 -0600
Ken Gaillot <kgaillot at redhat.com> wrote:

> On Fri, 2018-01-19 at 00:37 +0100, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 18 Jan 2018 10:54:33 -0600
> > Ken Gaillot <kgaillot at redhat.com> wrote:
> >   
> > > On Thu, 2018-01-18 at 16:15 +0100, Jehan-Guillaume de Rorthais
> > > wrote:  
> > > > Hi list,
> > > > 
> > > > I was explaining how to use crm_simulate to a colleague when he
> > > > pointed out to me an unexpected and buggy output.
> > > > 
> > > > Here are some simple steps to reproduce:
> > > > 
> > > >   $ pcs cluster setup --name usecase srv1 srv2 srv3
> > > >   $ pcs cluster start --all
> > > >   $ pcs property set stonith-enabled=false
> > > >   $ pcs resource create dummy1 ocf:heartbeat:Dummy \
> > > >     state=/tmp/dummy1.state                        \
> > > >     op monitor interval=10s                        \
> > > >     meta migration-threshold=3 resource-stickiness=1
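> > > > 
> > > > As a quick sanity check that the resource picked up the meta
> > > > attributes (the pcs sub-command may differ between pcs versions),
> > > > something like:
> > > > 
> > > >   $ pcs resource show dummy1
> > > >   $ crm_mon -1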
> > > > 
> > > > Now, we inject 2 monitor soft errors, triggering 2 local
> > > > recoveries (stop/start):
> > > > 
> > > >   $ crm_simulate -S -L -i dummy1_monitor_10@srv1=1 -O /tmp/step1.xml
> > > >   $ crm_simulate -S -x /tmp/step1.xml -i dummy1_monitor_10@srv1=1 \
> > > >     -O /tmp/step2.xml
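> > > > 
> > > > Each injected failure ends up as a transient node attribute in the
> > > > saved CIB, so the fail count can be checked between steps with
> > > > something as simple as:
> > > > 
> > > >   $ grep fail-count-dummy1 /tmp/step2.xml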
> > > > 
> > > > 
> > > > So far so good. A third soft error on monitor pushes dummy1 off
> > > > srv1, as expected. However, the final status of the cluster shows
> > > > dummy1 as started on both srv1 and srv2!
> > > > 
> > > >   $ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 \
> > > >     -O /tmp/step3.xml
> > > > 
> > > >   Current cluster status:
> > > >   Online: [ srv1 srv2 srv3 ]
> > > > 
> > > >    dummy1	(ocf::heartbeat:Dummy):	Started srv1
> > > > 
> > > >   Performing requested modifications
> > > >    + Injecting dummy1_monitor_10@srv1=1 into the configuration
> > > >    + Injecting attribute fail-count-dummy1=value++ into /node_state '1'
> > > >    + Injecting attribute last-failure-dummy1=1516287891 into /node_state '1'
> > > > 
> > > >   Transition Summary:
> > > >    * Recover    dummy1     ( srv1 -> srv2 )  
> > > > 
> > > >   Executing cluster transition:
> > > >    * Cluster action:  clear_failcount for dummy1 on srv1
> > > >    * Resource action: dummy1          stop on srv1
> > > >    * Resource action: dummy1          cancel=10 on srv1
> > > >    * Pseudo action:   all_stopped
> > > >    * Resource action: dummy1          start on srv2
> > > >    * Resource action: dummy1          monitor=10000 on srv2
> > > > 
> > > >   Revised cluster status:
> > > >   Online: [ srv1 srv2 srv3 ]
> > > > 
> > > >    dummy1	(ocf::heartbeat:Dummy):	Started[ srv1 srv2 ]
> > > > 
> > > > I suppose this is a bug in crm_simulate? Why does it consider
> > > > dummy1 started on srv1 when the transition execution stopped it
> > > > there?
> > > 
> > > It's definitely a bug, either in crm_simulate or the policy engine
> > > itself. Can you attach step2.xml?  
> > 
> > Sure, please, find in attachment step2.xml.  
> 
> I can reproduce the issue with 1.1.16 but not 1.1.17 or later, so
> whatever it was, it got fixed.

Interesting. I did some quick searching and suspected the bug came from
somewhere around "fake_transition.c", but I haven't had time to dig very far
yet. Moreover, I have been studying the code from master, not from 1.1.16, so
what a waste of time if it's already fixed on the master branch :)

Too bad most stable distros are still shipping 1.1.16 (Debian, CentOS, Ubuntu).

I have another question related to my tests on this subject. Keep in mind this
is still with 1.1.16, but it might not be specific to that version.

While testing on a live cluster, without crm_simulate, to see what really
happens there, I found in the log file that the pengine was producing 2
distinct transitions. Please find the log file attached.

As far as I understand it, I suppose that:

  * the crmd on srv3/DC receives the failure result from srv1 for the monitor
    on dummy1
  * crmd/srv3 forwards the failcount++ to attrd
  * crmd/srv3 invokes the pengine, producing transition 5
  * attrd sets the failcount for dummy1/srv1 to 3 (the maximum allowed, so
    dummy1 must move; see the quick check below)
  * because of the failcount update, crmd/srv3 invokes the pengine a second
    time, which produces transition 6
  * crmd marks transition 5 as obsolete
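
For what it's worth, the fail count is stored as a transient node attribute in
the status section of the CIB, so it can be watched directly on the live
cluster while this happens. A rough sketch, assuming the usual cibadmin and
crm_resource options (adjust if needed):

  # dump the status section and look for the updated fail count
  $ cibadmin --query --scope status | grep fail-count-dummy1

  # reset the fail count and failed operations once the test is done
  $ crm_resource --cleanup --resource dummy1 --node srv1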

Isn't it possible to wait for the attrd result before invoking the pengine, so
that it gets an up-to-date CIB as input? If I understand correctly, the attrd
call is asynchronous; however, when the crmd/DC invokes it, I suppose such a
call should be synchronous, shouldn't it?

Wouldn't that be a little bit simpler, faster and less error-prone, with fewer
and easier-to-understand log entries? This last point is quite important IMO,
as reading Pacemaker log files is still quite a pain for many people...

Thanks,
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.log
Type: text/x-log
Size: 14916 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20180125/88ba8d16/attachment-0002.bin>

