[ClusterLabs] Misunderstanding or bug in crm_simulate output

Ken Gaillot kgaillot at redhat.com
Thu Jan 25 17:06:47 EST 2018


On Thu, 2018-01-25 at 17:45 +0100, Jehan-Guillaume de Rorthais wrote:
> On Wed, 24 Jan 2018 17:42:56 -0600
> Ken Gaillot <kgaillot at redhat.com> wrote:
> 
> > On Fri, 2018-01-19 at 00:37 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > On Thu, 18 Jan 2018 10:54:33 -0600
> > > Ken Gaillot <kgaillot at redhat.com> wrote:
> > >   
> > > > On Thu, 2018-01-18 at 16:15 +0100, Jehan-Guillaume de Rorthais
> > > > wrote:  
> > > > > Hi list,
> > > > > 
> > > > > I was explaining how to use crm_simulate to a colleague when he
> > > > > pointed out an unexpected and buggy output to me.
> > > > > 
> > > > > Here are some simple steps to reproduce:
> > > > > 
> > > > >   $ pcs cluster setup --name usecase srv1 srv2 srv3
> > > > >   $ pcs cluster start --all
> > > > >   $ pcs property set stonith-enabled=false
> > > > >   $ pcs resource create dummy1 ocf:heartbeat:Dummy \
> > > > >     state=/tmp/dummy1.state                        \
> > > > >     op monitor interval=10s                        \
> > > > >     meta migration-threshold=3 resource-stickiness=1
> > > > > 
> > > > > Now, we inject 2 monitor soft errors, triggering 2 local
> > > > > recoveries (stop/start):
> > > > > 
> > > > >   $ crm_simulate -S -L -i dummy1_monitor_10@srv1=1 -O /tmp/step1.xml
> > > > >   $ crm_simulate -S -x /tmp/step1.xml -i dummy1_monitor_10@srv1=1 \
> > > > >     -O /tmp/step2.xml
> > > > > 
> > > > > 
> > > > > So far so good. A third soft error on monitor pushes dummy1 out
> > > > > of srv1, which was expected. However, the final status of the
> > > > > cluster shows dummy1 as started on both srv1 and srv2!
> > > > > 
> > > > >   $ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 \
> > > > >     -O /tmp/step3.xml
> > > > > 
> > > > >   Current cluster status:
> > > > >   Online: [ srv1 srv2 srv3 ]
> > > > > 
> > > > >    dummy1	(ocf::heartbeat:Dummy):	Started srv1
> > > > > 
> > > > >   Performing requested modifications
> > > > >    + Injecting dummy1_monitor_10@srv1=1 into the configuration
> > > > >    + Injecting attribute fail-count-dummy1=value++ into /node_state '1'
> > > > >    + Injecting attribute last-failure-dummy1=1516287891 into /node_state '1'
> > > > > 
> > > > >   Transition Summary:
> > > > >    * Recover    dummy1     ( srv1 -> srv2 )  
> > > > > 
> > > > >   Executing cluster transition:
> > > > >    * Cluster action:  clear_failcount for dummy1 on srv1
> > > > >    * Resource action: dummy1          stop on srv1
> > > > >    * Resource action: dummy1          cancel=10 on srv1
> > > > >    * Pseudo action:   all_stopped
> > > > >    * Resource action: dummy1          start on srv2
> > > > >    * Resource action: dummy1          monitor=10000 on srv2
> > > > > 
> > > > >   Revised cluster status:
> > > > >   Online: [ srv1 srv2 srv3 ]
> > > > > 
> > > > >    dummy1	(ocf::heartbeat:Dummy):	Started[ srv1 srv2 ]
> > > > > 
> > > > > I suppose this is a bug in crm_simulate? Why does it consider
> > > > > dummy1 started on srv1 when the transition execution stopped it
> > > > > there?
> > > > 
> > > > It's definitely a bug, either in crm_simulate or the policy
> > > > engine itself. Can you attach step2.xml?
> > > 
> > > Sure, please find step2.xml attached.
> > 
> > I can reproduce the issue with 1.1.16 but not 1.1.17 or later, so
> > whatever it was, it got fixed.
> 
> Interesting. I did some quick searching and suspected the bug came from
> somewhere around "fake_transition.c". I haven't had time to dig very
> far, though. Moreover, I am studying the code from master, not 1.1.16;
> what a waste of time if it's fixed on the master branch :)
> 
> Too bad most stable distros are still on 1.1.16 (Debian, CentOS,
> Ubuntu).

Distros do backport some fixes, so you'd have to test each one to see
if it's affected.
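
A quick way to check is to replay the reproduction above against
whatever build a distro ships (this assumes the step2.xml generated
earlier is still around):

  $ crm_simulate --version
  $ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1

If the "Revised cluster status" shows dummy1 started on only one node,
that build has the fix (or a backport of it).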

> 
> I have another question related to my tests on this subject. Keep in
> mind this is still with 1.1.16, but it might not be related to the
> version.
> 
> While testing on a live cluster, without crm_simulate, to see what
> really happens there, I found in the log file that the pengine was
> producing 2 distinct transitions. Please find the log file attached.
> 
> As far as I understand it, I suppose that:
> 
>   * the crmd on srv3/DC receives the failure result from srv1 for the
>     monitor on dummy1
>   * crmd/srv3 forwards the failcount++ to attrd
>   * crmd/srv3 invokes the pengine, producing transition 5
>   * attrd sets the failcount for dummy1/srv1 to 3 (the maximum allowed,
>     so dummy1 must move)
>   * because of the failcount update, crmd/srv3 invokes the pengine a
>     second time, which produces transition 6
>   * crmd marks transition 5 as obsolete
> 
> Isn't it possible to wait for the attrd result before invoking the
> pengine, so as to provide it with an up-to-date cib as input? If I
> understand correctly, the attrd call is asynchronous; however, when
> the crmd/DC invokes it, I suppose such calls should be synchronous,
> shouldn't they?

It is still asynchronous; the crmd has to be able to respond to new
information coming in from anywhere at any time.

It is a bit redundant, but it errs on the side of speedy response. It
doesn't know when (or even if) the cib will successfully write the
change from attrd, so the safest thing to do is start a transition
immediately. If the change goes through quickly, we'll have wasted a
little bit of CPU and I/O, but it doesn't cause problems. If it doesn't
go through quickly, we can get a head start on recovery actions that
are needed now.
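
As an aside, on a live cluster you can watch the value attrd eventually
writes with the usual tools (the attribute name below follows the 1.1.x
scheme seen in this thread; newer releases track fail counts per
operation):

  $ crm_attribute --type status --node srv1 --name fail-count-dummy1 --query
  $ crm_resource --cleanup --resource dummy1 --node srv1   # clear it afterward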

> Shouldn't it be a little simpler, faster, and less error-prone, with
> fewer and easier-to-understand log entries? This last issue is quite
> important IMO, as reading the Pacemaker log file is still quite a pain
> for many people...
> 
> Thanks,

The logs are definitely a perpetual area for improvement.
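
One thing that can help when reading them: the pengine logs each
transition it calculates together with the pe-input file it saved, and
that file can be fed back to crm_simulate to see what was decided and
why (the file number below is just a placeholder, and the directory can
differ by packaging; use whatever path your log names):

  $ crm_simulate -S --show-scores -x /var/lib/pacemaker/pengine/pe-input-42.bz2

--show-scores additionally prints the allocation scores behind the
placement decisions.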
-- 
Ken Gaillot <kgaillot at redhat.com>



