[ClusterLabs] Detected action arrived really late

Mon Apr 9 17:32:57 EDT 2018

Hi all,

Two weeks ago, I faced an issue in a cluster using the PAF resource agent
(multi-state RA for PostgreSQL) where CRMd was receiving results from
actions executed in the previous transition.

This behavior triggered an unexpected result during a simple test:

  * migration threshold: 5
  * pgsqld failcount: 0 
  * kill pgsqld master
  * expected result: recovery of master pgsqld on its current node

Final result: 
  * migration to another node
  * 15 results from notify actions in previous transition were receive "really
    late" (sometime 2 transitions later!)
  * 11 transitions has been computed: 
    * 1 because of the failcount setting asynchronously
    * 8 because of the "really late" events
    * 1 because PAF was lost and triggered an election instead of recovering
      the master
  * 15 seconds from error detection to last action of the last transition

Please, find attached the log files of this event. The following command is a
good sum up of this situation:

  grep -nE ' late|Calculated' pacemaker.late.log

This comes from a Pacemaker 1.1.13 on SLES 12 SP1. Servers are VM managed by
vmWare 5.5.

I suspect I have a similar scenario on the following issue using 1.1.16:
https://github.com/ClusterLabs/PAF/issues/131

In this issue, the attribute set during notify action arrive so lately, PAF
only find part of them and consider it as a failure. 

I looked at the changelog and I couldn't find any messages related to such
behaviors :/

My questions:

  * what trigger such behavior?
  * is it possible to avoid them with some better settings or hardware?
  * when setting a private attribute, PAF is checking continuously for
    attrd to reply the same value before going on with other actions. I hoped
    that would be enough to consider the whole cluster share the same value for
    the attribute, but it doesn't seems enough (see github issue #131). Is it
    possible to wait until the whole cluster nodes have the same value
    propagated everywhere?

Thanks in advance for your help!

Bonus: Should it help, here is my analysis of the logfile. Please, crop it while
answering if not needed!

L1-3
  LRMd/CRMd failure detection on slesHAtest1
L.38-42
  pengine plans a Recover of pgsqld:0 on slesHAtest1
L.43,45,46,50
  CRMd starts the transition execution and launch the pre notify actions
L.51-60
  CIB update the failcount (count + timestamp) to the CIB
L.66-67
  CRMd abort the current transition because of the failcount change
L.102-106
  pengine produce the same transition
L.107-109,112 and L.119,125
  CRMd starts the transition execution and launch the pre notify actions and
  receive the results
L.126
  CRMd starts the demote action (and cancel the monitor on pgsqld first)
L.137-138
  CRMd abort the transition because it receives the results of the
  notify action of the previous transition
L.205-209
  Because the demote already occurred, pengine plans a slave recover+promote on
  slesHAtest1
L.210-229
  CRMd starts the transition, do the notifications, do the stop
L.234-235
  CRMd abort the transition because it receives the results of the
  notify action of the previous transition
L.298-302
  Because the stop occurred, pengine plans to start pgsqld on slesHAtest1 and
  promote it on slesHAtest2
L.306,312-313
  CRMd do the notify and start action
L.318-319
  CRMd abort the transition because it receives the results of the
  notify action of the previous transition
L.336-337
  Because the start occurred, the post notify actions are triggered
L381-385
  Because the previous transition was aborted, pengine change its mind and
  plans to promote pgsqld on slesHAtest1
L.401-402
  CRMd abort the transition because it receives the results of the
  notify action of the previous transition
L.436-440 
  pengine plans the same transition (promote on slesHAtest1)
L.448-449
  CRMd abort the transition because it receives the results of the
  notify action of the previous transition
L.488-492
  pengine plans the same transition (promote on slesHAtest1)
L.512
  CRMd launch the pre notify promote, then promote pgsqld on slesHAtest1
L.520-522
  The attributes set during the pre-notify action hit attrd AFTER the promote
  action is initiated. Note that PAF is waiting for each attribute to be
  visible from the node itself before going on. 
L.525-526,529-530
  CRMd abort the transition because it receives the results of the
  notify action of the previous transition (two of them!)
L.531-535
  Some more parameter are deleted/set from attrd. At this point, I'm not sure
  what transition was setting this values. They seems to collide...
L.538-542
  CRMd abort the transition because it receives the results of the
  notify action of the previous transition (another two of them!)
L.543-547
  Attrd keeps deleting/setting parameters coming from pre-notify actions
L.551
  LRMd promote local pgsqld on slesHAtest1 (CRMd initiated it on L.512). L.551
  This promotion fails because PAF is not able anymore to detect this was a
  originally master recovery and trigger a promotion where the second node
  wins.
L.566
  Because Pacemaker ignore the notify action return value, PAF rset a
  higher master score on the second node and raise an error to force the new
  promotion.
L.653-658
  pengine plans to recover Master on slesHAtest1, demote it, and promote
  pgsqld on slesHAtest2
L.695-696
  CRMd initiate the first demote, on slesHAtest1, then abort the transition
  because of old events again: 
L767-769
  pengine plans to recover pgsqld on both node and promote it on
  slesHAtest2
L.800-804
  As usual, CRMd initiate the stop actions on both nodes (L.772,775) but
  decide to fails the transition because of old events.
L.863-867
  pengine plans to start pgsqld on both node and promote it on slesHAtest2

This one succeed \o/

Cheers,
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pacemaker.late.log
Type: text/x-log
Size: 175738 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180409/a0076982/attachment-0001.bin>