[ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

Mon Feb 25 23:03:19 EST 2019

25.02.2019 23:13, Ken Gaillot wrote:
> On Mon, 2019-02-25 at 14:20 +0530, Samarth Jain wrote:
>> Hi,
>>
>>
>> We have a bunch of resources running in master slave configuration
>> with one master and one slave instance running at any given time.
>>
>> What we observe is that, for any two resources, if resource
>> Stateful_Test_1 is in the middle of a promote that takes a
>> significant amount of time to complete (close to 150 seconds in our
>> scenario, e.g. starting a web server) and, during this time, the
>> master instance of Stateful_Test_2 fails, then the failure of the
>> Stateful_Test_2 master is never acted on by pengine: the recurring
>> monitor keeps failing without any action being taken by the DC.
>>
>> We see below logs for the failure of Stateful_Test_2 in the DC which
>> was VM-3 at that time:
>>
>> Feb 25 11:28:13 [6013] VM-3       crmd:   notice:
>> abort_transition_graph:      Transition aborted by operation
>> Stateful_Test_2_monitor_17000 'create' on VM-1: Old event |
>> magic=0:9;329:8:8:4a2b407e-ad15-43d0-8248-e70f9f22436b cib=0.191.5
>> source=process_graph_event:498 complete=false
>>
>> As per our current testing, the Stateful_Test_2 resource has failed
>> 590 times and still continues to fail, without the failure being
>> processed by pacemaker. We have to intervene manually and recover it
>> by restarting the resource.
>>
>> Could you please help me understand:
>> 1. Why doesn't pacemaker process the failure of Stateful_Test_2
>> resource immediately after first failure?
> 
> All actions that have already been initiated must complete before the
> cluster can react to new conditions. The outcome of those actions can
> (and likely will) affect what needs to be done, so the cluster has to
> wait for them. The action timeouts are the only way to really affect
> this.
> 

Well, the promote action sets the master score, and this aborts and
re-evaluates the current transition. So this rule is not set in stone:
there are obviously situations where pacemaker does not wait for an
operation to complete before starting the next transition.

> We've discussed the theoretical possibility of figuring out what would
> have to be done regardless of the outcome of the in-flight actions, but
> that might be computationally impractical.
> 

I'm not sure why we need "what if" guessing. If the new transition
evaluates to the same resource state, pacemaker knows that the operation
is already in flight and does not need to do anything. If the new
resource state is different, can't pacemaker simply cancel the current
operation and initiate a different one?

I understand that operations *on the same resource* need serialization,
but between completely independent resources?
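
To make the idea concrete, here is a toy sketch of the comparison I have
in mind. It is not how pacemaker's scheduler is implemented; the Action
type, the reconcile() function, the node placement and the "demote"
recovery step are all made up for illustration. It only shows that
actions already in flight can be matched per resource against a newly
computed transition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical stand-in for one scheduled operation."""
    resource: str  # e.g. "Stateful_Test_1"
    task: str      # e.g. "promote", "demote", "start"
    node: str      # e.g. "VM-1"


def reconcile(in_flight, desired):
    """Compare in-flight actions with a freshly computed transition.

    Both arguments map resource name -> Action, so resources are
    handled independently of each other.
    Returns three lists: (keep, cancel, start).
    """
    keep, cancel, start = [], [], []
    for resource, wanted in desired.items():
        running = in_flight.get(resource)
        if running == wanted:
            # Same outcome is already being worked on: nothing to do.
            keep.append(running)
        else:
            if running is not None:
                # A different action is running for this resource.
                cancel.append(running)
            start.append(wanted)
    # In-flight actions the new transition no longer asks for.
    for resource, running in in_flight.items():
        if resource not in desired:
            cancel.append(running)
    return keep, cancel, start


# Illustrative scenario: a slow promote is running while another
# resource needs recovery (node names and tasks are assumptions).
in_flight = {"Stateful_Test_1": Action("Stateful_Test_1", "promote", "VM-2")}
desired = {
    "Stateful_Test_1": Action("Stateful_Test_1", "promote", "VM-2"),
    "Stateful_Test_2": Action("Stateful_Test_2", "demote", "VM-1"),
}
keep, cancel, start = reconcile(in_flight, desired)
print("keep:", [(a.resource, a.task) for a in keep])
print("cancel:", [(a.resource, a.task) for a in cancel])
print("start:", [(a.resource, a.task) for a in start])
```

In this scenario the in-flight promote of Stateful_Test_1 is kept
untouched, nothing is cancelled, and only the recovery for
Stateful_Test_2 is newly started. Whether a running operation can
really be cancelled safely is of course a separate question that this
sketch ignores.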