[ClusterLabs] Master/slave failover does not work as expected
jpokorny at redhat.com
Tue Aug 13 16:00:52 EDT 2019
On 13/08/19 09:44 +0200, Ulrich Windl wrote:
>>>> Harvey Shepherd <Harvey.Shepherd at Aviatnet.com> wrote on 12.08.2019 at 23:38
> in message <ec767e3d-0cde-42c2-a8de-72ffce859e2f at email.android.com>:
>> I've been experiencing exactly the same issue. Pacemaker prioritises
>> restarting the failed resource over maintaining a master instance. In my case
>> I used crm_simulate to analyse the actions planned and taken by pacemaker
>> during resource recovery. It showed that the system did plan to fail over
>> the master instance, but that action was near the bottom of the action
>> list. Higher priority was given to restarting the failed instance;
>> consequently, once that had occurred, it was easier just to promote the
>> same instance rather than failing over.
> That's interesting: Maybe usually it's actually faster to restart a
> failed (master) process rather than promoting a slave to master,
> possibly demoting the old master to slave, etc.
> But most obviously, while resources can have a (possible) utilization
> configured, operations cannot (AFAIK): if one could configure
> "operation costs" (maybe as rules), the cluster could prefer the
> transition with the lowest cost. Unfortunately, that would make
> things more complicated.
> I could even imagine if you set the cost for "stop" to infinity, the
> cluster will not even try to stop the resource, but will fence the
> node instead...
That would be very courageous, and highly nontrivial once you consider
the scalability impact. (Not that the impact couldn't be mitigated to
some extent, e.g. by switching from the single brain/DC to a segmented
multi-leader approach combined with hierarchical scheduling: as the
total count goes up, resources usually form some clusters [pun
intended] rather than each one interacting with all the others.)
Anyway, thanks for sharing the ideas, Ulrich, not just now :-)
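
To make the "operation costs" idea quoted above a bit more concrete,
here is a toy sketch in Python. Everything in it is hypothetical:
Pacemaker has no such setting, and the cost values, the operation
lists and the names OP_COST, transition_cost and cheapest are made up
purely for illustration. It only shows the selection rule (pick the
candidate transition with the lowest summed cost) and why an infinite
"stop" cost would leave fencing as the only finite option.

```python
import math

# Hypothetical per-operation costs; not an existing Pacemaker option.
OP_COST = {"start": 5.0, "stop": 3.0, "promote": 2.0, "demote": 2.0, "fence": 50.0}


def transition_cost(ops, op_cost=OP_COST):
    """Sum the (made-up) costs of the operations in one candidate transition."""
    return sum(op_cost[op] for op in ops)


def cheapest(candidates, op_cost=OP_COST):
    """Return the name of the candidate transition with the lowest total cost."""
    return min(candidates, key=lambda name: transition_cost(candidates[name], op_cost))


# Illustrative candidates for recovering a failed master instance.
candidates = {
    "restart_in_place": ["stop", "start", "promote"],
    "fail_over": ["stop", "promote"],
    "fence_and_fail_over": ["fence", "promote"],
}

print(cheapest(candidates))

# With an infinite "stop" cost, every transition that stops the resource
# becomes infinitely expensive, so only the fencing variant stays finite.
no_stop = dict(OP_COST, stop=math.inf)
print(cheapest(candidates, no_stop))
```

A real scheduler would of course have to weigh ordering and
colocation constraints as well, which is where the scalability
concern above comes in.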