[ClusterLabs] Master/slave failover does not work as expected

Jan Pokorný jpokorny at redhat.com
Tue Aug 13 16:00:52 EDT 2019


On 13/08/19 09:44 +0200, Ulrich Windl wrote:
>>>> Harvey Shepherd <Harvey.Shepherd at Aviatnet.com> wrote on 12.08.2019 at 23:38
> in message <ec767e3d-0cde-42c2-a8de-72ffce859e2f at email.android.com>:
>> I've been experiencing exactly the same issue. Pacemaker prioritises 
>> restarting the failed resource over maintaining a master instance. In my case 
>> I used crm_simulate to analyse the actions planned and taken by pacemaker 
>> during resource recovery. It showed that the system did plan to fail over the
>> master instance, but that action was near the bottom of the action list. Higher
>> priority was given to restarting the failed instance. Consequently, once the
>> restart had completed, it was easier just to promote the same instance rather
>> than failing over.
> 
> That's interesting: Maybe usually it's actually faster to restart a
> failed (master) process rather than promoting a slave to master,
> possibly demoting the old master to slave, etc.
> 
> But most obviously, while resources can (optionally) have a
> utilization, operations have none (AFAIK): If one could configure
> "operation costs" (maybe as rules), the cluster could prefer the
> transition with the lowest cost. Unfortunately, that would make
> things more complicated.
> 
> I could even imagine if you set the cost for "stop" to infinity, the
> cluster will not even try to stop the resource, but will fence the
> node instead...

Very courageous, and highly nontrivial once you think about the
scalability impact. (While at it: that impact could be mitigated to
some extent by switching from a single brain/DC to a segmented
multi-leader approach combined with hierarchical scheduling. As the
total count goes up, resources usually form a few clusters [pun
intended] rather than each one interacting with all the others.)
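
For illustration only, here is a minimal Python sketch of the quoted
"operation costs" idea. Pacemaker has no such setting; the cost table,
the candidate transitions and the function names are all hypothetical,
made up for this example, and the real scheduler works nothing like
this.

```python
import math

# Hypothetical per-operation costs (arbitrary units). Pacemaker has no
# such configuration; these numbers are assumptions for illustration.
OP_COSTS = {"start": 5, "stop": 5, "promote": 2, "demote": 2, "fence": 50}

# Hypothetical candidate transitions after a master instance has failed,
# each written as the list of operations it would need.
CANDIDATES = {
    "restart_in_place": ["stop", "start", "promote"],
    "fail_over_then_recover": ["demote", "promote", "stop", "start"],
    "fence_and_fail_over": ["fence", "promote"],
}


def transition_cost(actions, costs):
    """Sum the configured cost of every operation in one transition."""
    return sum(costs[action] for action in actions)


def cheapest_transition(candidates, costs):
    """Return the name of the candidate with the lowest total cost."""
    return min(candidates, key=lambda name: transition_cost(candidates[name], costs))


if __name__ == "__main__":
    # With finite costs, the cheapest plan wins.
    print(cheapest_transition(CANDIDATES, OP_COSTS))

    # With an infinite "stop" cost, every plan that stops the resource
    # becomes infinitely expensive, so only fencing remains viable.
    no_stop = {**OP_COSTS, "stop": math.inf}
    print(cheapest_transition(CANDIDATES, no_stop))
```

With the numbers assumed above, restarting in place is cheapest, and
setting the "stop" cost to infinity leaves fencing as the only finite
option, which is the effect Ulrich describes. Even this toy has to
enumerate and score every candidate transition, which hints at where
the scalability concern comes from.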

Anyway, thanks for sharing the ideas, Ulrich, not just now :-)

-- 
Jan (Poki)