[ClusterLabs] Prevent Corosync Qdevice Failback in split brain scenario.

Tue Jan 7 13:22:49 EST 2020

On 02/01/20 14:30 +0100, Jan Friesse wrote:
>> I am planning to use Corosync Qdevice version 3.0.0 with corosync
>> version 2.4.4 and pacemaker 1.1.16 in a two node cluster.
>> 
>> I want to know if failback can be avoided in the below situation.
>> 
>> 
>>   1.  The pcs cluster is in split brain scenario after a network
>>   break between two nodes. But both nodes are visible and reachable
>>   from qdevice node.
>>   2.  The qdevice with ffsplit algorithm selects node with id 1
>>   (lowest node id) and node 1 becomes quorate.
>>   3.  Now if node 1 goes down/is not reachable from qdevice node,
>>   the node 2 becomes quorate.
>> 
>> But when node 1 becomes reachable from the qdevice again, it becomes
>> quorate and node 2 goes down again, i.e. the resources fail back to
>> node 1.
>> 
>> Is there any way to prevent this failback?
> 
> Not on the qdevice/qnetd level (or at least there is no generic
> solution; it is possible to set tie_breaker to node 2, so resources
> wouldn't then be shifted in this specific example, but that is
> probably not what the question was about).
> 
> I can imagine having an algorithm option so that qnetd would try to keep
> the active partition active as long as the other requirements are
> fulfilled, so basically adding one more test just before calling the
> tie-breaker.
> 
> Such an option shouldn't be hard to implement and right now I'm not
> aware of any inconsistencies it may bring.

While this adds nothing new, it may be interesting to point out the
similarity with the more generic:

  "you gotta keep in about the best shape possible"

  vs.

  "oh no, you are taking it rather too literally, man,
  calm down, each transition is costly so you must
  use your head, capito?!"

dilemma.

We observe the same in the field of resource allocation across the
cluster: a resource is preferred to run on a particular node, but when
circumstances prevent that while allowing for alternative nodes,
the respective migration happens. When the original node comes back
later on, it may be very undesirable to redraw the map once again;
one would rather tolerate suboptimality wrt. the original predestination.
In pacemaker, this suboptimality tolerance (or original predestination
neutralizer) is called the "stickiness" property, and the outcome depends
on whether this tolerance wins over the "strength" (score) of the
resource-to-node intention.
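
To make that interplay concrete, here is a minimal Python sketch of
stickiness versus a location score. It is a toy model under stated
assumptions, not pacemaker's actual scheduler: the function name
place_resource, the node names and the numeric scores are all made up
for illustration, and ties are assumed to leave the resource where it
currently runs.

```python
def place_resource(location_scores, current_node, stickiness, online):
    """Pick a node for one resource in a toy score-vs-stickiness model.

    location_scores: dict node -> preference score (hypothetical values)
    current_node:    node the resource runs on now (or ran on last)
    stickiness:      bonus added to the current node's score
    online:          set of nodes that may run the resource right now
    """
    candidates = {}
    for node in online:
        score = location_scores.get(node, 0)
        if node == current_node:
            # Staying put gets a bonus: this is the "stickiness".
            score += stickiness
        candidates[node] = score

    if not candidates:
        return None

    best = max(candidates.values())
    # Assumption of this sketch: on a tie, do not move the resource.
    if current_node in candidates and candidates[current_node] == best:
        return current_node
    # Otherwise pick the best node; the name only makes the result stable.
    return min(node for node, score in candidates.items() if score == best)


scores = {"node1": 100, "node2": 0}   # resource prefers node1
both = {"node1", "node2"}

# node1 fails, so the resource migrates to node2.
after_failure = place_resource(scores, "node1", 0, {"node2"})
# node1 returns: without stickiness the resource fails back ...
without_stickiness = place_resource(scores, after_failure, 0, both)
# ... while a stickiness above the preference score keeps it on node2.
with_stickiness = place_resource(scores, after_failure, 200, both)
```

In this model the failback happens exactly when the preference score of
the original node exceeds the stickiness earned by the current one.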

My guess is that corosync, free of such fuzzy scales of preference
strengths (when there are still any competing partitions, all other
break-the-symmetry options have been exhausted already), would just want
a binary toggle "keep the quorum-per-partition decision sticky if there's
no other winner in the standard-except-tie_breaker competition".
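
A rough Python sketch of how such a toggle could slot in, following
Honza's "one more test just before calling tie-breaker". This is not
qnetd code: the function pick_quorate_partition and the keep_active
flag are hypothetical, the "standard competition" is reduced to
partition size only, and the tie-breaker is hardwired to "lowest node
id".

```python
def pick_quorate_partition(partitions, previously_quorate=None,
                           keep_active=False):
    """Choose which partition the arbiter declares quorate (toy model).

    partitions:         list of disjoint frozensets of node ids that are
                        currently reachable from the arbiter
    previously_quorate: frozenset of node ids that held quorum last time
    keep_active:        hypothetical binary "sticky" toggle
    """
    if not partitions:
        return None

    # Standard competition, simplified here: the largest partition wins.
    largest = max(len(p) for p in partitions)
    tied = [p for p in partitions if len(p) == largest]
    if len(tied) == 1:
        return tied[0]

    # The extra test: among still-tied partitions, prefer the one that
    # was quorate before, provided exactly one of them qualifies.
    if keep_active and previously_quorate:
        sticky = [p for p in tied if p & previously_quorate]
        if len(sticky) == 1:
            return sticky[0]

    # Tie-breaker of last resort: partition holding the lowest node id.
    return min(tied, key=min)


n1, n2 = frozenset({1}), frozenset({2})

# Split brain, both nodes visible to the arbiter: node 1 wins the tie.
step2 = pick_quorate_partition([n1, n2])
# Node 1 becomes unreachable: node 2 is the only candidate.
step3 = pick_quorate_partition([n2], previously_quorate=step2)
# Node 1 is reachable again: plain tie-breaker fails back to node 1 ...
failback = pick_quorate_partition([n1, n2], previously_quorate=step3)
# ... whereas the sticky toggle leaves node 2 quorate.
sticky = pick_quorate_partition([n1, n2], previously_quorate=step3,
                                keep_active=True)
```

Note the toggle only ever decides between partitions that are already
tied, so it cannot override a partition that wins on the strict rules.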

Also, any attempt to go further and melt the strict rules into a
relativizing score system akin to pacemaker's (e.g. do not even change
the "quorate partition" decision when the winner partition has a summary
score just up to X points higher) would, at the quorum level, effectively
subvert the goals of perpetually sustainable HA. The same cannot be said
about said binary toggle, I think; that'd be merely a non-conflicting
(as Honza also evaluated) balance tweak in particular corner cases.

-- 
Jan (Poki)