[ClusterLabs] Prevent Corosync Qdevice Failback in split brain scenario.
Jan Friesse
jfriesse at redhat.com
Thu Jan 2 08:30:15 EST 2020
Somanath,
> Hi,
>
> I am planning to use Corosync Qdevice version 3.0.0 with corosync version 2.4.4 and pacemaker 1.1.16 in a two node cluster.
>
> I want to know if failback can be avoided in the below situation.
>
>
> 1. The pcs cluster is in a split-brain scenario after a network break between the two nodes, but both nodes are still visible and reachable from the qdevice node.
> 2. The qdevice with the ffsplit algorithm selects the node with id 1 (the lowest node id) and node 1 becomes quorate.
> 3. Now if node 1 goes down or becomes unreachable from the qdevice node, node 2 becomes quorate.
>
> But when node 1 becomes reachable from the qdevice again, it becomes quorate and node 2 goes down again, i.e. the resources fail back to node 1.
>
> Is there any way to prevent this failback?
Not at the qdevice/qnetd level - or at least not as a generic solution. It
is possible to set tie_breaker to node 2, so in this specific example the
resources would not be shifted back, but that is probably not what the
question was about.
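For illustration, a minimal sketch of how the tie_breaker could be pinned to node 2 in the quorum device section of corosync.conf (the host value is a placeholder for your qnetd server; tie_breaker accepts "lowest", "highest", or a specific node id):

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        votes: 1
        net {
            # placeholder - replace with your qnetd server
            host: qnetd-server.example.com
            algorithm: ffsplit
            # prefer node id 2 when the partitions are otherwise equal
            tie_breaker: 2
        }
    }
}
```

Note this only changes which partition wins the tie; it does not implement a generic "keep the currently active partition" policy.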
I can imagine adding an algorithm option so that qnetd would try to keep
the currently active partition active as long as the other requirements
are fulfilled - basically adding one more test just before calling the
tie-breaker.
Such an option shouldn't be hard to implement, and right now I'm not
aware of any inconsistencies it might introduce.
Regards,
Honza
>
>
> With Regards
> Somanath Thilak J