[ClusterLabs] Two node cluster goes into split brain scenario during CPU intensive tasks

Thu Jun 27 06:02:09 EDT 2019

On 25/06/19 12:20 -0500, Ken Gaillot wrote:
> On Tue, 2019-06-25 at 11:06 +0000, Somanath Jeeva wrote:
> Addressing the root cause, I'd first make sure corosync is running at
> real-time priority (I forget the ps option, hopefully someone else can
> chime in).

In a standard Linux environment, I find this the most convenient check:

  # chrt -p $(pidof corosync)
  pid 6789's current scheduling policy: SCHED_RR
  pid 6789's current scheduling priority: 99

(requires util-linux for chrt and procps-ng for pidof)
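If you want to script the same check, e.g. from a monitoring job, here is a minimal sketch. It assumes a single corosync process and the chrt output format shown above. The function names `is_realtime` and `check_corosync` are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether corosync runs under a real-time scheduling policy.
# Assumes chrt (util-linux) and pidof (procps-ng) are installed and that
# exactly one corosync process is running. Function names are hypothetical.

# is_realtime: read `chrt -p` output on stdin; succeed only if the policy
# is SCHED_RR or SCHED_FIFO (the two real-time policies).
is_realtime() {
    grep -Eq 'scheduling policy: SCHED_(RR|FIFO)'
}

# check_corosync: look up the corosync PID and report its policy.
# Returns 0 if real-time, 1 if not, 2 if corosync is not running.
check_corosync() {
    pid=$(pidof corosync) || { echo "corosync is not running" >&2; return 2; }
    if chrt -p "$pid" | is_realtime; then
        echo "corosync ($pid) runs at real-time priority"
    else
        echo "corosync ($pid) is NOT running at real-time priority" >&2
        return 1
    fi
}

# Usage (after sourcing this file): check_corosync
```

The policy test is kept separate from the PID lookup so it can be checked against saved chrt output without a running cluster.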

> Another possibility would be to raise the corosync token
> timeout to allow for a greater time before a split is declared.

This is the unavoidable trade-off between limiting false positives
(negligible glitches triggering the whole riot) and detecting actual
node or interconnect failures in a timely manner.  I just meant to
note that it's not a one-way street: the right value needs
deliberation given the circumstances.
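For reference, the token timeout lives in the totem section of
corosync.conf. The fragment below is only a sketch: the value is a
hypothetical illustration, not a recommendation. The file must be
identical on both nodes, and any other totem options you already have
stay as they are. A higher token value tolerates longer CPU stalls,
but it also delays detection of a genuinely failed node by the same
margin.

```
totem {
    # Hypothetical value for illustration only, not a recommendation:
    # how long (in ms) to wait for the token before declaring a node lost.
    token: 10000
    # consensus is left unset here; corosync then derives it as 1.2 * token.
}
```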

-- 
Jan (Poki)