[ClusterLabs] Antw: Re: Two node cluster goes into split brain scenario during CPU intensive tasks

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Jul 1 07:26:48 EDT 2019


>>> Jan Pokorný <jpokorny at redhat.com> wrote on 27.06.2019 at 12:02 in
message
<20190627100209.GF31192 at redhat.com>:
> On 25/06/19 12:20 -0500, Ken Gaillot wrote:
>> On Tue, 2019-06-25 at 11:06 +0000, Somanath Jeeva wrote:
>> Addressing the root cause, I'd first make sure corosync is running at
>> real-time priority (I forget the ps option, hopefully someone else can
>> chime in).
> 
> In a standard Linux environment, I find this the most convenient:
> 
>   # chrt -p $(pidof corosync)
>   pid 6789's current scheduling policy: SCHED_RR
>   pid 6789's current scheduling priority: 99

To me this is like pushing a car that already has a running engine! If
corosync does crazy things, this will make things worse (i.e. amplify the
craziness).

> 
> (requires util-linux, procps-ng)
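On the ps option Ken could not recall: with procps-ng, "ps -o pid,cls,rtprio,comm -C corosync" shows the scheduling class (RR for SCHED_RR, TS for SCHED_OTHER) and the real-time priority. Without either tool, Linux exposes the same two values in /proc/<pid>/stat (rt_priority is field 40, policy is field 41, per proc(5)). A minimal sketch, assuming a Linux /proc and a POSIX sh with awk; sched_of is a hypothetical helper name, not part of any package:

```shell
#!/bin/sh
# sched_of PID: print "<policy> <rt_priority>" for a process, read from
# /proc/PID/stat. Hypothetical helper, Linux-only.
sched_of() {
    stat_line=$(cat "/proc/$1/stat") || return 1
    # Strip "pid (comm) " first: comm may itself contain spaces or ")".
    rest=${stat_line##*") "}
    # After the strip, stat field 3 is $1, so field 40 (rt_priority)
    # is $38 and field 41 (policy) is $39.
    printf '%s\n' "$rest" | awk '{
        # Kernel policy numbers 0..6; 4 is unused.
        split("SCHED_OTHER SCHED_FIFO SCHED_RR SCHED_BATCH ? SCHED_IDLE SCHED_DEADLINE", names, " ")
        print names[$39 + 1], $38
    }'
}

# With corosync scheduled as in the chrt output above, this would print
# "SCHED_RR 99":
# sched_of "$(pidof corosync)"
```

Reading /proc directly avoids depending on which of util-linux or procps-ng happens to be installed, at the cost of being Linux-specific.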
> 
>> Another possibility would be to raise the corosync token
>> timeout to allow for a greater time before a split is declared.
> 
> This is the unavoidable trade-off between limiting false positives
> (negligible glitches triggering the riot) and detecting actual
> node/interconnect failures in a timely manner. I just meant to note
> that it's not a one-way street; it needs deliberation given the
> circumstances.
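To put rough numbers on that trade-off: after a missed token, corosync waits out the token timeout and then the consensus timeout before it settles on a new membership, and when consensus is not set it defaults to 1.2 * token (corosync.conf(5)). The sketch below assumes that simple sum is a fair estimate, ignoring retransmits and the token_coefficient term that only applies with three or more nodes; membership_window is a hypothetical helper and all values are in milliseconds:

```shell
#!/bin/sh
# membership_window TOKEN_MS [CONSENSUS_MS]
# Rough time (ms) from a lost token until a new membership is formed,
# modelled as token + consensus. Hypothetical helper; when consensus is
# not given, use corosync's default of 1.2 * token.
membership_window() {
    token=$1
    consensus=${2:-$(( token * 12 / 10 ))}
    echo $(( token + consensus ))
}

membership_window 1000        # prints 2200
membership_window 3000        # prints 6600
membership_window 3000 5000   # explicit consensus: prints 8000
```

Tripling the token timeout lets a node ride out a CPU stall roughly three times as long before a split is declared, but a peer that has really died is also noticed roughly three times later.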
> 
> -- 
> Jan (Poki)