[ClusterLabs] Antw: [EXT] Re: Two node cluster without fencing and no split brain?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Jul 22 09:36:03 EDT 2021


>>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 22.07.2021 at
12:05 in
message <20210722120537.0d65c2a1 at firost>:
> On Wed, 21 Jul 2021 22:02:21 -0400
> "Frank D. Engel, Jr." <fde101 at fjrhome.net> wrote:
> 
>> In OpenVMS, the kernel is aware of the cluster.  As is mentioned in that 
>> presentation, it actually stops processes from running and blocks access 
>> to clustered storage when quorum is lost, and resumes them appropriately 
>> once it is re-established.
>> 
>> In other words... no reboot, no "death" of the cluster node or special 
>> arrangements with storage hardware...  If connectivity is restored, the 
>> services are simply resumed.
> 
> Well, when losing the quorum, by default Pacemaker stops its local
> resources.

But when a node without quorum performs any actions, it may corrupt data (e.g.
by writing to a non-shared filesystem like ext3 on a shared medium like iSCSI
or FC SAN).
IMHO the only safe action when losing quorum is to stop any action
immediately. That does NOT mean to STOP resources; instead it means "immediate
death", probably even without syncing disks.
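
To make the difference between "having quorum" and "what happens without it"
concrete, here is a minimal Python sketch. It is not Pacemaker code. The
majority rule and the summaries of the `no-quorum-policy` values reflect my
reading of the Pacemaker documentation, and the names `has_quorum`,
`NO_QUORUM_POLICY` and `describe_policy` are made up for illustration.

```python
# Illustrative model only; this is not how Pacemaker is implemented.

def has_quorum(votes_seen: int, expected_votes: int) -> bool:
    """Majority rule: a partition is quorate only with more than half the votes."""
    return votes_seen > expected_votes // 2

# Hypothetical lookup table summarizing the no-quorum-policy values.
NO_QUORUM_POLICY = {
    "ignore": "continue managing all resources",
    "freeze": "keep running resources, but do not recover resources "
              "from nodes outside the partition",
    "stop": "stop all resources in the affected partition (the default)",
    "demote": "demote promotable resources and stop all others",
    "suicide": "fence all nodes in the affected partition",
}

def describe_policy(votes_seen: int, expected_votes: int, policy: str = "stop") -> str:
    """Return a one-line description of what the partition would do."""
    if has_quorum(votes_seen, expected_votes):
        return "quorate: normal operation"
    return f"no quorum: {NO_QUORUM_POLICY[policy]}"
```

With two nodes and one vote each, a node that loses sight of its peer sees 1
of 2 votes, so under plain majority voting neither half of a split is quorate.
In a three-node cluster, the pair that can still see each other keeps quorum.
None of the policies above, except fencing, gives the "immediate death" I
described.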

> Considering clustered storage, the resources are the lock manager, iSCSI or
> some other means, the FS, etc.
> 
> However, if the resources' stop actions don't succeed, THEN the node resets
> itself. Should your cluster have active fencing, the node might be reset by
> some external means.
> 
> As Digimer wrote, «Quorum is a tool for when things are working predictably».
> To reword this for the current topic: if Pacemaker is able to stop its
> resources after quorum is lost, it will not reboot; no "death" either.
> 
>> I had a 3-node OpenVMS cluster running virtualized at one point on the 
>> hobbyist license and my cluster storage for that setup was simply to 
>> mirror the disks across the three nodes (via software which is 
>> integrated into OpenVMS); almost like RAID 1 across the network.  If I 
>> "broke" the cluster and one of the servers lost quorum (due to 
>> connectivity) it would just sit and wait for the connectivity to be 
>> restored, then resync the storage and pick up essentially where it left
>> off.
> 
> I believe this might be possible using a Pacemaker stack. However, I never
> built such a cluster. So hopefully some other people around here with more
> experience on clustered FS will confirm or refute this with some more details.

I think Pacemaker would need kernel support for that (cease all disk
operations, then invalidate all disk buffers and re-read them).
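
As a rough illustration of that sequence, here is a toy Python model. It is
purely hypothetical: the class `QuorumGatedDisk` and its methods are invented
for this sketch, a dict stands in for the shared medium, and a real kernel
would block the calling processes rather than raise an error.

```python
class QuorumGatedDisk:
    """Toy model: I/O is refused without quorum; buffers are dropped on resume."""

    def __init__(self, backing: dict):
        self.backing = backing   # dict standing in for the shared medium
        self.cache = {}          # stands in for this node's disk buffers
        self.quorate = True

    def on_quorum_lost(self) -> None:
        # Cease all disk operations from now on.
        self.quorate = False

    def on_quorum_regained(self) -> None:
        # Buffers may be stale (other nodes may have written), so drop them.
        self.cache.clear()
        self.quorate = True

    def _check(self) -> None:
        if not self.quorate:
            raise RuntimeError("I/O suspended: no quorum")

    def read(self, key):
        self._check()
        if key not in self.cache:
            self.cache[key] = self.backing[key]   # (re-)read from the medium
        return self.cache[key]

    def write(self, key, value) -> None:
        self._check()
        self.backing[key] = value
        self.cache[key] = value
```

The point of clearing the cache on `on_quorum_regained` is that a node that
kept its old buffers would happily serve data that the quorate partition has
since changed.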

Regards,
Ulrich

> 
> Regards,