[ClusterLabs] Antw: [EXT] Re: Two node cluster without fencing and no split brain?
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Thu Jul 22 09:36:03 EDT 2021
>>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 22.07.2021 at
12:05 in
message <20210722120537.0d65c2a1 at firost>:
> On Wed, 21 Jul 2021 22:02:21 -0400
> "Frank D. Engel, Jr." <fde101 at fjrhome.net> wrote:
>
>> In OpenVMS, the kernel is aware of the cluster. As is mentioned in that
>> presentation, it actually stops processes from running and blocks access
>> to clustered storage when quorum is lost, and resumes them appropriately
>> once it is re-established.
>>
>> In other words... no reboot, no "death" of the cluster node or special
>> arrangements with storage hardware... If connectivity is restored, the
>> services are simply resumed.
>
> Well, when losing the quorum, by default Pacemaker stops its local
> resources.
But when a node without quorum performs any action, it may corrupt data (e.g.
by writing to a non-shared filesystem like ext3 on a shared medium like iSCSI
or FC SAN).
IMHO the only safe reaction to losing quorum is to cease all activity
immediately. That does NOT mean to STOP resources; instead it means "immediate
death", probably even without syncing disks.
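
To make the difference between "stop resources" and "immediate death"
concrete, here is a minimal sketch of how a partition might decide what to do.
The policy names are the values of Pacemaker's no-quorum-policy property; the
function names, the one-line descriptions and the plain majority rule are
simplifying assumptions for illustration (corosync's two_node special case is
ignored), not how Pacemaker is actually implemented.

```python
from enum import Enum


class NoQuorumPolicy(Enum):
    """Values accepted by Pacemaker's no-quorum-policy cluster property."""
    IGNORE = "ignore"
    FREEZE = "freeze"
    STOP = "stop"        # the default behaviour described in the quoted text
    DEMOTE = "demote"
    SUICIDE = "suicide"


# Simplified, illustrative summaries of each policy (assumed wording).
REACTIONS = {
    NoQuorumPolicy.IGNORE: "keep managing resources as if quorate",
    NoQuorumPolicy.FREEZE: "keep running resources, start nothing new",
    NoQuorumPolicy.STOP: "stop all resources in this partition",
    NoQuorumPolicy.DEMOTE: "demote promoted resources, stop the rest",
    NoQuorumPolicy.SUICIDE: "fence every node in this partition",
}


def has_quorum(votes_seen: int, expected_votes: int) -> bool:
    """Strict majority: more than half of the expected votes are visible."""
    return votes_seen > expected_votes // 2


def reaction(votes_seen: int, expected_votes: int,
             policy: NoQuorumPolicy = NoQuorumPolicy.STOP) -> str:
    """Return what this partition does, given the votes it can see."""
    if has_quorum(votes_seen, expected_votes):
        return "quorate: manage resources normally"
    return REACTIONS[policy]


if __name__ == "__main__":
    # One surviving node of a two-node cluster never has a strict majority.
    print(reaction(1, 2))
    print(reaction(1, 2, NoQuorumPolicy.SUICIDE))
```

Under this model "suicide" is the closest match to the "immediate death" I
argue for, while the default "stop" still lets the node perform actions after
quorum is lost.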
> Considering a clustered storage, the resources are the lock manager, iSCSI
> or some other means, FS etc.
>
> However, if the resources' stop actions don't succeed, THEN the node resets
> itself. Should your cluster have active fencing, the node might be reset by
> some external means.
>
> As Digimer wrote, «Quorum is a tool for when things are working
> predictably».
> To reword this for the current topic: if Pacemaker is able to stop its
> resources after a quorum loss, there is no reboot and no "death" either.
>
>> I had a 3-node OpenVMS cluster running virtualized at one point on the
>> hobbyist license and my cluster storage for that setup was simply to
>> mirror the disks across the three nodes (via software which is
>> integrated into OpenVMS); almost like RAID 1 across the network. If I
>> "broke" the cluster and one of the servers lost quorum (due to
>> connectivity) it would just sit and wait for the connectivity to be
>> restored, then resync the storage and pick up essentially where it left
>> off.
>
> I believe this might be possible using a Pacemaker stack. However, I never
> built such a cluster. So hopefully some other people around here with more
> experience on clustered FS will confirm or refute this with more details.
I think Pacemaker would need kernel support for that (cease all disk
operations, then invalidate all disk buffers and re-read them).
Regards,
Ulrich
>
> Regards,