[ClusterLabs] [External] : Re: Fence Agent tests

Andrei Borzenkov arvidjaar at gmail.com
Wed Nov 9 03:58:38 EST 2022


On Mon, Nov 7, 2022 at 5:07 PM Robert Hayden <robert.h.hayden at oracle.com> wrote:
>
>
> > -----Original Message-----
> > From: Users <users-bounces at clusterlabs.org> On Behalf Of Valentin Vidic via Users
> > Sent: Sunday, November 6, 2022 5:20 PM
> > To: users at clusterlabs.org
> > Cc: Valentin Vidić <vvidic at valentin-vidic.from.hr>
> > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> >
> > On Sun, Nov 06, 2022 at 09:08:19PM +0000, Robert Hayden wrote:
> > > When SBD_PACEMAKER was set to "yes", the lack of network connectivity to the node
> > > would be seen and acted upon by the remote nodes (evicts and takes
> > > over ownership of the resources). But the impacted node would just
> > > sit logging IO errors. Pacemaker would keep updating the /dev/watchdog
> > > device so SBD would not self-evict. Once I re-enabled the network, then the
> >
> > Interesting, not sure if this is the expected behaviour based on:
> >
> > https://lists.clusterlabs.org/pipermail/users/2017-August/022699.html
> >
> > Does SBD log "Majority of devices lost - surviving on pacemaker" or
> > some other messages related to Pacemaker?
>
> Yes.
>
> >
> > Also what is the status of Pacemaker when the network is down? Does it
> > report no quorum or something else?
> >
>
> Pacemaker on the failing node shows quorum even though it has lost
> communication to the Quorum Device and to the other node in the cluster.
> The non-failing node of the cluster can see the Quorum Device system and
> thus correctly decides to fence the failing node and take over its
> resources.
>
> Only after I run firewall-cmd --panic-off does the failing node start to log
> messages about the loss of TOTEM and reaching a new consensus with the
> now-visible members.
>

Where exactly do you use firewalld panic mode? You have hosts, you
have VMs, you have a qnode ...

Have you verified that the network is blocked in both directions? I have
had rather mixed experiences with asymmetric firewall rules, and they
resembled your description.
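One way to answer the "blocked in both directions?" question systematically is to record, for every ordered pair of machines (hosts, VMs, qnode), whether traffic actually got through, and then look for pairs that are blocked in only one direction. The Python below is a minimal sketch of that bookkeeping step only, assuming you gather the observations yourself; it sends no packets, and the node names (node1, node2, qnetd) and the `blocked` set are hypothetical placeholders.

```python
from itertools import combinations


def classify_links(nodes, blocked):
    """Classify every unordered node pair by how traffic between them is filtered.

    nodes   -- iterable of node names (hypothetical examples below)
    blocked -- set of (src, dst) tuples for directions where traffic was dropped
    """
    status = {}
    for a, b in combinations(sorted(nodes), 2):
        a_to_b = (a, b) in blocked
        b_to_a = (b, a) in blocked
        if a_to_b and b_to_a:
            status[(a, b)] = "blocked both ways"
        elif a_to_b or b_to_a:
            status[(a, b)] = "one-way only"  # the suspicious asymmetric case
        else:
            status[(a, b)] = "open"
    return status


if __name__ == "__main__":
    nodes = ["node1", "node2", "qnetd"]
    # Hypothetical observation: node1 drops incoming traffic, but its
    # outgoing packets still reach the other machines.
    blocked = {("node2", "node1"), ("qnetd", "node1")}
    for pair, state in classify_links(nodes, blocked).items():
        print(pair, state)
```

With this sample input, both pairs involving node1 come out as "one-way only" and the node2/qnetd pair as "open", which is the kind of partial isolation that can leave cluster members with different views of the membership.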

Also it may depend on the corosync driver in use.

> I think all of that explains the lack of self-fencing when the sbd setting of
> SBD_PACEMAKER=yes is used.
>

Correct. This means that at least under some conditions
pacemaker/corosync fail to detect isolation.
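A deliberately simplified way to picture why the isolated node did not self-fence: once SBD can no longer see a majority of its disks, with SBD_PACEMAKER=yes it keeps the node alive as long as Pacemaker still reports quorum (the "Majority of devices lost - surviving on pacemaker" message). The Python below encodes only that one branch as described in this thread. It is an illustration under that assumption, not sbd's actual source logic, it ignores timeouts and other details, and the function name is hypothetical.

```python
def on_disk_majority_loss(pacemaker_integration: bool, pacemaker_quorate: bool) -> str:
    """Simplified model of the SBD reaction discussed in this thread.

    Assumed to be evaluated once a majority of SBD disks is unreachable.
    pacemaker_integration -- True when SBD_PACEMAKER=yes
    pacemaker_quorate     -- what Pacemaker *reports*, which may be stale
    """
    if pacemaker_integration and pacemaker_quorate:
        # Corresponds to "Majority of devices lost - surviving on pacemaker".
        return "survive"
    # In this model, losing the disks without Pacemaker's backing means self-fencing.
    return "self-fence"


if __name__ == "__main__":
    # The failing node in this thread: integration on, stale quorate view.
    print(on_disk_majority_loss(True, True))
    # What would be expected if the isolation had been detected.
    print(on_disk_majority_loss(True, False))
```

The first call returns "survive", mirroring the observed behaviour: the decision is only as good as Pacemaker's quorum view, so if corosync never notices the isolation, SBD has no reason to fence.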

