[ClusterLabs] [External] : Re: Fence Agent tests

Robert Hayden robert.h.hayden at oracle.com
Wed Nov 9 08:57:43 EST 2022


> -----Original Message-----
> From: Users <users-bounces at clusterlabs.org> On Behalf Of Andrei Borzenkov
> Sent: Wednesday, November 9, 2022 2:59 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> 
> On Mon, Nov 7, 2022 at 5:07 PM Robert Hayden
> <robert.h.hayden at oracle.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Users <users-bounces at clusterlabs.org> On Behalf Of Valentin Vidic
> > > via Users
> > > Sent: Sunday, November 6, 2022 5:20 PM
> > > To: users at clusterlabs.org
> > > Cc: Valentin Vidić <vvidic at valentin-vidic.from.hr>
> > > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> > >
> > > On Sun, Nov 06, 2022 at 09:08:19PM +0000, Robert Hayden wrote:
> > > > When SBD_PACEMAKER was set to "yes", the lack of network connectivity
> > > > to the node would be seen and acted upon by the remote nodes (evicts
> > > > and takes over ownership of the resources).  But the impacted node
> > > > would just sit logging IO errors.  Pacemaker would keep updating the
> > > > /dev/watchdog device so SBD would not self evict.  Once I re-enabled
> > > > the network, then the
> > >
> > > Interesting, not sure if this is the expected behaviour based on:
> > >
> > > https://lists.clusterlabs.org/pipermail/users/2017-August/022699.html
> > >
> > > Does SBD log "Majority of devices lost - surviving on pacemaker" or
> > > some other messages related to Pacemaker?
> >
> > Yes.
> >
> > >
> > > Also what is the status of Pacemaker when the network is down? Does it
> > > report no quorum or something else?
> > >
> >
> > Pacemaker on the failing node shows quorum even though it has lost
> > communication to the Quorum Device and to the other node in the cluster.
> > The non-failing node of the cluster can see the Quorum Device system and
> > thus correctly determines to fence the failing node and take over its
> > resources.
> >
> > Only after I run firewall-cmd --panic-off will the failing node start to
> > log messages about the loss of TOTEM and form a new consensus with the
> > now-visible members.
> >
> 
> Where exactly do you use firewalld panic mode? You have hosts, you have
> VMs, you have a qnode ...
> 
> Have you verified that the network is blocked bidirectionally? I have had
> rather mixed experiences with asymmetrical firewalls that resemble your
> description.

In my testing harness, I send a script to the remote node that runs
firewall-cmd --panic-on, sleeps for a set interval, and then turns panic
mode back off.  That way I can adjust the length of time the network is
unavailable on a single node.  I used to log into a network switch and
turn ports off, but that is not possible in a Cloud environment.  I have
also experimented with manually creating iptables rules, but panic mode
is simply easier and accomplishes the task.
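
In essence, the script boils down to something like this (the sleep
default is a placeholder; it has to be launched detached, e.g. with
nohup or systemd-run, because panic mode kills the SSH session that
started it):

    #!/bin/sh
    # Simulate a total network outage on this node, then restore connectivity.
    OUTAGE_SECS=${1:-120}          # placeholder default; tune per test

    firewall-cmd --panic-on        # drop all inbound and outbound packets
    sleep "$OUTAGE_SECS"           # keep the node isolated for the window
    firewall-cmd --panic-off       # restore normal traffic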

I have verified that when panic mode is on, no inbound or outbound network
traffic is allowed.  This includes iSCSI packets as well.  You had better
have access to the console or the ability to reset the system.
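
A quick way to confirm the isolation while the script runs (the host
names below are placeholders for the peer node and the quorum device
host):

    firewall-cmd --query-panic           # prints "yes" while panic is active
    ping -c 3 -W 2 node2.example.com     # should time out: outbound blocked
    ping -c 3 -W 2 qdevice.example.com   # quorum device unreachable as well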


> 
> Also it may depend on the corosync driver in use.
> 
> > I think all of that explains the lack of self-fencing when the sbd setting of
> > SBD_PACEMAKER=yes is used.
> >
> 
> Correct. This means that at least under some conditions
> pacemaker/corosync fail to detect isolation.
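
For anyone reproducing this, the quorum view on the isolated node can be
checked with the standard tools (output formats vary by version); in the
scenario above it kept reporting quorate despite the isolation:

    corosync-quorumtool -s    # this node's view of quorum ("Quorate: Yes/No")
    crm_mon -1                # one-shot Pacemaker status, including quorum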

