[ClusterLabs] Antw: Re: Antw: [EXT] Inquiry - remote node fencing issue

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Oct 28 06:07:09 EDT 2021


>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 28.10.2021 at 09:58 in
message
<CAA91j0Wptn=2v_vNN84CyiLaM9BeB4Yc3UQFcuy4TttuhwK6TQ at mail.gmail.com>:
> On Thu, Oct 28, 2021 at 10:30 AM Ulrich Windl
> <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>
>> Fencing _is_ a part of failover!
>>
> 
> As with any blanket answer, this is mostly incorrect in this context.

If I read the logs correctly, a monitor operation timed out, and as a
consequence the corresponding node would be fenced, so the resource would
fail over to another node.
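
For context, a minimal sketch of the kind of remote-node connection resource
and monitor operation involved, assuming pcs is in use; the resource name is
taken from the original post, while the server address and timing values are
just placeholders:

  # Hypothetical definition of the connection resource for the remote node;
  # the monitor operation below is the one that can time out as described.
  pcs resource create jangcluster-srv-4 ocf:pacemaker:remote \
      server=192.0.2.14 \
      op monitor interval=30s timeout=30s

When that monitor fails or times out, the cluster has to recover the
connection resource, which is where fencing and failover come into play.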

> 
> There are two separate objects here - the remote host itself and the
> pacemaker resource used to connect to and monitor the state of the remote
> host.
> 
> The remote host itself does not fail over. Resources on this host do, but
> the OP does not ask about that.

Then I missed that detail.

> 
> The pacemaker resource used to monitor the remote host may fail over like
> any other cluster resource. This failover does not require any fencing *of
> the remote host itself*, and in this particular case the connection between
> the two cluster nodes was present all the time (at least, as far as we can
> believe the logs), so there was no reason for fencing either. Whether
> pacemaker should attempt to fail over this resource to another node if the
> connection to the remote host fails is subject to discussion.
> 
> So fencing of the remote host itself is most certainly *not* part of
> the failover of the resource that monitors this remote host.
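
As an illustration of that last point, a minimal sketch of letting the
connection resource itself run on and move between both cluster nodes,
assuming pcs is in use and reusing the node and resource names from the
original post; the scores and interval are made-up values:

  # Hypothetical location preferences: the connection resource for the remote
  # node may run on either cluster node and can move if its current host fails.
  pcs constraint location jangcluster-srv-4 prefers srv-2=100
  pcs constraint location jangcluster-srv-4 prefers srv-1=50
  # reconnect_interval makes pacemaker keep retrying the connection to the
  # remote host after it has been severed, rather than giving up immediately.
  pcs resource update jangcluster-srv-4 reconnect_interval=60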

I just treated the resources as a black box, not looking at what they do.

Regards,
Ulrich


> 
>> >>> "Janghyuk Boo" <Janghyuk.Boo at ibm.com> schrieb am 26.10.2021 um 22:09
in
>> Nachricht
>> <OF6751AF09.DD2C657C-ON0025877A.006EA8CB-0025877A.006EB632 at ibm.com>:
>> Dear Community,
>> Thank you Ken for your reply last time.
>> I attached the log messages as requested in the last thread.
>> I have a Pacemaker cluster with two cluster nodes (two network interfaces
>> each), two remote nodes, and a prototype fencing agent (GPFS-Fence) that
>> cuts a host's access to the clustered filesystem.
>> I noticed that the remote node gets fenced when the quorum node it is
>> connected to gets fenced or experiences a network failure.
>> For example, when I disconnected srv-2 from the rest of the cluster by using
>> iptables on srv-2:
>> iptables -A INPUT -s [IP of srv-1] -j DROP ; iptables -A OUTPUT -s [IP of srv-1] -j DROP
>> iptables -A INPUT -s [IP of srv-3] -j DROP ; iptables -A OUTPUT -s [IP of srv-3] -j DROP
>> iptables -A INPUT -s [IP of srv-4] -j DROP ; iptables -A OUTPUT -s [IP of srv-4] -j DROP
>> I expected that remote node jangcluster-srv-4 would fail over to srv-1 given
>> my location constraints, but the remote node’s monitor
>> ‘jangcluster-srv-4_monitor’ failed and srv-4 was fenced before attempting to
>> fail over.
>> What would be the proper way to simulate the network failure?
>> How can I configure the cluster so that remote node srv-4 fails over instead
>> of getting fenced?





More information about the Users mailing list