[ClusterLabs] inquiry - remote node fails over
Ken Gaillot
kgaillot at redhat.com
Wed Oct 13 18:21:30 EDT 2021
On Wed, 2021-10-13 at 21:13 +0000, Janghyuk Boo wrote:
> Dear Community,
>
> I have a Pacemaker cluster with two cluster nodes (two network
> interfaces each), two remote nodes, and one fabric fencing agent
> for the whole cluster.
> nodelist {
> node {
> ring0_addr: xxx
> ring1_addr: xxx
> name: jangcluster-srv-1
> nodeid: 1
> }
> node {
> ring0_addr: xxx
> ring1_addr: xxx
> name: jangcluster-srv-2
> nodeid: 2
> }
> }
>
> Node List:
> * Online: [ jangcluster-srv-1 jangcluster-srv-2 ]
> * RemoteOnline: [ jangcluster-srv-3 jangcluster-srv-4 ]
>
> Full List of Resources:
> * GPFS-Fence (stonith:fence_gpfs): Started jangcluster-srv-1
> * jangcluster-srv-3 (ocf::pacemaker:remote): Started jangcluster-srv-1
> * jangcluster-srv-4 (ocf::pacemaker:remote): Started jangcluster-srv-2
>
> node 1: jangcluster-srv-1 \
> attributes ethmonitor-eth1=1
> node 2: jangcluster-srv-2 \
> attributes ethmonitor-eth1=1
> node jangcluster-srv-3:remote \
> attributes ethmonitor-eth1=1
> node jangcluster-srv-4:remote \
> attributes ethmonitor-eth1=1
> primitive GPFS-Fence stonith:fence_gpfs \
> params ipaddr=jangcluster-srv-1 pcmk_host_list=" jangcluster-srv-1 jangcluster-srv-2 jangcluster-srv-3 jangcluster-srv-4" secure=true \
> op monitor interval=10s timeout=500s \
> op off interval=0 \
> meta is-managed=true
> primitive NIC_eth1 ethmonitor \
> params interface=eth1 repeat_count=4 repeat_interval=4 link_status_only=true \
> op monitor timeout=30s interval=4 \
> op start timeout=60s interval=0s \
> op stop interval=0s timeout=20s
> location prefer-node-jangcluster-srv-3 jangcluster-srv-3 100: jangcluster-srv-1
> location prefer-node-jangcluster-srv-4 jangcluster-srv-4 100: jangcluster-srv-2
> location prefer-node-jangcluster-srv-3-2 jangcluster-srv-3 50: jangcluster-srv-2
> location prefer-node-jangcluster-srv-4-2 jangcluster-srv-4 50: jangcluster-srv-1
>
>
> I noticed that a remote node gets fenced when the cluster node it's
> connected to gets fenced or experiences a network failure.
> For example, when I disconnected srv-2 from the rest of the cluster,
> I expected that remote node jangcluster-srv-4 would fail over to
> srv-1 given my location constraints, but srv-4 was getting fenced
> along with srv-2 instead of failing over.
> How can I configure the cluster so that remote node srv-4 fails over
> instead of getting fenced?
>
>
> Thank you
>
> Janghyuk Boo.
Hi,
That is how it works whenever possible: the remote connection moves to
another cluster node. If the cluster fences the remote node, it is
because the connection was not recoverable. Logs from srv-1, srv-2, and
srv-4 around that time would help explain why.
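For illustration, the failover the location constraints ask for can be modelled as "run the connection resource on the online cluster node with the highest score". The Python below is a minimal sketch under simplifying assumptions: the function name `preferred_host` and the `LOCATION_SCORES` table are hypothetical, nodes without a constraint score 0 (as in a symmetric cluster), negative scores exclude a node, and stickiness, colocation, and Pacemaker's real tie-breaking are not modelled. It only shows what placement the constraints request, not why the cluster chose to fence instead.

```python
from typing import Dict, Optional, Set

# Hypothetical table: per-node scores copied from the four location
# constraints above (connection resource -> {cluster node: score}).
LOCATION_SCORES: Dict[str, Dict[str, int]] = {
    "jangcluster-srv-3": {"jangcluster-srv-1": 100, "jangcluster-srv-2": 50},
    "jangcluster-srv-4": {"jangcluster-srv-2": 100, "jangcluster-srv-1": 50},
}


def preferred_host(resource: str, online_nodes: Set[str]) -> Optional[str]:
    """Return the online cluster node with the highest location score.

    Simplified model (an assumption, not Pacemaker's scheduler): nodes
    without a constraint score 0, negative scores exclude a node, and
    stickiness, utilization and colocation are ignored.
    """
    scores = LOCATION_SCORES.get(resource, {})
    candidates = [(scores.get(node, 0), node) for node in online_nodes]
    candidates = [c for c in candidates if c[0] >= 0]
    if not candidates:
        return None
    # Highest score first; ties broken by node name only so this sketch
    # is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates[0][1]


# srv-2 lost: only srv-1 remains, so srv-4's connection should move there.
print(preferred_host("jangcluster-srv-4", {"jangcluster-srv-1"}))
```

With both cluster nodes online the sketch keeps srv-4 on srv-2 (score 100); with srv-2 gone it picks srv-1 (score 50), which is the failover the question expected.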
--
Ken Gaillot <kgaillot at redhat.com>