[ClusterLabs] Weird Fencing Behavior

Tue Jul 17 14:51:50 UTC 2018

On Tue, 2018-07-17 at 21:29 +0800, Confidential Company wrote:
> 
> > Hi,
> >
> > On my two-node active/passive setup, I configured fencing via
> > fence_vmware_soap. I configured pcmk_delay=0 on both nodes so I
> > expected that both nodes will be stonithed simultaneously.
> >
> > In my test scenario, Node1 has the ClusterIP resource. When I
> > physically disconnect the service/corosync link, Node1 is fenced and
> > Node2 stays alive, even though pcmk_delay=0 is set on both nodes.
> >
> > Can you explain the behavior above?
> >
> 
> #node1 could not connect to ESX because links were disconnected. As the
> #most obvious explanation.
> 
> #You have logs, you are the only one who can answer this question with
> #some certainty. Others can only guess.
> 
> 
> Oops, my bad. I forgot to mention that I have two interfaces on each
> virtual machine (node). The second interface is used for the ESX links,
> so fencing can be executed even though the corosync links are
> disconnected. Looking forward to your response. Thanks

Having no fence delay means a death match (each node killing the other)
is possible, but it doesn't guarantee that it will happen. Some of the
time, one node will detect the outage and fence the other one before
the other one can react.

It's basically an Old West shoot-out -- they may reach for their guns
at the same time, but one may be quicker.
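To make the shoot-out concrete, here is a toy model in Python. It is only a sketch, not how Pacemaker or fence_vmware_soap work internally. It assumes each node issues its power-off request after some reaction time, that a request takes a fixed `latency` to take effect, and that a node killed before it issues its own request never gets to fire. The function names and the numbers are made up for illustration.

```python
import random


def outcome(t_a, t_b, latency):
    """Return the set of nodes that end up fenced in one split.

    t_a, t_b: time at which node A / node B issues its power-off request.
    latency:  time (> 0) between issuing a request and the peer going down.
    """
    fenced = set()
    # A gets its shot off only if B's request has not yet taken effect on A.
    if t_a < t_b + latency:
        fenced.add("B")
    # Same check from B's side.
    if t_b < t_a + latency:
        fenced.add("A")
    return fenced


def tally(trials, jitter, latency, seed=0):
    """Count outcomes when both reaction times are uniform in [0, jitter]."""
    rng = random.Random(seed)
    counts = {"only A fenced": 0, "only B fenced": 0, "both fenced": 0}
    for _ in range(trials):
        fenced = outcome(rng.uniform(0, jitter), rng.uniform(0, jitter), latency)
        if fenced == {"A", "B"}:
            counts["both fenced"] += 1
        elif fenced == {"A"}:
            counts["only A fenced"] += 1
        else:
            counts["only B fenced"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical numbers: up to 1s of reaction jitter, 0.1s fence latency.
    print(tally(trials=2000, jitter=1.0, latency=0.1))
```

In this model both nodes die only when their reaction times differ by less than the fence latency; otherwise the quicker node survives, which matches what you saw.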

As Andrei suggested, the logs from both nodes could give you a timeline
of what happened when.
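If it helps, a rough way to build that timeline is to merge the fencing-related lines from both nodes and sort them by timestamp. The Python sketch below assumes syslog-style lines that begin with a stamp like "Jul 17 14:51:50" (no year, so you pass one in), that the clocks on both nodes are in sync, and that matching on the words "stonith" and "fence" is good enough. Your log format may differ, so treat the parsing as a placeholder.

```python
from datetime import datetime


def parse_stamp(line, year):
    """Parse a leading 'Jul 17 14:51:50' style stamp; return None if absent."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def merged_timeline(logs, year, keywords=("stonith", "fence")):
    """Merge log lines from several nodes into one time-ordered list.

    logs: dict mapping a node name to an iterable of log lines.
    Returns a list of (timestamp, node, line) tuples, oldest first,
    keeping only lines that mention one of the keywords.
    """
    events = []
    for node, lines in logs.items():
        for line in lines:
            stamp = parse_stamp(line, year)
            if stamp is None:
                continue  # continuation line or unknown format
            if any(word in line.lower() for word in keywords):
                events.append((stamp, node, line.rstrip("\n")))
    events.sort(key=lambda event: event[:2])
    return events
```

You would feed it the lines read from each node's log file, for example `merged_timeline({"ArcosRhel1": open(path1), "ArcosRhel2": open(path2)}, 2018)`, where `path1` and `path2` are wherever your logs live.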

> > See my config below:
> >
> > [root@ArcosRhel2 cluster]# pcs config
> > Cluster Name: ARCOSCLUSTER
> > Corosync Nodes:
> >  ArcosRhel1 ArcosRhel2
> > Pacemaker Nodes:
> >  ArcosRhel1 ArcosRhel2
> >
> > Resources:
> >  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
> >   Attributes: cidr_netmask=32 ip=172.16.10.243
> >   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
> >               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
> >               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
> >
> > Stonith Devices:
> >  Resource: Fence1 (class=stonith type=fence_vmware_soap)
> >   Attributes: action=off ipaddr=172.16.10.151 login=admin passwd=123pass pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel) ssl_insecure=1 pcmk_delay_max=0s
> >   Operations: monitor interval=60s (Fence1-monitor-interval-60s)
> >  Resource: fence2 (class=stonith type=fence_vmware_soap)
> >   Attributes: action=off ipaddr=172.16.10.152 login=admin passwd=123pass pcmk_delay_max=0s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s port=ArcosRhel2(Ben) ssl_insecure=1
> >   Operations: monitor interval=60s (fence2-monitor-interval-60s)
> > Fencing Levels:
> >
> > Location Constraints:
> >   Resource: Fence1
> >     Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
> >   Resource: fence2
> >     Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
> > Ordering Constraints:
> > Colocation Constraints:
> > Ticket Constraints:
> >
> > Alerts:
> >  No alerts defined
> >
> > Resources Defaults:
> >  No defaults set
> > Operations Defaults:
> >  No defaults set
> >
> > Cluster Properties:
> >  cluster-infrastructure: corosync
> >  cluster-name: ARCOSCLUSTER
> >  dc-version: 1.1.16-12.el7-94ff4df
> >  have-watchdog: false
> >  last-lrm-refresh: 1531810841
> >  stonith-enabled: true
> >
> > Quorum:
> >   Options:
> >
> >
> >
> > _______________________________________________
> > Users mailing list: Users at clusterlabs.org
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
-- 
Ken Gaillot <kgaillot at redhat.com>