[ClusterLabs] Antw: [EXT] Recovering from node failure

Gabriele Bulfon gbulfon at sonicle.com
Sat Dec 12 12:30:06 EST 2020


Thanks, I will experiment with this.
 
Now I have one last issue, about stonith.
I tried to reproduce a stonith situation by disabling the network interface used for HA on node 1.
Stonith is configured with ipmi poweroff.
What happens is that once the interface is down, both nodes try to stonith each other, causing both to power off...
I would like the node currently running all the resources (zpool and NFS IP) to be the one that tries to stonith the other node first.
Or is there a better approach?
 
Here is the current crm config show:
 
node 1: xstha1 \
        attributes standby=off maintenance=off
node 2: xstha2 \
        attributes standby=off maintenance=off
primitive xstha1-stonith stonith:external/ipmi \
        params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN passwd="******" interface=lanplus \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started
primitive xstha1_san0_IP IPaddr \
        params ip=10.10.10.1 cidr_netmask=255.255.255.0 nic=san0
primitive xstha2-stonith stonith:external/ipmi \
        params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN passwd="******" interface=lanplus \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started
primitive xstha2_san0_IP IPaddr \
        params ip=10.10.10.2 cidr_netmask=255.255.255.0 nic=san0
primitive zpool_data ZFS \
        params pool=test \
        op start timeout=90 interval=0 \
        op stop timeout=90 interval=0 \
        meta target-role=Started
location xstha1-stonith-pref xstha1-stonith -inf: xstha1
location xstha1_san0_IP_pref xstha1_san0_IP 100: xstha1
location xstha2-stonith-pref xstha2-stonith -inf: xstha2
location xstha2_san0_IP_pref xstha2_san0_IP 100: xstha2
order zpool_data_order inf: zpool_data ( xstha1_san0_IP )
location zpool_data_pref zpool_data 100: xstha1
colocation zpool_data_with_IPs inf: zpool_data xstha1_san0_IP
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-e174ec8 \
        cluster-infrastructure=corosync \
        stonith-action=poweroff \
        no-quorum-policy=stop
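 
One common way to avoid this mutual-poweroff race (a hedged sketch, not part of the configuration above) is to put a fencing delay on the device that fences the node you prefer to survive. With the standard pcmk_delay_max fencing parameter, fencing of the preferred node is delayed by a random amount while the other node is fenced immediately, so the preferred node usually wins the shoot-out. For example, to favour xstha1, which normally holds the zpool, the xstha1-stonith primitive could be redefined along these lines (the 15s value is only an example):
 
primitive xstha1-stonith stonith:external/ipmi \
        params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN passwd="******" \
               interface=lanplus pcmk_delay_max=15 \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started
 
A delay tied to a fixed node only helps while the resources actually run there; newer Pacemaker releases also provide pcmk_delay_base (a static per-device delay) and the priority-fencing-delay cluster property, which favours whichever node currently runs the higher-priority resources.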
 
Thanks!
Gabriele
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
 




----------------------------------------------------------------------------------

From: Andrei Borzenkov <arvidjaar at gmail.com>
To: users at clusterlabs.org 
Date: 11 December 2020 18:30:29 CET
Subject: Re: [ClusterLabs] Antw: [EXT] Recovering from node failure


11.12.2020 18:37, Gabriele Bulfon writes:
> I found I can do this temporarily:
>  
> crm config property cib-bootstrap-options: no-quorum-policy=ignore
>  

All two-node clusters I remember run with this setting forever :)

> then once node 2 is up again:
>  
> crm config property cib-bootstrap-options: no-quorum-policy=stop
>  
> so that I make sure nodes will not mount in another strange situation.
>  
> Is there any better way? 

"better" us subjective, but ...

> (such as ignore until everything is back to normal, then consider stop again)
>  

That is what stonith does. Because quorum is pretty much useless in a two-node
cluster, as I already said, all clusters I have seen used
no-quorum-policy=ignore and stonith-enabled=true. It means that when a node
boots and the other node is not available, stonith is attempted; if stonith
succeeds, pacemaker continues with starting resources; if stonith fails,
the node is stuck.
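
For reference, a minimal sketch of those settings in crmsh syntax (these are standard Pacemaker cluster properties; stonith-enabled defaults to true and is shown only for clarity):

# two-node setup: ignore loss of quorum, rely on fencing instead
crm configure property no-quorum-policy=ignore
crm configure property stonith-enabled=true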


