[ClusterLabs] Antw: [EXT] Recovering from node failure

Klaus Wenninger kwenning at redhat.com
Mon Dec 14 07:34:34 EST 2020


On 12/14/20 11:48 AM, Gabriele Bulfon wrote:
>
> Thanks!
>
> I tried the first option, adding pcmk_delay_base to the two stonith
> primitives.
> The first has 1 second, the second has 5 seconds.
> It didn't work :( they still killed each other :(
> Anything wrong with the way I did it?
Maybe a 4s difference is a bit close.
Did you try anything significantly larger?
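
Just as an illustration (the numbers are arbitrary and untested here), a
gap in the 15-20s range, e.g.

primitive xstha1-stonith stonith:external/ipmi \
        params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN passwd="***" interface=lanplus pcmk_delay_base=1 \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started
primitive xstha2-stonith stonith:external/ipmi \
        params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN passwd="***" interface=lanplus pcmk_delay_base=20 \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started

gives the less-delayed fencing action time to complete the IPMI poweroff
before the delayed request on the other device ever runs, so only one node
gets shot.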

Klaus
>  
> Here's the config:
>  
> node 1: xstha1 \
>         attributes standby=off maintenance=off
> node 2: xstha2 \
>         attributes standby=off maintenance=off
> primitive xstha1-stonith stonith:external/ipmi \
>         params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN passwd="***" interface=lanplus pcmk_delay_base=1 \
>         op monitor interval=25 timeout=25 start-delay=25 \
>         meta target-role=Started
> primitive xstha1_san0_IP IPaddr \
>         params ip=10.10.10.1 cidr_netmask=255.255.255.0 nic=san0
> primitive xstha2-stonith stonith:external/ipmi \
>         params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN passwd="***" interface=lanplus pcmk_delay_base=5 \
>         op monitor interval=25 timeout=25 start-delay=25 \
>         meta target-role=Started
> primitive xstha2_san0_IP IPaddr \
>         params ip=10.10.10.2 cidr_netmask=255.255.255.0 nic=san0
> primitive zpool_data ZFS \
>         params pool=test \
>         op start timeout=90 interval=0 \
>         op stop timeout=90 interval=0 \
>         meta target-role=Started
> location xstha1-stonith-pref xstha1-stonith -inf: xstha1
> location xstha1_san0_IP_pref xstha1_san0_IP 100: xstha1
> location xstha2-stonith-pref xstha2-stonith -inf: xstha2
> location xstha2_san0_IP_pref xstha2_san0_IP 100: xstha2
> order zpool_data_order inf: zpool_data ( xstha1_san0_IP )
> location zpool_data_pref zpool_data 100: xstha1
> colocation zpool_data_with_IPs inf: zpool_data xstha1_san0_IP
> property cib-bootstrap-options: \
>         have-watchdog=false \
>         dc-version=1.1.15-e174ec8 \
>         cluster-infrastructure=corosync \
>         stonith-action=poweroff \
>         no-quorum-policy=stop
>  
>  
> Sonicle S.r.l.: http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> eXoplanets: https://gabrielebulfon.bandcamp.com/album/exoplanets
>  
>
>
>
> ----------------------------------------------------------------------------------
>
> From: Andrei Borzenkov <arvidjaar at gmail.com>
> To: users at clusterlabs.org
> Date: 13 December 2020 7:50:57 CET
> Subject: Re: [ClusterLabs] Antw: [EXT] Recovering from node failure
>
>     12.12.2020 20:30, Gabriele Bulfon wrote:
>     > Thanks, I will experiment this.
>     >  
>     > Now, I have a last issue about stonith.
>     > I tried to reproduce a stonith situation by disabling the
>     > network interface used for HA on node 1.
>     > Stonith is configured with ipmi poweroff.
>     > What happens is that once the interface is down, both nodes
>     > try to stonith the other node, causing both to poweroff...
>
>     Yes, this is expected. The options are basically
>
>     1. Have a separate stonith resource for each node and configure static
>     (pcmk_delay_base) or random dynamic (pcmk_delay_max) delays to avoid
>     both nodes starting stonith at the same time. This does not take
>     resources into account. (Sketches of these options follow below.)
>
>     2. Use fencing topology and create a pseudo-stonith agent that does
>     not attempt to do anything but just delays for some time before
>     continuing with the actual fencing agent. The delay can be based on
>     anything, including the resources running on the node.
>
>     3. If you are using pacemaker 2.0.3+, you could use the new
>     priority-fencing-delay feature that implements resource-based priority
>     fencing:
>
>     + controller/fencing/scheduler: add new feature 'priority-fencing-delay'
>       Optionally derive the priority of a node from the resource-priorities
>       of the resources it is running.
>       In a fencing-race the node with the highest priority has a certain
>       advantage over the others as fencing requests for that node are
>       executed with an additional delay.
>       controlled via cluster option priority-fencing-delay (default = 0)
>
>
>     See also
>     https://www.mail-archive.com/users@clusterlabs.org/msg10328.html
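>
>     A rough sketch of these in crm syntax, just to illustrate (the delay
>     and priority values are arbitrary, and delay-xstha1/delay-xstha2 in
>     the fencing_topology line stand for primitives of such a hypothetical
>     sleep-only agent that you would have to provide yourself):
>
>     # option 1: random delay of up to 15s per fencing device,
>     # added to the existing stonith params
>     primitive xstha1-stonith stonith:external/ipmi \
>             params ... pcmk_delay_max=15
>     primitive xstha2-stonith stonith:external/ipmi \
>             params ... pcmk_delay_max=15
>
>     # option 2: delay-only agent in the same fencing level,
>     # listed before the real IPMI device
>     fencing_topology \
>             xstha1: delay-xstha1,xstha1-stonith \
>             xstha2: delay-xstha2,xstha2-stonith
>
>     # option 3 (pacemaker 2.0.3+): derive node priority from the resources
>     # it runs, so the node holding zpool_data gets the head start
>     primitive zpool_data ZFS \
>             params pool=test \
>             meta priority=10
>     property cib-bootstrap-options: \
>             priority-fencing-delay=15s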
>
>     > I would like the node running all resources (zpool and nfs ip)
>     > to be the first to try to stonith the other node.
>     > Or is there anything else better?
>     >  
>     > Here is the current crm config show:
>     >  
>
>     It is unreadable
>
>     > node 1: xstha1 \
>     >         attributes standby=off maintenance=off
>     > node 2: xstha2 \
>     >         attributes standby=off maintenance=off
>     > primitive xstha1-stonith stonith:external/ipmi \
>     >         params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN passwd="******" interface=lanplus \
>     >         op monitor interval=25 timeout=25 start-delay=25 \
>     >         meta target-role=Started
>     > primitive xstha1_san0_IP IPaddr \
>     >         params ip=10.10.10.1 cidr_netmask=255.255.255.0 nic=san0
>     > primitive xstha2-stonith stonith:external/ipmi \
>     >         params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN passwd="******" interface=lanplus \
>     >         op monitor interval=25 timeout=25 start-delay=25 \
>     >         meta target-role=Started
>     > primitive xstha2_san0_IP IPaddr \
>     >         params ip=10.10.10.2 cidr_netmask=255.255.255.0 nic=san0
>     > primitive zpool_data ZFS \
>     >         params pool=test \
>     >         op start timeout=90 interval=0 \
>     >         op stop timeout=90 interval=0 \
>     >         meta target-role=Started
>     > location xstha1-stonith-pref xstha1-stonith -inf: xstha1
>     > location xstha1_san0_IP_pref xstha1_san0_IP 100: xstha1
>     > location xstha2-stonith-pref xstha2-stonith -inf: xstha2
>     > location xstha2_san0_IP_pref xstha2_san0_IP 100: xstha2
>     > order zpool_data_order inf: zpool_data ( xstha1_san0_IP )
>     > location zpool_data_pref zpool_data 100: xstha1
>     > colocation zpool_data_with_IPs inf: zpool_data xstha1_san0_IP
>     > property cib-bootstrap-options: \
>     >         have-watchdog=false \
>     >         dc-version=1.1.15-e174ec8 \
>     >         cluster-infrastructure=corosync \
>     >         stonith-action=poweroff \
>     >         no-quorum-policy=stop
>     >  
>     > Thanks!
>     > Gabriele
>     >  
>     >  
>     > Sonicle S.r.l. : http://www.sonicle.com
>     > Music: http://www.gabrielebulfon.com
>     > eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>     >  
>     >
>     >
>     >
>     >
>     >
>     ----------------------------------------------------------------------------------
>     >
>     > From: Andrei Borzenkov <arvidjaar at gmail.com>
>     > To: users at clusterlabs.org
>     > Date: 11 December 2020 18:30:29 CET
>     > Subject: Re: [ClusterLabs] Antw: [EXT] Recovering from node failure
>     >
>     >
>     > 11.12.2020 18:37, Gabriele Bulfon wrote:
>     >> I found I can do this temporarily:
>     >>  
>     >> crm config property cib-bootstrap-options: no-quorum-policy=ignore
>     >>  
>     >
>     > All two-node clusters I remember run with this setting forever :)
>     >
>     >> then once node 2 is up again:
>     >>  
>     >> crm config property cib-bootstrap-options: no-quorum-policy=stop
>     >>  
>     >> so that I make sure the nodes will not mount in another strange
>     >> situation.
>     >>  
>     >> Is there any better way?
>     >
>     > "better" us subjective, but ...
>     >
>     >> (such as ignore until everything is back to normal, then consider
>     >> stop again)
>     >>  
>     >
>     > That is what stonith does. Because quorum is pretty much useless in a
>     > two-node cluster, as I already said, all clusters I have seen used
>     > no-quorum-policy=ignore and stonith-enabled=true. It means that when a
>     > node boots and the other node is not available, stonith is attempted;
>     > if stonith succeeds, pacemaker continues with starting resources; if
>     > stonith fails, the node is stuck.
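>     >
>     > For example, the usual two-node pattern is just (a sketch in crm
>     > shell syntax; the fencing devices themselves still have to be
>     > configured, as you already do):
>     >
>     > crm configure property stonith-enabled=true
>     > crm configure property no-quorum-policy=ignore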
>     >
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
