[ClusterLabs] Antw: [EXT] Recovering from node failure
Klaus Wenninger
kwenning at redhat.com
Mon Dec 14 07:34:34 EST 2020
On 12/14/20 11:48 AM, Gabriele Bulfon wrote:
>
> Thanks!
>
> I tried the first option, by adding pcmk_delay_base to the two stonith
> primitives.
> The first has 1 second, the second has 5 seconds.
> It didn't work :( they still killed each other :(
> Anything wrong with the way I did it?
Maybe a 4s difference is a bit close.
Did you try anything significantly larger?
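
For illustration only (untested against your setup, and the exact number is
arbitrary): leaving xstha1-stonith at pcmk_delay_base=1 and raising
xstha2-stonith to something like 30 gives one node a clear head start, e.g.:

    primitive xstha2-stonith stonith:external/ipmi \
        params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN \
        passwd="***" interface=lanplus pcmk_delay_base=30 \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started
    # illustrative value; what matters is a comfortable gap between the
    # two delays, not the absolute numbers
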
Klaus
>
> Here's the config:
>
> node 1: xstha1 \
> attributes standby=off maintenance=off
> node 2: xstha2 \
> attributes standby=off maintenance=off
> primitive xstha1-stonith stonith:external/ipmi \
> params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN \
> passwd="***" interface=lanplus pcmk_delay_base=1 \
> op monitor interval=25 timeout=25 start-delay=25 \
> meta target-role=Started
> primitive xstha1_san0_IP IPaddr \
> params ip=10.10.10.1 cidr_netmask=255.255.255.0 nic=san0
> primitive xstha2-stonith stonith:external/ipmi \
> params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN \
> passwd="***" interface=lanplus pcmk_delay_base=5 \
> op monitor interval=25 timeout=25 start-delay=25 \
> meta target-role=Started
> primitive xstha2_san0_IP IPaddr \
> params ip=10.10.10.2 cidr_netmask=255.255.255.0 nic=san0
> primitive zpool_data ZFS \
> params pool=test \
> op start timeout=90 interval=0 \
> op stop timeout=90 interval=0 \
> meta target-role=Started
> location xstha1-stonith-pref xstha1-stonith -inf: xstha1
> location xstha1_san0_IP_pref xstha1_san0_IP 100: xstha1
> location xstha2-stonith-pref xstha2-stonith -inf: xstha2
> location xstha2_san0_IP_pref xstha2_san0_IP 100: xstha2
> order zpool_data_order inf: zpool_data ( xstha1_san0_IP )
> location zpool_data_pref zpool_data 100: xstha1
> colocation zpool_data_with_IPs inf: zpool_data xstha1_san0_IP
> property cib-bootstrap-options: \
> have-watchdog=false \
> dc-version=1.1.15-e174ec8 \
> cluster-infrastructure=corosync \
> stonith-action=poweroff \
> no-quorum-policy=stop
>
>
> Sonicle S.r.l. : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>
>
>
>
> ----------------------------------------------------------------------------------
>
> From: Andrei Borzenkov <arvidjaar at gmail.com>
> To: users at clusterlabs.org
> Date: 13 December 2020 07:50:57 CET
> Subject: Re: [ClusterLabs] Antw: [EXT] Recovering from node failure
>
> On 12.12.2020 20:30, Gabriele Bulfon wrote:
> > Thanks, I will experiment with this.
> >
> > Now I have a last issue about stonith.
> > I tried to reproduce a stonith situation by disabling the
> > network interface used for HA on node 1.
> > Stonith is configured with IPMI poweroff.
> > What happens is that once the interface is down, both nodes
> > try to stonith the other node, causing both to power off...
>
> Yes, this is expected. The options are basically
>
> 1. Have a separate stonith resource for each node and configure a static
> (pcmk_delay_base) or random (pcmk_delay_max) delay to avoid both nodes
> starting stonith at the same time. This does not take resources into
> account.
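>
> For example (just a sketch, the value is illustrative), pcmk_delay_max
> adds a random delay of up to the given number of seconds before each
> fencing attempt:
>
>     primitive xstha1-stonith stonith:external/ipmi \
>         params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN \
>         passwd="***" interface=lanplus pcmk_delay_max=15 \
>         op monitor interval=25 timeout=25 start-delay=25
>     primitive xstha2-stonith stonith:external/ipmi \
>         params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN \
>         passwd="***" interface=lanplus pcmk_delay_max=15 \
>         # illustrative cap; both nodes get the same random delay window
>         op monitor interval=25 timeout=25 start-delay=25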
>
> 2. Use fencing topology and create a pseudo-stonith agent that does not
> attempt to do anything but just delays for some time before continuing
> with the actual fencing agent. The delay can be based on anything,
> including the resources running on the node.
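>
> A rough sketch of that idea in crmsh syntax, assuming a (yet to be
> written) delay-only agent called stonith:external/delay; the agent name
> and its parameter are placeholders. Devices listed in the same topology
> level are run in order, so the delay agent runs before the real IPMI
> device:
>
>     # "external/delay" does not exist yet; it would just sleep and exit 0
>     primitive xstha1-delay stonith:external/delay \
>         params hostname=xstha1
>     fencing_topology \
>         xstha1: xstha1-delay,xstha1-stonith \
>         xstha2: xstha2-stonith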
>
> 3. If you are using Pacemaker 2.0.3+, you could use the new
> priority-fencing-delay feature that implements resource-based priority
> fencing:
>
> + controller/fencing/scheduler: add new feature
> 'priority-fencing-delay'
> Optionally derive the priority of a node from the
> resource-priorities
> of the resources it is running.
> In a fencing-race the node with the highest priority has a certain
> advantage over the others as fencing requests for that node are
> executed with an additional delay.
> controlled via cluster option priority-fencing-delay (default = 0)
>
>
> See also
> https://www.mail-archive.com/users@clusterlabs.org/msg10328.html
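>
> On 2.0.3+ (your config shows dc-version 1.1.15, so this would need an
> upgrade first) the configuration side would look roughly like this,
> with illustrative values only:
>
>     # give the important resource a priority so that the node running
>     # it wins the fencing race
>     crm resource meta zpool_data set priority 10
>     crm configure property priority-fencing-delay=15s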
>
> > I would like the node running all resources (zpool and NFS IP)
> > to be the first to try to stonith the other node.
> > Or is there anything better?
> >
> > Here is the current crm config show:
> >
>
> It is unreadable
>
> > node 1: xstha1 \
> > attributes standby=off maintenance=off
> > node 2: xstha2 \
> > attributes standby=off maintenance=off
> > primitive xstha1-stonith stonith:external/ipmi \
> > params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN \
> > passwd="******" interface=lanplus \
> > op monitor interval=25 timeout=25 start-delay=25 \
> > meta target-role=Started
> > primitive xstha1_san0_IP IPaddr \
> > params ip=10.10.10.1 cidr_netmask=255.255.255.0 nic=san0
> > primitive xstha2-stonith stonith:external/ipmi \
> > params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN \
> > passwd="******" interface=lanplus \
> > op monitor interval=25 timeout=25 start-delay=25 \
> > meta target-role=Started
> > primitive xstha2_san0_IP IPaddr \
> > params ip=10.10.10.2 cidr_netmask=255.255.255.0 nic=san0
> > primitive zpool_data ZFS \
> > params pool=test \
> > op start timeout=90 interval=0 \
> > op stop timeout=90 interval=0 \
> > meta target-role=Started
> > location xstha1-stonith-pref xstha1-stonith -inf: xstha1
> > location xstha1_san0_IP_pref xstha1_san0_IP 100: xstha1
> > location xstha2-stonith-pref xstha2-stonith -inf: xstha2
> > location xstha2_san0_IP_pref xstha2_san0_IP 100: xstha2
> > order zpool_data_order inf: zpool_data ( xstha1_san0_IP )
> > location zpool_data_pref zpool_data 100: xstha1
> > colocation zpool_data_with_IPs inf: zpool_data xstha1_san0_IP
> > property cib-bootstrap-options: \
> > have-watchdog=false \
> > dc-version=1.1.15-e174ec8 \
> > cluster-infrastructure=corosync \
> > stonith-action=poweroff \
> > no-quorum-policy=stop
> >
> > Thanks!
> > Gabriele
> >
> >
> > Sonicle S.r.l. : http://www.sonicle.com
> > Music: http://www.gabrielebulfon.com
> > eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
> >
> >
> >
> >
> >
> >
> ----------------------------------------------------------------------------------
> >
> > From: Andrei Borzenkov <arvidjaar at gmail.com>
> > To: users at clusterlabs.org
> > Date: 11 December 2020 18:30:29 CET
> > Subject: Re: [ClusterLabs] Antw: [EXT] Recovering from node failure
> >
> >
> > On 11.12.2020 18:37, Gabriele Bulfon wrote:
> >> I found I can do this temporarily:
> >>
> >> crm config property cib-bootstrap-options: no-quorum-policy=ignore
> >>
> >
> > All two-node clusters I remember run with this setting forever :)
> >
> >> then once node 2 is up again:
> >>
> >> crm config property cib-bootstrap-options: no-quorum-policy=stop
> >>
> >> so that I make sure nodes will not mount in another strange
> situation.
> >>
> >> Is there any better way?
> >
> > "better" is subjective, but ...
> >
> >> (such as ignore until everything is back to normal, then
> >> consider stop again)
> >>
> >
> > That is what stonith does. Because quorum is pretty much useless in a
> > two-node cluster, as I already said, all clusters I have seen used
> > no-quorum-policy=ignore and stonith-enabled=true. It means that when a
> > node boots and the other node is not available, stonith is attempted;
> > if stonith succeeds, pacemaker continues with starting resources; if
> > stonith fails, the node is stuck.
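> >
> > A sketch of that combination (two cluster properties, nothing more):
> >
> >     # ignore quorum loss, rely on stonith as described above
> >     crm configure property no-quorum-policy=ignore
> >     crm configure property stonith-enabled=true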
> >
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/