<html><header></header><body><div style="font-family: tahoma,arial,helvetica,sans-serif; font-size: 14px;">I isolated the log covering the moment everything happens (when I disable the HA interface); it is attached here.</div>

<div style="font-family: tahoma,arial,helvetica,sans-serif; font-size: 14px;"> </div>

<div style="font-family: tahoma,arial,helvetica,sans-serif; font-size: 14px;">Gabriele</div>

<div style="font-family: tahoma,arial,helvetica,sans-serif; font-size: 14px;"> </div>

<div id="wt-mailcard">

<div> </div>

<div><span style="font-size: 14px; font-family: Helvetica;"><strong>Sonicle S.r.l. </strong>: <a href="http://www.sonicle.com/" target="_new">http://www.sonicle.com</a></span></div>

<div><span style="font-size: 14px; font-family: Helvetica;"><strong>Music: </strong><a href="http://www.gabrielebulfon.com/" target="_new">http://www.gabrielebulfon.com</a></span></div>

<div><span style="font-size: 14px; font-family: Helvetica;"><strong>eXoplanets : </strong><a href="https://gabrielebulfon.bandcamp.com/album/exoplanets">https://gabrielebulfon.bandcamp.com/album/exoplanets</a></span></div>

<div> </div>

</div>

<div style="font-family: tahoma,arial,helvetica,sans-serif; font-size: 14px;"><tt><br /><br /><br />----------------------------------------------------------------------------------<br /><br />From: Ulrich Windl &lt;Ulrich.Windl@rz.uni-regensburg.de&gt;<br />To: users@clusterlabs.org<br />Date: 14 December 2020 11:53:22 CET<br />Subject: [ClusterLabs] Antw: Re: Antw: [EXT] Recoveing from node failure<br /><br /></tt></div>

<blockquote style="border-left: #000080 2px solid; margin-left: 5px; padding-left: 5px;"><tt>>>> Gabriele Bulfon &lt;gbulfon@sonicle.com&gt; wrote on 14.12.2020 at 11:48 in message &lt;1065144646.7212.1607942889206@www&gt;:<br />> Thanks!<br />> <br />> I tried the first option, adding pcmk_delay_base to the two stonith primitives.<br />> The first has 1 second, the second has 5 seconds.<br />> It didn't work :( they still killed each other :(<br />> Is there anything wrong with the way I did it?<br /><br />Hard to say without seeing the logs...<br /><br />> <br />> Here's the config:<br />> <br />> node 1: xstha1 \<br />> attributes standby=off maintenance=off<br />> node 2: xstha2 \<br />> attributes standby=off maintenance=off<br />> primitive xstha1-stonith stonith:external/ipmi \<br />> params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN passwd="***" interface=lanplus pcmk_delay_base=1 \<br />> op monitor interval=25 timeout=25 start-delay=25 \<br />> meta target-role=Started<br />> primitive xstha1_san0_IP IPaddr \<br />> params ip=10.10.10.1 cidr_netmask=255.255.255.0 nic=san0<br />> primitive xstha2-stonith stonith:external/ipmi \<br />> params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN passwd="***" interface=lanplus pcmk_delay_base=5 \<br />> op monitor interval=25 timeout=25 start-delay=25 \<br />> meta target-role=Started<br />> primitive xstha2_san0_IP IPaddr \<br />> params ip=10.10.10.2 cidr_netmask=255.255.255.0 nic=san0<br />> primitive zpool_data ZFS \<br />> params pool=test \<br />> op start timeout=90 interval=0 \<br />> op stop timeout=90 interval=0 \<br />> meta target-role=Started<br />> location xstha1-stonith-pref xstha1-stonith -inf: xstha1<br />> location xstha1_san0_IP_pref xstha1_san0_IP 100: xstha1<br />> location xstha2-stonith-pref xstha2-stonith -inf: xstha2<br />> location xstha2_san0_IP_pref xstha2_san0_IP 100: xstha2<br />> order zpool_data_order inf: zpool_data ( xstha1_san0_IP )<br />> location zpool_data_pref 
zpool_data 100: xstha1<br />> colocation zpool_data_with_IPs inf: zpool_data xstha1_san0_IP<br />> property cib-bootstrap-options: \<br />> have-watchdog=false \<br />> dc-version=1.1.15-e174ec8 \<br />> cluster-infrastructure=corosync \<br />> stonith-action=poweroff \<br />> no-quorum-policy=stop<br />> <br />> ----------------------------------------------------------------------------------<br />> <br />> From: Andrei Borzenkov &lt;arvidjaar@gmail.com&gt;<br />> To: users@clusterlabs.org<br />> Date: 13 December 2020 7:50:57 CET<br />> Subject: Re: [ClusterLabs] Antw: [EXT] Recoveing from node failure<br />> <br />> 12.12.2020 20:30, Gabriele Bulfon wrote:<br />>> Thanks, I will experiment with this.<br />>> <br />>> Now, I have one last issue with stonith.<br />>> I tried to reproduce a stonith situation by disabling the network interface used for HA on node 1.<br />>> Stonith is configured with IPMI poweroff.<br />>> What happens is that once the interface is down, both nodes try to stonith the other node, causing both to power off...<br />> <br />> Yes, this is expected. The options are basically:<br />> <br />> 1. Have a separate stonith resource for each node and configure static (pcmk_delay_base) or random dynamic (pcmk_delay_max) delays to avoid both nodes starting stonith at the same time. This does not take resources into account.<br />> <br />> 2. Use fencing topology and create a pseudo-stonith agent that does not attempt to do anything but just delays for some time before continuing with the actual fencing agent. The delay can be based on anything, including the resources running on the node.<br />> <br />> 3. 
If you are using Pacemaker 2.0.3+, you could use the new priority-fencing-delay feature, which implements resource-based priority fencing:<br />> <br />> + controller/fencing/scheduler: add new feature 'priority-fencing-delay'<br />> Optionally derive the priority of a node from the resource-priorities of the resources it is running.<br />> In a fencing-race the node with the highest priority has a certain advantage over the others as fencing requests for that node are executed with an additional delay.<br />> controlled via cluster option priority-fencing-delay (default = 0)<br />> <br />> See also https://www.mail-archive.com/users@clusterlabs.org/msg10328.html<br />> <br />>> I would like the node running all the resources (zpool and NFS IP) to be the first to try to stonith the other node.<br />>> Or is there a better approach?<br />>> <br />>> Here is the current crm config show output:<br />>> <br />> <br />> It is unreadable.<br />> <br />>> node 1: xstha1 \<br />>> attributes standby=off maintenance=off<br />>> node 2: xstha2 \<br />>> attributes standby=off maintenance=off<br />>> primitive xstha1-stonith stonith:external/ipmi \<br />>> params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN passwd="******" interface=lanplus \<br />>> op monitor interval=25 timeout=25 start-delay=25 \<br />>> meta target-role=Started<br />>> primitive xstha1_san0_IP IPaddr \<br />>> params ip=10.10.10.1 cidr_netmask=255.255.255.0 nic=san0<br />>> primitive xstha2-stonith stonith:external/ipmi \<br />>> params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN passwd="******" interface=lanplus \<br />>> op monitor interval=25 timeout=25 start-delay=25 \<br />>> meta target-role=Started<br />>> primitive xstha2_san0_IP IPaddr \<br />>> params ip=10.10.10.2 cidr_netmask=255.255.255.0 nic=san0<br />>> primitive zpool_data ZFS \<br />>> params pool=test \<br />>> op start timeout=90 interval=0 \<br />>> op stop timeout=90 interval=0 \<br />>> meta target-role=Started<br />>> location 
xstha1-stonith-pref xstha1-stonith -inf: xstha1<br />>> location xstha1_san0_IP_pref xstha1_san0_IP 100: xstha1<br />>> location xstha2-stonith-pref xstha2-stonith -inf: xstha2<br />>> location xstha2_san0_IP_pref xstha2_san0_IP 100: xstha2<br />>> order zpool_data_order inf: zpool_data ( xstha1_san0_IP )<br />>> location zpool_data_pref zpool_data 100: xstha1<br />>> colocation zpool_data_with_IPs inf: zpool_data xstha1_san0_IP<br />>> property cib-bootstrap-options: \<br />>> have-watchdog=false \<br />>> dc-version=1.1.15-e174ec8 \<br />>> cluster-infrastructure=corosync \<br />>> stonith-action=poweroff \<br />>> no-quorum-policy=stop<br />>> <br />>> Thanks!<br />>> Gabriele<br />>> <br />>> ----------------------------------------------------------------------------------<br />>> <br />>> From: Andrei Borzenkov &lt;arvidjaar@gmail.com&gt;<br />>> To: users@clusterlabs.org<br />>> Date: 11 December 2020 18:30:29 CET<br />>> Subject: Re: [ClusterLabs] Antw: [EXT] Recoveing from node failure<br />>> <br />>> 11.12.2020 18:37, Gabriele Bulfon wrote:<br />>>> I found I can do this temporarily:<br />>>> <br />>>> crm config property cib-bootstrap-options: no-quorum-policy=ignore<br />>>> <br />>> <br />>> All two-node clusters I remember run with this setting forever :)<br />>> <br />>>> then, once node 2 is up again:<br />>>> <br />>>> crm config property cib-bootstrap-options: no-quorum-policy=stop<br />>>> <br />>>> so that I make sure the nodes will not mount in another strange situation.<br />>>> <br />>>> Is there any better way?<br />>> <br />>> "Better" is subjective, but ...<br />>> <br />>>> (such as ignoring until everything is back to normal, then considering stop again)<br />>>> <br />>> <br />>> That is what stonith does. 
Because quorum is pretty much useless in a two-node cluster, as I already said, all the clusters I have seem to use no-quorum-policy=ignore and stonith-enabled=true. This means that when a node boots and the other node is not available, stonith is attempted; if stonith succeeds, Pacemaker continues with starting resources; if stonith fails, the node is stuck.<br />>> <br />>> _______________________________________________<br />>> Manage your subscription:<br />>> https://lists.clusterlabs.org/mailman/listinfo/users<br />>> <br />>> ClusterLabs home: https://www.clusterlabs.org/<br /></tt></blockquote></body></html>