[ClusterLabs] Stonith configuration

Fri Feb 14 09:44:53 EST 2020

Hi Strahil,

Here is the output of both commands:

corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
 	id	= 172.17.150.20
 	status	= ring 0 active with no faults
RING ID 1
 	id	= 10.0.6.20
 	status	= ring 1 active with no faults

corosync-quorumtool -s
Quorum information
------------------
Date:             Fri Feb 14 14:41:11 2020
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1/96
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
     Nodeid      Votes Name
          1          1 node1.primary.network (local)
          2          1 node2.primary.network

On the surviving node, the 10.0.6.21 interface flip-flopped (though nothing 
was detected on the other node), and that is what started it all off.
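
Since a flapping ring interface is what triggered this, a small check of the ring status can help catch it early. Below is a minimal sketch, assuming the `corosync-cfgtool -s` output format shown above; the function names `parse_rings` and `faulty_rings` are made up for illustration. The exact wording corosync prints for a faulty ring is not shown in this thread, so the sketch treats any status other than "active with no faults" as suspect.

```python
import re

# Sample taken from the corosync-cfgtool -s output earlier in this message.
SAMPLE = (
    "Printing ring status.\n"
    "Local node ID 1\n"
    "RING ID 0\n"
    "\tid\t= 172.17.150.20\n"
    "\tstatus\t= ring 0 active with no faults\n"
    "RING ID 1\n"
    "\tid\t= 10.0.6.20\n"
    "\tstatus\t= ring 1 active with no faults\n"
)


def parse_rings(text):
    """Return {ring_number: {"id": address, "status": text, "healthy": bool}}."""
    rings = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"\s*RING ID (\d+)", line)
        if header:
            current = int(header.group(1))
            rings[current] = {"id": None, "status": None, "healthy": False}
            continue
        field = re.match(r"\s*(id|status)\s*=\s*(.*)", line)
        if field and current is not None:
            key, value = field.group(1), field.group(2).strip()
            rings[current][key] = value
            if key == "status":
                # Assumption: only this exact phrase means the ring is healthy.
                rings[current]["healthy"] = "active with no faults" in value
    return rings


def faulty_rings(text):
    """Return the sorted ring numbers whose status is not reported as healthy."""
    return sorted(n for n, info in parse_rings(text).items() if not info["healthy"])


if __name__ == "__main__":
    print(faulty_rings(SAMPLE))
```

On a live node the text could come from `subprocess.run(["corosync-cfgtool", "-s"], capture_output=True, text=True).stdout` instead of the embedded sample.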

We have no firewall running.

Cheers,
Alastair.

On Fri, 14 Feb 2020, Strahil Nikolov wrote:

> On February 14, 2020 12:41:58 PM GMT+02:00, "BASDEN, ALASTAIR G." <a.g.basden at durham.ac.uk> wrote:
>> Hi,
>> I wonder whether anyone could give me some advice about a stonith
>> configuration.
>>
>> We have 2 nodes, which form a HA cluster.
>>
>> These have 3 networks:
>> A generic network over which they are accessed (e.g. ssh)
>> (node1.primary.network, node2.primary.network)
>> A directly connected cable between them (10.0.6.20, 10.0.6.21).
>> A management network, which carries IPMI (172.16.150.20, 172.16.150.21)
>>
>> We have done:
>> pcs cluster setup --name hacluster \
>>     node1.primary.network,10.0.6.20 \
>>     node2.primary.network,10.0.6.21 --token 20000
>> pcs cluster start --all
>> pcs property set no-quorum-policy=ignore
>> pcs property set stonith-enabled=true
>> pcs property set symmetric-cluster=true
>> pcs stonith create node1_ipmi fence_ipmilan ipaddr="172.16.150.20" \
>>     lanplus=true login="root" passwd="password" \
>>     pcmk_host_list="node1.primary.network" power_wait=10
>> pcs stonith create node2_ipmi fence_ipmilan ipaddr="172.16.150.21" \
>>     lanplus=true login="root" passwd="password" \
>>     pcmk_host_list="node2.primary.network" power_wait=10
>>
>> /etc/corosync/corosync.conf has:
>> totem {
>>     version: 2
>>     cluster_name: hacluster
>>     secauth: off
>>     transport: udpu
>>     rrp_mode: passive
>>     token: 20000
>> }
>>
>> nodelist {
>>     node {
>>         ring0_addr: node1.primary.network
>>         ring1_addr: 10.0.6.20
>>         nodeid: 1
>>     }
>>
>>     node {
>>         ring0_addr: node2.primary.network
>>         ring1_addr: 10.0.6.21
>>         nodeid: 2
>>     }
>> }
>>
>> quorum {
>>     provider: corosync_votequorum
>>     two_node: 1
>> }
>>
>> logging {
>>     to_logfile: yes
>>     logfile: /var/log/cluster/corosync.log
>>     to_syslog: no
>> }
>>
>>
>> What I find is that if there is a problem with the directly connected
>> cable, the nodes stonith each other, even though the generic network is
>> fine.
>>
>> What I would expect is that they would only shoot each other when both
>> networks are down (generic and directly connected).
>>
>> Any ideas?
>>
>> Thanks,
>> Alastair.
>
> What is the output of:
> corosync-cfgtool -s
> corosync-quorumtool -s
>
> Also check the logs of the surviving node for clues.
>
> What about the firewall?
> Have you enabled the 'high-availability' service in firewalld in all zones for your interfaces?
>
> Best Regards,
> Strahil Nikolov