[ClusterLabs] Stonith configuration
Strahil Nikolov
hunter86_bg at yahoo.com
Fri Feb 14 08:58:30 EST 2020
On February 14, 2020 12:41:58 PM GMT+02:00, "BASDEN, ALASTAIR G." <a.g.basden at durham.ac.uk> wrote:
>Hi,
>I wonder whether anyone could give me some advice about a stonith
>configuration.
>
>We have 2 nodes, which form a HA cluster.
>
>These have 3 networks:
>A generic network over which they are accessed (e.g. ssh)
>(node1.primary.network, node2.primary.network)
>A directly connected cable between them (10.0.6.20, 10.0.6.21).
>A management network, on which ipmi is (172.16.150.20, 172.16.150.21)
>
>We have done:
>pcs cluster setup --name hacluster node1.primary.network,10.0.6.20 \
>  node2.primary.network,10.0.6.21 --token 20000
>pcs cluster start --all
>pcs property set no-quorum-policy=ignore
>pcs property set stonith-enabled=true
>pcs property set symmetric-cluster=true
>pcs stonith create node1_ipmi fence_ipmilan ipaddr="172.16.150.20" \
>  lanplus=true login="root" passwd="password" \
>  pcmk_host_list="node1.primary.network" power_wait=10
>pcs stonith create node2_ipmi fence_ipmilan ipaddr="172.16.150.21" \
>  lanplus=true login="root" passwd="password" \
>  pcmk_host_list="node2.primary.network" power_wait=10
>
>/etc/corosync/corosync.conf has:
>totem {
> version: 2
> cluster_name: hacluster
> secauth: off
> transport: udpu
> rrp_mode: passive
> token: 20000
>}
>
>nodelist {
> node {
> ring0_addr: node1.primary.network
> ring1_addr: 10.0.6.20
> nodeid: 1
> }
>
> node {
> ring0_addr: node2.primary.network
> ring1_addr: 10.0.6.21
> nodeid: 2
> }
>}
>
>quorum {
> provider: corosync_votequorum
> two_node: 1
>}
>
>logging {
> to_logfile: yes
> logfile: /var/log/cluster/corosync.log
> to_syslog: no
>}
>
>
>What I find is that if there is a problem with the directly connected
>cable, the nodes stonith each other, even though the generic network is
>fine.
>
>What I would expect is that they would only shoot each other when both
>networks are down (generic and directly connected).
>
>Any ideas?
>
>Thanks,
>Alastair.
What is the output of:
corosync-cfgtool -s
corosync-quorumtool -s
Also check the logs of the surviving node for clues.
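Something along these lines could collect that information on each node. It is only a rough sketch: ring_events is a made-up helper name, the grep pattern is my guess at useful keywords rather than an official list, and the log path is the one from your corosync.conf.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention ring faults or fencing.
# The keyword pattern is an assumption, adjust it to what your logs contain.
ring_events() {
    grep -iE 'faulty|ringid|fenc|stonith' "$1"
}

# Only run the cluster commands where the corosync tools are installed.
if command -v corosync-cfgtool >/dev/null 2>&1; then
    corosync-cfgtool -s       # status of each ring on this node
    corosync-quorumtool -s    # quorum and membership as seen by this node
    # Log path taken from the logging section of the posted corosync.conf.
    ring_events /var/log/cluster/corosync.log | tail -n 50
fi
```

Comparing the output from both nodes should show whether ring 0 was really healthy at the time the cable on ring 1 was pulled.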
What about the firewall?
Have you enabled the 'high-availability' service in firewalld in all zones used by your interfaces?
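If it is not enabled, a sketch like the following (run as root on both nodes) would add the service to every active zone. zone_names is a hypothetical helper of mine, and it assumes that 'firewall-cmd --get-active-zones' prints each zone name on an unindented line followed by indented detail lines.

```shell
#!/bin/sh
# Hypothetical helper: read `firewall-cmd --get-active-zones` output on stdin
# and print only the zone names (assumed to be the unindented lines).
zone_names() {
    awk '/^[^[:space:]]/ { print $1 }'
}

# Only touch the firewall where firewall-cmd is actually installed.
if command -v firewall-cmd >/dev/null 2>&1; then
    for zone in $(firewall-cmd --get-active-zones | zone_names); do
        # Permanently allow the cluster ports in this zone.
        firewall-cmd --permanent --zone="$zone" --add-service=high-availability
    done
    # Load the permanent configuration into the running firewall.
    firewall-cmd --reload
fi
```

Afterwards 'firewall-cmd --zone=<zone> --list-services' should list high-availability for each zone that carries a corosync ring.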
Best Regards,
Strahil Nikolov