[ClusterLabs] Still Beginner STONITH Problem

Thu Jul 9 14:18:08 EDT 2020

Hi.

This thread is getting too long.

First, you need to ensure that your switch (or all switches in the
path) has IGMP snooping enabled on host ports (and probably on the
interconnects along the path between your hosts).

Second, you need an IGMP querier enabled somewhere nearby (preferably
on the switch itself). Please verify that you actually see its queries
on the hosts.
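
One way to check for queries without extra tools is a raw IGMP socket. The following is a minimal sketch, assuming Linux and Python 3 run as root. It classifies each Membership Query by version using the length and Max Resp Code rules from RFC 3376 section 7.1. The function names are illustrative and are not part of any cluster tool.

```python
import socket

IGMP_MEMBERSHIP_QUERY = 0x11  # message type shared by v1, v2 and v3 queries


def igmp_query_version(igmp: bytes):
    """Return 1, 2 or 3 for an IGMP Membership Query, else None.

    Uses the length / Max Resp Code rules from RFC 3376, section 7.1.
    """
    if len(igmp) < 8 or igmp[0] != IGMP_MEMBERSHIP_QUERY:
        return None
    if len(igmp) == 8:
        # v1 queries carry a zero response-time byte, v2 queries a non-zero one.
        return 2 if igmp[1] != 0 else 1
    if len(igmp) >= 12:
        return 3
    return None  # e.g. a 10-byte query, which RFC 3376 says to ignore


def split_ipv4(packet: bytes):
    """Split a raw IPv4 packet into (source address, IP payload)."""
    ihl = (packet[0] & 0x0F) * 4  # header length incl. options such as Router Alert
    return socket.inet_ntoa(packet[12:16]), packet[ihl:]


def watch_queries(count: int = 3) -> None:
    """Block until `count` IGMP queries arrive and print them (needs root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_IGMP)
    seen = 0
    try:
        while seen < count:
            src, igmp = split_ipv4(sock.recv(65535))
            version = igmp_query_version(igmp)
            if version is not None:
                print(f"IGMPv{version} query from {src}")
                seen += 1
    finally:
        sock.close()
```

Run watch_queries() as root on each host and inside each VM. A querier should show up within a couple of query intervals (125 seconds by default). If nothing appears, no querier is reaching that machine. "tcpdump -n -i <iface> igmp" shows the same traffic without any code.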

Next, you probably need to make your hosts use IGMPv2 (not v3), as
many switches still cannot understand v3. This is doable with the
force_igmp_version sysctl (net.ipv4.conf.all.force_igmp_version = 2);
there are many articles about it on the internet.
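
After forcing IGMPv2 (for example "sysctl -w net.ipv4.conf.all.force_igmp_version=2", made persistent with a file under /etc/sysctl.d/), you can read /proc/net/igmp to confirm the change took effect. On Linux, the Querier column reports V2 for an interface when v2 is forced or a v2 querier was heard recently, V1 in the analogous v1 case, and V3 otherwise. The sketch below parses that file. It assumes the column layout of current Linux kernels and is run on the machine being checked, because group addresses are printed in that machine's byte order. The function names are hypothetical.

```python
import re
import socket
import struct

# Assumed layout: a device line "2\teth0      :     2      V2", followed by
# tab-indented group lines "\t\t\t\t010000E0     1 0:00000000\t\t0".
DEVICE_LINE = re.compile(r"^(\d+)\s+(\S+)\s*:\s*(\d+)\s+(\S+)")
GROUP_LINE = re.compile(r"^\s+([0-9A-Fa-f]{8})\s+(\d+)")


def igmp_versions(text: str) -> dict:
    """Map each interface in /proc/net/igmp to (IGMP mode, joined groups)."""
    result = {}
    current = None
    for line in text.splitlines()[1:]:  # skip the column header
        dev = DEVICE_LINE.match(line)
        if dev:
            current = dev.group(2)
            result[current] = (dev.group(4), [])
            continue
        grp = GROUP_LINE.match(line)
        if grp and current is not None:
            # The kernel prints the group in host byte order, so pack natively.
            raw = struct.pack("=I", int(grp.group(1), 16))
            result[current][1].append(socket.inet_ntoa(raw))
    return result


def check_host(path: str = "/proc/net/igmp") -> None:
    """Print the IGMP mode and joined groups for every interface."""
    with open(path) as f:
        for dev, (mode, groups) in igmp_versions(f.read()).items():
            print(f"{dev}: {mode} groups={', '.join(groups)}")
```

Every interface that carries cluster or fencing multicast should report V2 once the sysctl is active.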

This advice also applies to running corosync itself in multicast
mode.
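
A plain multicast round trip, independent of fence_xvm and corosync, can show whether the points above are in place. The following is a minimal sketch, assuming Python 3 on Linux. Run receive() on one machine and send() on another, then swap the roles. The group 225.0.0.12 is assumed to match fence_virtd's default multicast address, so check it against your fence_virt.conf. The port 12290 is hypothetical and deliberately different from 1229, so the test does not clash with a running fence_virtd. On multi-homed hosts, pass the IP address of the interface that faces the VMs as iface_ip.

```python
import socket
import time

# Hypothetical test values: the group is assumed to be fence_xvm's default,
# and the port is deliberately NOT 1229 so fence_virtd can keep its socket.
GROUP = "225.0.0.12"
PORT = 12290


def membership_request(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Build the struct ip_mreq (group, local interface) for IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def receive(group=GROUP, port=PORT, iface_ip="0.0.0.0", timeout=30.0):
    """Join `group` and print datagrams until `timeout` seconds pass without one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining makes the kernel send an IGMP membership report, which is what a
    # snooping switch needs to see before it forwards this group to the port.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    sock.settimeout(timeout)
    try:
        while True:
            data, (src, _) = sock.recvfrom(1500)
            print(f"{src}: {data.decode(errors='replace')}")
    except socket.timeout:
        print("no more datagrams")
    finally:
        sock.close()


def send(group=GROUP, port=PORT, count=10, ttl=2):
    """Send `count` numbered datagrams to the group, one per second."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    try:
        for i in range(count):
            payload = f"probe {i} from {socket.gethostname()}".encode()
            sock.sendto(payload, (group, port))
            time.sleep(1)
    finally:
        sock.close()
```

Sometimes datagrams arrive only for the first few minutes after receive() joins and then stop. That pattern typically points to IGMP snooping without a working querier: the switch ages out the membership because nobody asks for it to be refreshed.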

Best,
Vladislav

On Thu, 02/07/2020 17:18 +0200, stefan.schmitz at farmpartner-tec.com
wrote:
> Hello,
> 
> I hope someone can help with this problem. We are (still) trying to
> get STONITH working to achieve a running active/active HA cluster,
> but sadly to no avail.
> 
> There are two CentOS hosts, each running an Ubuntu VM. The Ubuntu
> VMs are the ones that should form the HA cluster.
> 
> The current status is this:
> 
> # pcs status
> Cluster name: pacemaker_cluster
> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
> Stack: corosync
> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Thu Jul  2 17:03:53 2020
> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on server4ubuntu1
> 
> 2 nodes configured
> 13 resources configured
> 
> Online: [ server2ubuntu1 server4ubuntu1 ]
> 
> Full list of resources:
> 
>   stonith_id_1   (stonith:external/libvirt):     Stopped
>   Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>       Masters: [ server4ubuntu1 ]
>       Slaves: [ server2ubuntu1 ]
>   Master/Slave Set: WebDataClone [WebData]
>       Masters: [ server2ubuntu1 server4ubuntu1 ]
>   Clone Set: dlm-clone [dlm]
>       Started: [ server2ubuntu1 server4ubuntu1 ]
>   Clone Set: ClusterIP-clone [ClusterIP] (unique)
>       ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started server2ubuntu1
>       ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started server4ubuntu1
>   Clone Set: WebFS-clone [WebFS]
>       Started: [ server4ubuntu1 ]
>       Stopped: [ server2ubuntu1 ]
>   Clone Set: WebSite-clone [WebSite]
>       Started: [ server4ubuntu1 ]
>       Stopped: [ server2ubuntu1 ]
> 
> Failed Actions:
> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='',
>      last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='',
>      last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='',
>      last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
> 
> 
> The STONITH resource is stopped and does not seem to work.
> On both hosts the command
> # fence_xvm -o list
> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
> 
> returns the local VM. Apparently it connects through the virtualization
> interface, because it returns the VM name rather than the hostname of
> the client VM. Is this how it is supposed to work?
> 
> All traffic is allowed within the local network. No firewall is active
> locally; only connections leaving the local network are firewalled.
> Hence there are no connection problems between the hosts and the
> clients. For example, we can successfully connect from the clients to
> the hosts:
> 
> # nc -z -v -u 192.168.1.21 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.21:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
> 
> # nc -z -v -u 192.168.1.13 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.13:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
> 
> 
> On the Ubuntu VMs we created and configured the STONITH resource
> according to the howto provided here:
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
> 
> 
> The actual line we used:
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt \
>     hostlist="Host4,host2" \
>     hypervisor_uri="qemu+ssh://192.168.1.21/system"
> 
> 
> But as you can see in the pcs status output, STONITH is stopped and
> exits with an unknown error.
> 
> Can somebody please advise on how to proceed, or what additional
> information is needed to solve this problem? Any help would be greatly
> appreciated. Thank you in advance.
> 
> Kind regards
> Stefan Schmitz