[ClusterLabs] Still Beginner STONITH Problem

Thu Jul 2 20:39:38 EDT 2020

On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at farmpartner-tec.com
wrote:
> Hello,
> 
> I hope someone can help with this problem. We are (still) trying to get
> STONITH working so that we can run an active/active HA cluster, but so
> far to no avail.
> 
> There are two CentOS hosts, each running one Ubuntu VM. The Ubuntu VMs
> are the ones that should form the HA cluster.
> 
> The current status is this:
> 
> # pcs status
> Cluster name: pacemaker_cluster
> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
> Stack: corosync
> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Thu Jul  2 17:03:53 2020
> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on server4ubuntu1
> 
> 2 nodes configured
> 13 resources configured
> 
> Online: [ server2ubuntu1 server4ubuntu1 ]
> 
> Full list of resources:
> 
>   stonith_id_1   (stonith:external/libvirt):     Stopped
>   Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>       Masters: [ server4ubuntu1 ]
>       Slaves: [ server2ubuntu1 ]
>   Master/Slave Set: WebDataClone [WebData]
>       Masters: [ server2ubuntu1 server4ubuntu1 ]
>   Clone Set: dlm-clone [dlm]
>       Started: [ server2ubuntu1 server4ubuntu1 ]
>   Clone Set: ClusterIP-clone [ClusterIP] (unique)
>       ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started server2ubuntu1
>       ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started server4ubuntu1
>   Clone Set: WebFS-clone [WebFS]
>       Started: [ server4ubuntu1 ]
>       Stopped: [ server2ubuntu1 ]
>   Clone Set: WebSite-clone [WebSite]
>       Started: [ server4ubuntu1 ]
>       Stopped: [ server2ubuntu1 ]
> 
> Failed Actions:
> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='',
>      last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='',
>      last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='',
>      last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
> 
> 
> The stonith resource is stopped and does not seem to work.
> On both hosts the command
> # fence_xvm -o list
> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on

This should show both VMs, so getting to that point will likely solve
your problem. fence_xvm relies on multicast, so some obscure network
configuration may be needed to get that working on the VMs.
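
As a rough starting point for checking the multicast side, something like
the sketch below may help. It assumes the fence_virtd configuration on the
hosts (typically /etc/fence_virt.conf) uses the multicast group 225.0.0.12,
which I believe is the default but which you should verify locally. The
variable names are only illustrative. Note that the nc tests quoted further
down send unicast UDP to the host address, so they do not show whether
multicast traffic gets through.

```shell
#!/bin/sh
# Assumption: 225.0.0.12 is the multicast group set in fence_virt.conf on
# the hosts (believed to be the default; check your own configuration).
# Port 1229 is the port already used in the nc tests in this thread.
MCAST_ADDR="225.0.0.12"
MCAST_PORT="1229"

# On a host: list multicast group memberships and look for the group that
# fence_virtd should have joined on the bridge the VMs are attached to.
if command -v ip >/dev/null 2>&1; then
    ip maddr show | grep "$MCAST_ADDR" \
        || echo "no interface has joined $MCAST_ADDR"
fi

# On a VM: query all reachable fence_virtd instances. Once multicast works
# in both directions, the VMs of both hosts should be listed.
if command -v fence_xvm >/dev/null 2>&1; then
    echo "querying fence_virtd via $MCAST_ADDR:$MCAST_PORT"
    fence_xvm -o list
fi
```

If the group is missing on the host, or the list on a VM still shows only
one VM, the multicast path between that VM and the other host is the place
to look.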

> returns the local VM. Apparently it connects through the virtualization
> interface, because it returns the VM name, not the hostname of the
> client VM. Is this how it is supposed to work?

Yes, fence_xvm knows only about the VM names.

For Pacemaker to use it to fence the cluster nodes, you have to add a
pcmk_host_map parameter to the fencing resource, which maps each cluster
node name to its VM name. It looks like
pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
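
A minimal sketch of putting that together, assuming a fence_xvm-based
resource: the helper build_host_map, the resource ID fence_vms, and the
node and VM names are all placeholders of mine, not names from your
cluster. Take the node names from "pcs status" and the VM names from
"fence_xvm -o list". The script only prints the pcs command so you can
review it before running anything.

```shell
#!/bin/sh
# Hypothetical helper: joins "node:vm" arguments with ';', which is the
# separator pcmk_host_map expects between entries.
build_host_map() {
    map=""
    for pair in "$@"; do
        # Prepend "existing;" only when map is already non-empty.
        map="${map:+$map;}$pair"
    done
    printf '%s\n' "$map"
}

# Placeholder pairs: cluster node name on the left, libvirt VM name on the right.
host_map=$(build_host_map "nodename1:vmname1" "nodename2:vmname2")

# Print the command for review instead of executing it.
# "fence_vms" is an arbitrary example resource ID.
echo "pcs stonith create fence_vms fence_xvm pcmk_host_map=\"$host_map\""
```

With the placeholder names above, the generated parameter has the same
shape as the pcmk_host_map value shown earlier, minus the trailing "...".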

> In the local network, all traffic is allowed. No firewall is active
> locally; only connections leaving the local network are firewalled.
> Hence there are no connection problems between the hosts and clients.
> For example, we can successfully connect from the clients to the hosts:
> 
> # nc -z -v -u 192.168.1.21 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.21:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
> 
> # nc -z -v -u 192.168.1.13 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.13:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
> 
> 
> On the Ubuntu VMs we created and configured the stonith resource
> according to the howto provided here:
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
> 
> The actual line we used:
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt \
>     hostlist="Host4,host2" hypervisor_uri="qemu+ssh://192.168.1.21/system"
> 
> 
> But as you can see in the pcs status output, stonith is stopped and
> exits with an unknown error.
> 
> Can somebody please advise on how to proceed or what additional
> information is needed to solve this problem?
> Any help would be greatly appreciated! Thank you in advance.
> 
> Kind regards
> Stefan Schmitz
> 
-- 
Ken Gaillot <kgaillot at redhat.com>