[ClusterLabs] Still Beginner STONITH Problem
Ken Gaillot
kgaillot at redhat.com
Thu Jul 2 20:39:38 EDT 2020
On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at farmpartner-tec.com
wrote:
> Hello,
>
> I hope someone can help with this problem. We are (still) trying to
> get STONITH working for a running active/active HA cluster, but sadly
> to no avail.
>
> There are two CentOS hosts, each running one Ubuntu VM. The Ubuntu
> VMs are the ones that should form the HA cluster.
>
> The current status is this:
>
> # pcs status
> Cluster name: pacemaker_cluster
> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
> Stack: corosync
> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Thu Jul 2 17:03:53 2020
> Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on server4ubuntu1
>
> 2 nodes configured
> 13 resources configured
>
> Online: [ server2ubuntu1 server4ubuntu1 ]
>
> Full list of resources:
>
> stonith_id_1 (stonith:external/libvirt): Stopped
> Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
> Masters: [ server4ubuntu1 ]
> Slaves: [ server2ubuntu1 ]
> Master/Slave Set: WebDataClone [WebData]
> Masters: [ server2ubuntu1 server4ubuntu1 ]
> Clone Set: dlm-clone [dlm]
> Started: [ server2ubuntu1 server4ubuntu1 ]
> Clone Set: ClusterIP-clone [ClusterIP] (unique)
> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1
> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1
> Clone Set: WebFS-clone [WebFS]
> Started: [ server4ubuntu1 ]
> Stopped: [ server2ubuntu1 ]
> Clone Set: WebSite-clone [WebSite]
> Started: [ server4ubuntu1 ]
> Stopped: [ server2ubuntu1 ]
>
> Failed Actions:
> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='',
>     last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, exec=3403ms
> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='',
>     last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms
> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='',
>     last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, exec=3411ms
>
>
> The stonith resource is stopped and does not seem to work.
> On both hosts the command
> # fence_xvm -o list
> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
This should show both VMs, so getting to that point will likely solve
your problem. fence_xvm relies on multicast, so some obscure network
configuration may be needed to get that working on the VMs.
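
As a rough illustration of that check, the sketch below counts the
entries in "fence_xvm -o list" output and warns when fewer than two
appear. The helper name count_listed_vms is hypothetical, and the sample
line is the one quoted above; on a real host the output would be piped
in directly (fence_xvm -o list | count_listed_vms).

```shell
#!/bin/sh
# Hypothetical helper: count VM entries (non-blank lines) read from stdin.
count_listed_vms() {
    grep -c '[^[:space:]]'
}

# Sample output copied from the report above (only one VM is listed).
sample_output='kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on'

listed=$(printf '%s\n' "$sample_output" | count_listed_vms)
expected=2   # one VM per host in this setup

if [ "$listed" -lt "$expected" ]; then
    echo "fence_xvm lists $listed VM(s), expected $expected: check multicast between hosts and VMs"
fi
```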
> returns the local VM. Apparently it connects through the
> virtualization interface, because it returns the VM name, not the
> hostname of the client VM. Is this how it is supposed to work?
Yes, fence_xvm knows only about the VM names.
To get pacemaker to be able to use it for fencing the cluster nodes,
you have to add a pcmk_host_map parameter to the fencing resource. It
looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
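
A minimal sketch of assembling that value, under stated assumptions: the
node names come from the pcs status output above, kvm102 is the one VM
name seen in the fence_xvm listing, and VMNAME2 is a placeholder for the
second VM, whose name does not appear in this thread. Which VM belongs
to which node is also an assumption, and the resource id fence_vms is
hypothetical. The pcs command is only printed, not executed.

```shell
#!/bin/sh
# Cluster node names as Pacemaker sees them (from the pcs status output).
node1="server2ubuntu1"
node2="server4ubuntu1"

# VM names as the hypervisors see them. The pairing with the nodes is an
# assumption; VMNAME2 is a placeholder, not a name from the thread.
vm1="kvm102"
vm2="VMNAME2"

# Format: nodename1:vmname1;nodename2:vmname2
host_map="${node1}:${vm1};${node2}:${vm2}"
echo "pcmk_host_map=\"${host_map}\""

# Print (do not run) a possible create command; "fence_vms" is a
# hypothetical resource id.
echo "pcs stonith create fence_vms fence_xvm pcmk_host_map=\"${host_map}\""
```

The quotes around the value matter on a real command line, since an
unquoted ";" would be treated by the shell as a command separator.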
> In the local network, all traffic is allowed. No firewall is active
> locally; only connections leaving the local network are firewalled.
> Hence there are no connection problems between the hosts and clients.
> For example, we can successfully connect from the clients to the hosts:
>
> # nc -z -v -u 192.168.1.21 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.21:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>
> # nc -z -v -u 192.168.1.13 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.13:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>
>
> On the Ubuntu VMs we created and configured the stonith resource
> according to the how-to provided here:
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>
> The actual line we used:
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
> hostlist="Host4,host2"
> hypervisor_uri="qemu+ssh://192.168.1.21/system"
>
>
> But as you can see in the pcs status output, stonith is stopped and
> exits with an unknown error.
>
> Can somebody please advise on how to proceed or what additional
> information is needed to solve this problem?
> Any help would be greatly appreciated! Thank you in advance.
>
> Kind regards
> Stefan Schmitz
--
Ken Gaillot <kgaillot at redhat.com>