[ClusterLabs] Still Beginner STONITH Problem

Andrei Borzenkov arvidjaar at gmail.com
Sun Jul 19 03:32:51 EDT 2020


02.07.2020 18:18, stefan.schmitz at farmpartner-tec.com wrote:
> Hello,
> 
> I hope someone can help with this problem. We are (still) trying to get
> STONITH working for an active/active HA cluster, but sadly to no
> avail.
> 
> There are 2 CentOS hosts. Each one runs an Ubuntu VM. The
> Ubuntu VMs are the ones which should form the HA cluster.
> 
> The current status is this:
> 
> # pcs status
> Cluster name: pacemaker_cluster
> WARNING: corosync and pacemaker node names do not match (IPs used in
> setup?)
> Stack: corosync
> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with
> quorum
> Last updated: Thu Jul  2 17:03:53 2020
> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
> server4ubuntu1
> 
> 2 nodes configured
> 13 resources configured
> 
> Online: [ server2ubuntu1 server4ubuntu1 ]
> 
> Full list of resources:
> 
>  stonith_id_1   (stonith:external/libvirt):     Stopped

external/libvirt is a different fence agent and is unrelated to fence_xvm.

>  Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>      Masters: [ server4ubuntu1 ]
>      Slaves: [ server2ubuntu1 ]
>  Master/Slave Set: WebDataClone [WebData]
>      Masters: [ server2ubuntu1 server4ubuntu1 ]
>  Clone Set: dlm-clone [dlm]
>      Started: [ server2ubuntu1 server4ubuntu1 ]
>  Clone Set: ClusterIP-clone [ClusterIP] (unique)
>      ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started
> server2ubuntu1
>      ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started
> server4ubuntu1
>  Clone Set: WebFS-clone [WebFS]
>      Started: [ server4ubuntu1 ]
>      Stopped: [ server2ubuntu1 ]
>  Clone Set: WebSite-clone [WebSite]
>      Started: [ server4ubuntu1 ]
>      Stopped: [ server2ubuntu1 ]
> 
> Failed Actions:
> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201,
> status=Error, exitreason='',
>     last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203,
> status=complete, exitreason='',
>     last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202,
> status=Error, exitreason='',
>     last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
> 
> 
> The stonith resource is stopped and does not seem to work.
> On both hosts the command
> # fence_xvm -o list
> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
> 
> returns the local VM. Apparently it connects through the virtualization
> interface, because it returns the VM name and not the hostname of the
> client VM. Is this how it is supposed to work?
> 

fence_xvm opens a TCP listening socket, sends a request, and waits for a
connection to this socket from fence_virtd; that connection is then used
to submit the actual fencing operation. Only the first connection is
handled, so only the first host that responds is processed. The local
host is likely always faster to respond than the remote host, which is
why each listing shows only the local VM.
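
To illustrate the effect, here is a minimal Python sketch of "only the
first connection is handled". It is a loopback simulation and not the
real fence_xvm/fence_virtd protocol: the function name first_responder,
the responder names and the delays are all made up for the example.

```python
import socket
import threading
import time


def first_responder(delays):
    """Return the name of the responder whose connection is accepted first.

    `delays` maps a (hypothetical) responder name to the number of seconds
    it waits before connecting back to the listener.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # ephemeral port on loopback
    listener.listen(len(delays))
    listener.settimeout(5)               # do not block forever if nobody answers
    port = listener.getsockname()[1]

    def respond(name, delay):
        time.sleep(delay)                # stands in for network/processing latency
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=2) as conn:
                conn.sendall(name.encode())
        except OSError:
            pass                         # listener is gone: this responder lost

    threads = [threading.Thread(target=respond, args=item)
               for item in delays.items()]
    for t in threads:
        t.start()

    conn, _ = listener.accept()          # take the first connection only
    with conn:
        chunks = []
        while True:                      # read until the responder closes
            data = conn.recv(64)
            if not data:
                break
            chunks.append(data)
    listener.close()                     # later responders are never handled

    for t in threads:
        t.join()
    return b"".join(chunks).decode()


if __name__ == "__main__":
    # The slower "remote" responder never gets a chance to answer.
    print(first_responder({"local-host": 0.0, "remote-host": 0.3}))
```

With these made-up delays the faster responder always wins, which
mirrors why a request answered by two fence_virtd instances would only
ever show the answer from the nearer one.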


> In the local network, all traffic is allowed. No firewall is active
> locally; only connections leaving the local network are firewalled.
> Hence there are no connection problems between the hosts and clients.
> For example, we can successfully connect from the clients to the hosts:
> 
> # nc -z -v -u 192.168.1.21 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.21:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
> 
> # nc -z -v -u 192.168.1.13 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.13:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
> 
> 
> On the Ubuntu VMs we created and configured the stonith resource
> according to the howto provided here:
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
> 
> 
> The actual line we used:
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
> hostlist="Host4,host2" hypervisor_uri="qemu+ssh://192.168.1.21/system"
> 

Again - external/libvirt is completely unrelated to fence_virt.

> 
> But as you can see in the pcs status output, stonith is stopped and
> exits with an unknown error.
> 
> Can somebody please advise on how to proceed or what additional
> information is needed to solve this problem?
> Any help would be greatly appreciated! Thank you in advance.
> 
> Kind regards
> Stefan Schmitz
> 


