[ClusterLabs] Still Beginner STONITH Problem

Tue Jul 7 03:11:38 EDT 2020

 >What does 'virsh list'
 >give you on the 2 hosts? Hopefully different names for
 >the VMs ...

Yes, each host shows its own VM:

# virsh list
  Id    Name                           Status
----------------------------------------------------
  2     kvm101                         running

# virsh list
  Id    Name                           State
----------------------------------------------------
  1     kvm102                         running

 >Did you try 'fence_xvm -a {mcast-ip} -o list' on the
 >guests as well?

fence_xvm sadly does not work on the Ubuntu guests. The howto said to
run "yum install fence-virt fence-virtd", but those packages do not exist
under those names in Ubuntu 18.04. After trying to find the appropriate
packages, we installed "libvirt-clients" and "multipath-tools". Is
something perhaps missing or completely wrong?
We can, however, connect to both hosts using "nc -z -v -u 192.168.1.21
1229"; that works fine.

 >Usually, the biggest problem is the multicast traffic - as in many
 >environments it can be dropped by firewalls.

To make sure, I have asked our datacenter techs to verify that
multicast traffic can move unhindered in our local network. In the past
they have confirmed on multiple occasions that local traffic is not
filtered in any way, but until now I had never specifically asked about
multicast traffic. I am waiting for an answer to that question.
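
While waiting for that answer, multicast delivery can also be checked
directly. Note that the unicast "nc -u" test only shows that a UDP packet
was sent to one address; it does not exercise multicast. The following is
a minimal sketch, not part of the howto. It assumes the fence_virt default
multicast address 225.0.0.12 and port 1229 (adjust both if fence_virt.conf
says otherwise), and the script name mcast_probe.py is hypothetical. If
fence_virtd is already bound to the port on the listening machine, it may
be simpler to stop it for the test or pick another port.

```python
import socket
import sys

# Assumed defaults of fence_virt; change them if your fence_virt.conf differs.
MCAST_GROUP = "225.0.0.12"
MCAST_PORT = 1229


def build_mreq(group, local_ip="0.0.0.0"):
    """Pack an ip_mreq structure: multicast group followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def listen(group=MCAST_GROUP, port=MCAST_PORT, local_ip="0.0.0.0"):
    """Join the multicast group and print every datagram that arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, local_ip))
    while True:
        data, sender = sock.recvfrom(1024)
        print(f"received {len(data)} bytes from {sender[0]}")


def send(message=b"mcast-probe", group=MCAST_GROUP, port=MCAST_PORT, ttl=2):
    """Send one datagram to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    sock.sendto(message, (group, port))
    sock.close()


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "listen":
        listen()
    elif mode == "send":
        send()
    else:
        print("usage: mcast_probe.py listen|send")
```

Run it with "listen" on a host and with "send" on a guest (and the other
way round). If the listener prints nothing, multicast is probably being
dropped somewhere between the bridge and the physical network.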

Kind regards
Stefan Schmitz

On 06.07.2020 at 11:24, Klaus Wenninger wrote:
> On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>> Hello,
>>
>>>> # fence_xvm -o list
>>>> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
>>>> on
>>
>>> This should show both VMs, so getting to that point will likely solve
>>> your problem. fence_xvm relies on multicast, there could be some
>>> obscure network configuration to get that working on the VMs.
> You said you tried on both hosts. What does 'virsh list'
> give you on the 2 hosts? Hopefully different names for
> the VMs ...
> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
> guests as well?
> Did you try pinging via the physical network that is
> connected to the bridge configured to be used for
> fencing?
> If I got it right, fence_xvm should support collecting
> answers from multiple hosts, but I found a suggestion
> to do a setup with 2 multicast addresses & keys, one
> for each host.
> Which route did you go?
> 
> Klaus
>>
>> Thank you for pointing me in that direction. We have tried to solve
>> that but with no success. We were using a howto provided here:
>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>
>> The problem is that it specifically states that the tutorial does not
>> yet support the case where guests are running on multiple hosts. There
>> are some short hints about what might be necessary, but working through
>> those sadly did not work, nor were there any clues which would help us
>> find a solution ourselves. So now we are completely stuck here.
>>
>> Does anyone have the same configuration with guest VMs on multiple
>> hosts? How did you manage to get that to work? What do we need to do to
>> resolve this? Is there maybe even someone who would be willing to take
>> a closer look at our server? Any help would be greatly appreciated!
>>
>> Kind regards
>> Stefan Schmitz
>>
>>
>>
>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at farmpartner-tec.com
>>> wrote:
>>>> Hello,
>>>>
>>>> I hope someone can help with this problem. We are (still) trying to get
>>>> STONITH working in order to achieve a running active/active HA cluster,
>>>> but sadly to no avail.
>>>>
>>>> There are 2 CentOS hosts. On each one there is an Ubuntu VM. The Ubuntu
>>>> VMs are the ones which should form the HA cluster.
>>>>
>>>> The current status is this:
>>>>
>>>> # pcs status
>>>> Cluster name: pacemaker_cluster
>>>> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
>>>> Stack: corosync
>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
>>>> Last updated: Thu Jul  2 17:03:53 2020
>>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on server4ubuntu1
>>>>
>>>> 2 nodes configured
>>>> 13 resources configured
>>>>
>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>
>>>> Full list of resources:
>>>>
>>>>     stonith_id_1   (stonith:external/libvirt):     Stopped
>>>>     Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>         Masters: [ server4ubuntu1 ]
>>>>         Slaves: [ server2ubuntu1 ]
>>>>     Master/Slave Set: WebDataClone [WebData]
>>>>         Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>     Clone Set: dlm-clone [dlm]
>>>>         Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>     Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>         ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started server2ubuntu1
>>>>         ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started server4ubuntu1
>>>>     Clone Set: WebFS-clone [WebFS]
>>>>         Started: [ server4ubuntu1 ]
>>>>         Stopped: [ server2ubuntu1 ]
>>>>     Clone Set: WebSite-clone [WebSite]
>>>>         Started: [ server4ubuntu1 ]
>>>>         Stopped: [ server2ubuntu1 ]
>>>>
>>>> Failed Actions:
>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='',
>>>>        last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='',
>>>>        last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='',
>>>>        last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
>>>>
>>>>
>>>> The stonith resource is stopped and does not seem to work.
>>>> On both hosts the command
>>>> # fence_xvm -o list
>>>> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
>>>> on
>>>
>>> This should show both VMs, so getting to that point will likely solve
>>> your problem. fence_xvm relies on multicast, there could be some
>>> obscure network configuration to get that working on the VMs.
>>>
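
The check described above ("this should show both VMs") can be scripted.
This is a minimal sketch only: it assumes that each entry of
'fence_xvm -o list' consists of three whitespace-separated fields (VM
name, UUID, state), which is inferred from the output quoted in this
thread and not from the fence_xvm documentation. The function names are
hypothetical.

```python
def parse_fence_list(output):
    """Turn 'fence_xvm -o list' text into {vm_name: (uuid, state)}.

    Assumes three whitespace-separated fields per entry; line wrapping
    introduced by mail clients is tolerated because we split on any whitespace.
    """
    tokens = output.split()
    if len(tokens) % 3 != 0:
        raise ValueError("unexpected output format, expected name/uuid/state triples")
    return {tokens[i]: (tokens[i + 1], tokens[i + 2])
            for i in range(0, len(tokens), 3)}


def missing_vms(output, expected):
    """Return the expected VM names that do not appear in the listing."""
    return sorted(set(expected) - set(parse_fence_list(output)))


# Output as quoted in this thread (only the local VM is listed).
sample = """kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
on
"""
print(missing_vms(sample, ["kvm101", "kvm102"]))
```

With the output from this thread, the sketch reports kvm101 as missing,
which matches the symptom: each host only answers for its own VM.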
>>>> returns the local VM. Apparently it connects through the
>>>> virtualization interface, because it returns the VM name, not the
>>>> hostname of the client VM. I do not know if this is how it is
>>>> supposed to work?
>>>
>>> Yes, fence_xvm knows only about the VM names.
>>>
>>> To get pacemaker to be able to use it for fencing the cluster nodes,
>>> you have to add a pcmk_host_map parameter to the fencing resource. It
>>> looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
>>>
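
To illustrate the pcmk_host_map format given above, here is a small
sketch that builds the value from a node-to-VM mapping. The pairing of
cluster nodes to VM names used below (server2ubuntu1 with kvm101,
server4ubuntu1 with kvm102) is a guess for illustration only; the thread
does not say which guest runs on which host, so verify it with
'virsh list' first. The function name is hypothetical.

```python
def build_host_map(node_to_vm):
    """Build a pcmk_host_map value of the form 'node1:vm1;node2:vm2'."""
    for node, vm in node_to_vm.items():
        for name in (node, vm):
            # ':' and ';' are the separators of the format, so they cannot
            # appear inside a name; whitespace is rejected to keep it simple.
            if not name or any(ch in name for ch in ":; \t"):
                raise ValueError(f"invalid name for pcmk_host_map: {name!r}")
    return ";".join(f"{node}:{vm}" for node, vm in node_to_vm.items())


# Hypothetical pairing; check which VM backs which cluster node.
host_map = build_host_map({
    "server2ubuntu1": "kvm101",
    "server4ubuntu1": "kvm102",
})
print(f'pcmk_host_map="{host_map}"')
```

The printed value is what would go into the pcmk_host_map parameter of
the fencing resource.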
>>>> In the local network, all traffic is allowed. No firewall is active
>>>> locally; only the connections leaving the local network are firewalled.
>>>> Hence there are no connection problems between the hosts and clients.
>>>> For example, we can successfully connect from the clients to the hosts:
>>>>
>>>> # nc -z -v -u 192.168.1.21 1229
>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>> Ncat: Connected to 192.168.1.21:1229.
>>>> Ncat: UDP packet sent successfully
>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>
>>>> # nc -z -v -u 192.168.1.13 1229
>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>> Ncat: Connected to 192.168.1.13:1229.
>>>> Ncat: UDP packet sent successfully
>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>
>>>>
>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>> according to the howto provided here:
>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>
>>>>
>>>> The actual line we used:
>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt \
>>>>     hostlist="Host4,host2" \
>>>>     hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>
>>>>
>>>> But as you can see in the pcs status output, stonith is stopped and
>>>> exits with an unknown error.
>>>>
>>>> Can somebody please advise on how to proceed or what additional
>>>> information is needed to solve this problem?
>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>
>>>> Kind regards
>>>> Stefan Schmitz