[ClusterLabs] Still Beginner STONITH Problem

Tue Jul 7 04:33:05 EDT 2020

I can't find fence_virtd for Ubuntu 18.04, but it is available for Ubuntu 20.04.

Your other option is to export an iSCSI LUN from your quorum system and use that for SBD.
For the watchdog, you can use the 'softdog' kernel module, or you can have KVM present a watchdog device to the VMs.
You can also check the '-P' flag for SBD.
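
A minimal sketch of the SBD route, in case it helps. The device path is a made-up placeholder (use the by-id path of your own iSCSI LUN), and the script only prints the commands unless DRY_RUN=0 is set, because 'sbd create' overwrites the SBD header on the device. Treat it as an outline under those assumptions, not a tested procedure for your setup.

```shell
#!/bin/sh
# Hypothetical device path: replace with the by-id path of your iSCSI LUN.
SBD_DEVICE="${SBD_DEVICE:-/dev/disk/by-id/EXAMPLE-iscsi-lun}"
# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

sbd_setup() {
    run modprobe softdog              # load the software watchdog module
    run sbd -d "$SBD_DEVICE" create   # initialise the SBD header (overwrites it!)
    run sbd -d "$SBD_DEVICE" dump     # show the header and timeouts
    run sbd -d "$SBD_DEVICE" list     # show the node slots
}

sbd_setup
```

Run it once as-is to review the commands, then again as root with DRY_RUN=0 on one node only.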

Best Regards,
Strahil Nikolov

On 7 July 2020 10:11:38 GMT+03:00, "stefan.schmitz at farmpartner-tec.com" <stefan.schmitz at farmpartner-tec.com> wrote:
> >What does 'virsh list'
> >give you on the 2 hosts? Hopefully different names for
> >the VMs ...
>
>Yes, each host shows its own
>
># virsh list
>  Id    Name                           Status
>----------------------------------------------------
>  2     kvm101                         running
>
># virsh list
>  Id    Name                           State
>----------------------------------------------------
>  1     kvm102                         running
>
>
>
> >Did you try 'fence_xvm -a {mcast-ip} -o list' on the
> >guests as well?
>
>fence_xvm sadly does not work on the Ubuntu guests. The howto says to
>run "yum install fence-virt fence-virtd", but those packages do not exist
>as such in Ubuntu 18.04. After trying to find the appropriate packages, we
>installed "libvirt-clients" and "multipath-tools". Is something maybe
>missing or completely wrong?
>We can, however, connect to both hosts using "nc -z -v -u 192.168.1.21
>1229"; that works fine.
>
>
> >Usually, the biggest problem is the multicast traffic - as in many
> >environments it can be dropped by firewalls.
>
>To make sure, I have asked our datacenter techs to verify that
>multicast traffic can move unhindered in our local network. On multiple
>occasions in the past they have confirmed that local traffic is not
>filtered in any way, but until now I had never specifically asked about
>multicast traffic, which I have now done. I am waiting for an answer to
>that question.
>
>
>kind regards
>Stefan Schmitz
>
>Am 06.07.2020 um 11:24 schrieb Klaus Wenninger:
>> On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>> Hello,
>>>
>>>>> # fence_xvm -o list
>>>>> kvm102                          bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>
>>>> This should show both VMs, so getting to that point will likely solve
>>>> your problem. fence_xvm relies on multicast; some obscure network
>>>> configuration might be needed to get that working on the VMs.
>> You said you tried on both hosts. What does 'virsh list'
>> give you on the 2 hosts? Hopefully different names for
>> the VMs ...
>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>> guests as well?
>> Did you try pinging via the physical network that is
>> connected to the bridge configured to be used for
>> fencing?
>> If I got it right, fence_xvm should support collecting
>> answers from multiple hosts, but I found a suggestion
>> to do a setup with 2 multicast-addresses & keys for
>> each host.
>> Which route did you go?
>> 
>> Klaus
>>>
>>> Thank you for pointing me in that direction. We have tried to solve
>>> that, but with no success. We were using a howto provided here:
>>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>>
>>> The problem is, it specifically states that the tutorial does not yet
>>> support the case where guests are running on multiple hosts. There are
>>> some short hints about what might be necessary, but working through
>>> those sadly did not work, nor were there any clues that would
>>> help us find a solution ourselves. So now we are completely stuck
>>> here.
>>>
>>> Does someone have the same configuration with guest VMs on multiple
>>> hosts? How did you manage to get that to work? What do we need to do to
>>> resolve this? Is there maybe even someone who would be willing to take
>>> a closer look at our server? Any help would be greatly appreciated!
>>>
>>> Kind regards
>>> Stefan Schmitz
>>>
>>>
>>>
>>> Am 03.07.2020 um 02:39 schrieb Ken Gaillot:
>>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at farmpartner-tec.com wrote:
>>>>> Hello,
>>>>>
>>>>> I hope someone can help with this problem. We are (still) trying to
>>>>> get Stonith working to achieve a running active/active HA cluster,
>>>>> but sadly to no avail.
>>>>>
>>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM.
>>>>> The Ubuntu VMs are the ones which should form the HA cluster.
>>>>>
>>>>> The current status is this:
>>>>>
>>>>> # pcs status
>>>>> Cluster name: pacemaker_cluster
>>>>> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
>>>>> Stack: corosync
>>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
>>>>> Last updated: Thu Jul  2 17:03:53 2020
>>>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on server4ubuntu1
>>>>>
>>>>> 2 nodes configured
>>>>> 13 resources configured
>>>>>
>>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>>
>>>>> Full list of resources:
>>>>>
>>>>>     stonith_id_1   (stonith:external/libvirt):     Stopped
>>>>>     Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>>         Masters: [ server4ubuntu1 ]
>>>>>         Slaves: [ server2ubuntu1 ]
>>>>>     Master/Slave Set: WebDataClone [WebData]
>>>>>         Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>>     Clone Set: dlm-clone [dlm]
>>>>>         Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>>     Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>>         ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started server2ubuntu1
>>>>>         ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started server4ubuntu1
>>>>>     Clone Set: WebFS-clone [WebFS]
>>>>>         Started: [ server4ubuntu1 ]
>>>>>         Stopped: [ server2ubuntu1 ]
>>>>>     Clone Set: WebSite-clone [WebSite]
>>>>>         Started: [ server4ubuntu1 ]
>>>>>         Stopped: [ server2ubuntu1 ]
>>>>>
>>>>> Failed Actions:
>>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='',
>>>>>        last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
>>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='',
>>>>>        last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='',
>>>>>        last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
>>>>>
>>>>>
>>>>> The stonith resource is stopped and does not seem to work.
>>>>> On both hosts the command
>>>>> # fence_xvm -o list
>>>>> kvm102                          bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>
>>>> This should show both VMs, so getting to that point will likely solve
>>>> your problem. fence_xvm relies on multicast; some obscure network
>>>> configuration might be needed to get that working on the VMs.
>>>>
>>>>> returns the local VM. Apparently it connects through the
>>>>> virtualization interface, because it returns the VM name, not the
>>>>> hostname of the client VM. Is this how it is supposed to work?
>>>>
>>>> Yes, fence_xvm knows only about the VM names.
>>>>
>>>> To get pacemaker to be able to use it for fencing the cluster nodes,
>>>> you have to add a pcmk_host_map parameter to the fencing resource. It
>>>> looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
>>>>
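
As a rough sketch of the pcmk_host_map hint above: the helper below joins "node:vm" pairs with ';' and prints (does not run) a matching pcs command. The pairing of node names to VM names is an assumption, since the thread does not say which guest runs on which host, and "fence_vms" is just an illustrative resource name.

```shell
#!/bin/sh
# build_host_map: join "node:vm" arguments with ';' in the form
# that pcmk_host_map expects.
build_host_map() {
    map=""
    for pair in "$@"; do
        map="${map:+$map;}$pair"
    done
    printf '%s\n' "$map"
}

# Assumed pairing of cluster node names to libvirt VM names; verify with
# 'virsh list' on each host before using it.
HOST_MAP=$(build_host_map "server2ubuntu1:kvm101" "server4ubuntu1:kvm102")

# Print the command for review instead of executing it.
echo "pcs stonith create fence_vms fence_xvm pcmk_host_map=\"$HOST_MAP\""
```

The mapping only helps once 'fence_xvm -o list' on the guests shows both VMs.
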
>>>>> In the local network, all traffic is allowed. No firewall is
>>>>> active locally; only the connections leaving the local network are
>>>>> firewalled. Hence there are no connection problems between the hosts
>>>>> and clients. For example, we can successfully connect from the
>>>>> clients to the hosts:
>>>>>
>>>>> # nc -z -v -u 192.168.1.21 1229
>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>> Ncat: Connected to 192.168.1.21:1229.
>>>>> Ncat: UDP packet sent successfully
>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>
>>>>> # nc -z -v -u 192.168.1.13 1229
>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>> Ncat: Connected to 192.168.1.13:1229.
>>>>> Ncat: UDP packet sent successfully
>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>
>>>>>
>>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>>> according to the howto provided here:
>>>>>
>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>>
>>>>>
>>>>> The actual line we used:
>>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt \
>>>>>     hostlist="Host4,host2" \
>>>>>     hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>>
>>>>>
>>>>> But as you can see in the pcs status output, stonith is stopped
>>>>> and exits with an unknown error.
>>>>>
>>>>> Can somebody please advise on how to proceed or what additional
>>>>> information is needed to solve this problem?
>>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>>
>>>>> Kind regards
>>>>> Stefan Schmitz