[ClusterLabs] Still Beginner STONITH Problem

Strahil Nikolov hunter86_bg at yahoo.com
Wed Jul 8 14:24:12 EDT 2020


Erm... network/firewall is always "green". Run tcpdump on Host1 and VM2 (not on the same host).
Then run 'fence_xvm -o list' again and check what is captured.
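
For example - the interface names here are only placeholders for whatever
carries your 192.168.1.x traffic:

On Host1:
# tcpdump -n -i eth0 port 1229 or host 225.0.0.12
On VM2:
# tcpdump -n -i ens3 port 1229 or host 225.0.0.12

When fencing works you should see the multicast request from the VM arrive
on the hypervisor, and a TCP connection coming back to the VM.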

In summary, you need:
- the key deployed on the hypervisors
- the key deployed on the VMs
- fence_virtd running on both hypervisors
- the firewall opened (1229/udp for the hosts, 1229/tcp for the guests; see the example commands below)
- fence_xvm on both VMs
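
If one of the firewalls ever comes back on, something like this should open
what is needed (default zones/profiles assumed - adjust to your setup):

On the CentOS hosts:
# firewall-cmd --permanent --add-port=1229/udp
# firewall-cmd --reload

On the Ubuntu guests:
# ufw allow 1229/tcp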

In your case, the primary suspect is multicast traffic.
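
Two quick checks that often explain dropped multicast (br0 is just an
example name - use the bridge your VMs are attached to):

# ip maddr show dev br0
# cat /sys/class/net/br0/bridge/multicast_snooping

The first shows whether 225.0.0.12 has actually been joined. A known gotcha
is the bridge's IGMP snooping eating the traffic; temporarily setting
multicast_snooping to 0 is a quick way to rule that out.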

Best Regards,
Strahil Nikolov

On 8 July 2020 16:33:45 GMT+03:00, "stefan.schmitz at farmpartner-tec.com" <stefan.schmitz at farmpartner-tec.com> wrote:
>Hello,
>
>>I can't find fence_virtd for Ubuntu18, but it is available for
>>Ubuntu20.
>
>We have now upgraded our Server to Ubuntu 20.04 LTS and installed the 
>packages fence-virt and fence-virtd.
>
>The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just 
>returns the single local VM.
>
>The same command on both VMs results in:
># fence_xvm -a 225.0.0.12 -o list
>Timed out waiting for response
>Operation failed
>
>But just as before, connecting from the guest to the host via nc just
>works fine:
># nc -z -v -u 192.168.1.21 1229
>Connection to 192.168.1.21 1229 port [udp/*] succeeded!
>
>So the host and the service are basically reachable.
>
>I have spoken to our firewall tech; he has assured me that no local
>traffic is hindered by anything, be it multicast or not.
>Software Firewalls are not present/active on any of our servers.
>
>Ubuntu guests:
># ufw status
>Status: inactive
>
>CentOS hosts:
># systemctl status firewalld
>● firewalld.service - firewalld - dynamic firewall daemon
>   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
>   Active: inactive (dead)
>     Docs: man:firewalld(1)
>
>
>Any hints or help on how to remedy this problem would be greatly 
>appreciated!
>
>Kind regards
>Stefan Schmitz
>
>
>On 07.07.2020 at 10:54, Klaus Wenninger wrote:
>> On 7/7/20 10:33 AM, Strahil Nikolov wrote:
>>> I can't find fence_virtd for Ubuntu18, but it is available for
>Ubuntu20.
>>>
>>> Your other option is to get an iSCSI from your quorum system and use
>that for SBD.
>>> For watchdog, you can use 'softdog' kernel module or you can use KVM
>to present one to the VMs.
>>> You can also check the '-P' flag for SBD.
>> With kvm please use the qemu-watchdog and try to
>> avoid using softdog with SBD.
>> Especially if you are aiming for a production-cluster ...
>> 
>> Adding something like that to libvirt-xml should do the trick:
>> <watchdog model='i6300esb' action='reset'>
>>        <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
>> function='0x0'/>
>> </watchdog>
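>>
>> For example - assuming the domain is called kvm101 - the snippet goes
>> into the <devices> section and can be added with:
>> # virsh edit kvm101
>> and verified afterwards with:
>> # virsh dumpxml kvm101 | grep -A2 watchdog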
>> 
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>> On 7 July 2020 10:11:38 GMT+03:00, "stefan.schmitz at farmpartner-tec.com"
>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>> What does 'virsh list'
>>>>> give you on the 2 hosts? Hopefully different names for
>>>>> the VMs ...
>>>> Yes, each host shows its own
>>>>
>>>> # virsh list
>>>>   Id    Name                           Status
>>>> ----------------------------------------------------
>>>>   2     kvm101                         running
>>>>
>>>> # virsh list
>>>>   Id    Name                           State
>>>> ----------------------------------------------------
>>>>   1     kvm102                         running
>>>>
>>>>
>>>>
>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>> guests as well?
>>>> fence_xvm sadly does not work on the Ubuntu guests. The howto said to
>>>> install "yum install fence-virt fence-virtd", which do not exist as
>>>> such in Ubuntu 18.04. After we tried to find the appropriate packages
>>>> we installed "libvirt-clients" and "multipath-tools". Is there maybe
>>>> something missing or completely wrong?
>>>> Though we can connect to both hosts using "nc -z -v -u 192.168.1.21
>>>> 1229", that just works fine.
>>>>
>> without fence-virt you can't expect the whole thing to work.
>> maybe you can build it for your ubuntu-version from sources of
>> a package for another ubuntu-version if it doesn't exist yet.
>> btw. which pacemaker-version are you using?
>> There was a convenience-fix on the master-branch for at least
>> a couple of days (sometime during the 2.0.4 release-cycle) that
>> wasn't compatible with fence_xvm.
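>> You can check that with e.g.:
>> # pacemakerd --features
>> which prints the exact version and build.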
>>>>> Usually, the biggest problem is the multicast traffic - as in many
>>>>> environments it can be dropped by firewalls.
>>>> To make sure, I have asked our datacenter techs to verify that
>>>> multicast traffic can move unhindered in our local network. In the
>>>> past they have confirmed on multiple occasions that local traffic is
>>>> not filtered in any way, but until now I had never specifically asked
>>>> about multicast traffic, which I have now done. I am waiting for an
>>>> answer to that question.
>>>>
>>>>
>>>> kind regards
>>>> Stefan Schmitz
>>>>
>>>> On 06.07.2020 at 11:24, Klaus Wenninger wrote:
>>>>> On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>>> Hello,
>>>>>>
>>>>>>>> # fence_xvm -o list
>>>>>>>> kvm102                               bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>> This should show both VMs, so getting to that point will likely
>>>>>>> solve your problem. fence_xvm relies on multicast, there could be
>>>>>>> some obscure network configuration to get that working on the VMs.
>>>>> You said you tried on both hosts. What does 'virsh list'
>>>>> give you on the 2 hosts? Hopefully different names for
>>>>> the VMs ...
>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>> guests as well?
>>>>> Did you try pinging via the physical network that is
>>>>> connected to the bridge configured to be used for
>>>>> fencing?
>>>>> If I got it right fence_xvm should support collecting
>>>>> answers from multiple hosts but I found a suggestion
>>>>> to do a setup with 2 multicast-addresses & keys for
>>>>> each host.
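>>>>> Roughly - and this is only a sketch, the names, addresses and key
>>>>> paths below are made up - that variant ends up as one fence_xvm
>>>>> device per hypervisor, e.g.:
>>>>> # pcs stonith create fence_kvm101 fence_xvm \
>>>>>     multicast_address=225.0.0.12 key_file=/etc/cluster/fence_xvm1.key \
>>>>>     pcmk_host_list=server4ubuntu1
>>>>> # pcs stonith create fence_kvm102 fence_xvm \
>>>>>     multicast_address=225.0.0.13 key_file=/etc/cluster/fence_xvm2.key \
>>>>>     pcmk_host_list=server2ubuntu1
>>>>> with each fence_virtd listening on its own address/key.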
>>>>> Which route did you go?
>>>>>
>>>>> Klaus
>>>>>> Thank you for pointing me in that direction. We have tried to solve
>>>>>> that, but with no success. We were using a howto provided here:
>>>>>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>>>>>
>>>>>> Problem is, it specifically states that the tutorial does not yet
>>>>>> support the case where guests are running on multiple hosts. There
>>>>>> are some short hints about what might be necessary to do, but
>>>>>> working through those sadly just did not work, nor were there any
>>>>>> clues which would help us find a solution ourselves. So now we are
>>>>>> completely stuck here.
>>>>>>
>>>>>> Does someone have the same configuration with guest VMs on multiple
>>>>>> hosts? And how did you manage to get that to work? What do we need
>>>>>> to do to resolve this? Is there maybe even someone who would be
>>>>>> willing to take a closer look at our server? Any help would be
>>>>>> greatly appreciated!
>>>>>>
>>>>>> Kind regards
>>>>>> Stefan Schmitz
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>>>>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at farmpartner-tec.com
>>>>>>> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I hope someone can help with this problem. We are (still) trying
>>>>>>>> to get STONITH working so we can run an active/active HA cluster,
>>>>>>>> but sadly to no avail.
>>>>>>>>
>>>>>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu
>>>>>>>> VM. The Ubuntu VMs are the ones which should form the HA cluster.
>>>>>>>>
>>>>>>>> The current status is this:
>>>>>>>>
>>>>>>>> # pcs status
>>>>>>>> Cluster name: pacemaker_cluster
>>>>>>>> WARNING: corosync and pacemaker node names do not match (IPs used
>>>>>>>> in setup?)
>>>>>>>> Stack: corosync
>>>>>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
>>>>>>>> with quorum
>>>>>>>> Last updated: Thu Jul  2 17:03:53 2020
>>>>>>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
>>>>>>>> server4ubuntu1
>>>>>>>>
>>>>>>>> 2 nodes configured
>>>>>>>> 13 resources configured
>>>>>>>>
>>>>>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>
>>>>>>>> Full list of resources:
>>>>>>>>
>>>>>>>>      stonith_id_1   (stonith:external/libvirt):     Stopped
>>>>>>>>      Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>>>>>          Masters: [ server4ubuntu1 ]
>>>>>>>>          Slaves: [ server2ubuntu1 ]
>>>>>>>>      Master/Slave Set: WebDataClone [WebData]
>>>>>>>>          Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>      Clone Set: dlm-clone [dlm]
>>>>>>>>          Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>      Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>>>>>          ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started server2ubuntu1
>>>>>>>>          ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started server4ubuntu1
>>>>>>>>      Clone Set: WebFS-clone [WebFS]
>>>>>>>>          Started: [ server4ubuntu1 ]
>>>>>>>>          Stopped: [ server2ubuntu1 ]
>>>>>>>>      Clone Set: WebSite-clone [WebSite]
>>>>>>>>          Started: [ server4ubuntu1 ]
>>>>>>>>          Stopped: [ server2ubuntu1 ]
>>>>>>>>
>>>>>>>> Failed Actions:
>>>>>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='',
>>>>>>>>     last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
>>>>>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='',
>>>>>>>>     last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>>>>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='',
>>>>>>>>     last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
>>>>>>>>
>>>>>>>>
>>>>>>>> The stonith resource is stopped and does not seem to work.
>>>>>>>> On both hosts the command
>>>>>>>> # fence_xvm -o list
>>>>>>>> kvm102                               bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>> This should show both VMs, so getting to that point will likely
>>>>>>> solve your problem. fence_xvm relies on multicast, there could be
>>>>>>> some obscure network configuration to get that working on the VMs.
>>>>>>>
>>>>>>>> returns the local VM. Apparently it connects through the
>>>>>>>> virtualization interface, because it returns the VM name, not the
>>>>>>>> hostname of the client VM. I do not know if this is how it is
>>>>>>>> supposed to work?
>>>>>>> Yes, fence_xvm knows only about the VM names.
>>>>>>>
>>>>>>> To get pacemaker to be able to use it for fencing the cluster
>>>>>>> nodes, you have to add a pcmk_host_map parameter to the fencing
>>>>>>> resource. It looks like
>>>>>>> pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
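>>>>>>> For example, with the fence_xvm agent (the node-to-VM pairing here
>>>>>>> is just illustrative - match it to what 'virsh list' shows on each
>>>>>>> hypervisor):
>>>>>>> # pcs stonith create fence_vms fence_xvm \
>>>>>>>     pcmk_host_map="server2ubuntu1:kvm102;server4ubuntu1:kvm101" \
>>>>>>>     key_file=/etc/cluster/fence_xvm.key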
>>>>>>>
>>>>>>>> In the local network, all traffic is allowed. No firewall is
>>>>>>>> locally active; only the connections leaving the local network are
>>>>>>>> firewalled. Hence there are no connection problems between the
>>>>>>>> hosts and clients.
>>>>>>>> For example, we can successfully connect from the clients to the
>>>>>>>> hosts:
>>>>>>>> # nc -z -v -u 192.168.1.21 1229
>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>> Ncat: Connected to 192.168.1.21:1229.
>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>
>>>>>>>> # nc -z -v -u 192.168.1.13 1229
>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>> Ncat: Connected to 192.168.1.13:1229.
>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>
>>>>>>>>
>>>>>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>>>>>> according to the howto provided here:
>>>>>>>>
>>>>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>>>>>
>>>>>>>> The actual line we used:
>>>>>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt \
>>>>>>>>     hostlist="Host4,host2" \
>>>>>>>>     hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>>>>>
>>>>>>>>
>>>>>>>> But as you can see in the pcs status output, stonith is stopped
>>>>>>>> and exits with an unknown error.
>>>>>>>>
>>>>>>>> Can somebody please advise on how to proceed or what additional
>>>>>>>> information is needed to solve this problem?
>>>>>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>>>>>
>>>>>>>> Kind regards
>>>>>>>> Stefan Schmitz
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>> _______________________________________________
>>>>>> Manage your subscription:
>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>
>>>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>>>
>> 

