[ClusterLabs] Still Beginner STONITH Problem

Klaus Wenninger kwenning at redhat.com
Thu Jul 9 01:41:50 EDT 2020


On 7/8/20 8:24 PM, Strahil Nikolov wrote:
> Erm... the network/firewall is always "green". Run tcpdump on Host1 and VM2 (not on the same host).
> Then run 'fence_xvm -o list' again and check what is captured.
>
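For that capture, something along these lines should do (untested here; 225.0.0.12 and 1229 taken from your setup, replace eth0 with the real interface or bridge name):

On Host1 and, in parallel, on VM2:
# tcpdump -n -i eth0 'host 225.0.0.12 or port 1229'
then trigger it from the guest:
# fence_xvm -a 225.0.0.12 -o list
The multicast request should be visible in both captures; the TCP reply only shows up where it actually arrives.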
> In summary, you need:
> - the key deployed on the hypervisors
> - the key deployed on the VMs
> - fence_virtd running on both hypervisors
> - the firewall opened (1229/udp for the hosts, 1229/tcp for the guests)
> - fence_xvm on both VMs
>
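To spell out the key and firewall items from that checklist (a rough sketch only; key path as in the Guest_Fencing wiki, <other-host>/<guest> are placeholders):

On one hypervisor, generate the key and copy it to the other host and to both guests:
# mkdir -p /etc/cluster
# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=512 count=1
# scp /etc/cluster/fence_xvm.key root@<other-host>:/etc/cluster/
# scp /etc/cluster/fence_xvm.key root@<guest>:/etc/cluster/
If firewalld were active on the CentOS hosts:
# firewall-cmd --permanent --add-port=1229/udp && firewall-cmd --reload
And with ufw on the Ubuntu guests:
# ufw allow 1229/tcp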
> In your case, the primary suspect is multicast traffic.
Or just a simple port-access issue ...
firewalld is not the only way to set up
some kind of firewall on your local machine;
iptables.service might be active, for instance.
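A quick way to rule that out on hosts and guests, with nothing fence-specific (nft only if nftables is installed at all):

# systemctl is-active iptables
# iptables -S | head
# nft list ruleset | head
If all of that comes back inactive or empty, a local packet filter is off the table.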
I have no personal experience with multiple hosts & fence_xvm.
So once you have solved your primary issue you might still
consider running 2 parallel setups.
I've read a recommendation to do so, and I have a vague
memory of an email thread describing some issues.
Can anybody here confirm that multiple hosts with a single
multicast IP work reliably?
>
> Best  Regards,
> Strahil Nikolov
>
> On 8 July 2020 16:33:45 GMT+03:00, "stefan.schmitz at farmpartner-tec.com" <stefan.schmitz at farmpartner-tec.com> wrote:
>> Hello,
>>
>>> I can't find fence_virtd for Ubuntu18, but it is available for
>>> Ubuntu20.
>> We have now upgraded our server to Ubuntu 20.04 LTS and installed the
>> packages fence-virt and fence-virtd.
>>
>> The command "fence_xvm -a 225.0.0.12 -o list" on the hosts still
>> returns only the single local VM.
>>
>> The same command on both VMs results in:
>> # fence_xvm -a 225.0.0.12 -o list
>> Timed out waiting for response
>> Operation failed
>>
>> But just as before, connecting from the guest to the host via nc
>> works fine:
>> # nc -z -v -u 192.168.1.21 1229
>> Connection to 192.168.1.21 1229 port [udp/*] succeeded!
>>
>> So the host and its service are basically reachable.
>>
>> I have spoken to our firewall tech, and he has assured me that no local
>> traffic is hindered by anything, be it multicast or not.
>> Software firewalls are not present/active on any of our servers.
>>
>> Ubuntu guests:
>> # ufw status
>> Status: inactive
>>
>> CentOS hosts:
>> # systemctl status firewalld
>> ● firewalld.service - firewalld - dynamic firewall daemon
>>    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
>>    Active: inactive (dead)
>>      Docs: man:firewalld(1)
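Since plain unicast nc to 1229 works but fence_xvm still times out, it might be worth testing multicast itself rather than the firewalls, e.g. with omping, if you can install it on one host and one guest (<guest-ip> is a placeholder). Run the same command on both machines at the same time:

# omping -m 225.0.0.12 192.168.1.21 <guest-ip>
If the multicast lines keep timing out while the unicast lines get answers, the group traffic is being dropped somewhere on the path.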
>>
>>
>> Any hints or help on how to remedy this problem would be greatly 
>> appreciated!
>>
>> Kind regards
>> Stefan Schmitz
>>
>>
>> On 07.07.2020 at 10:54, Klaus Wenninger wrote:
>>> On 7/7/20 10:33 AM, Strahil Nikolov wrote:
>>>> I can't find fence_virtd for Ubuntu 18, but it is available for
>>>> Ubuntu 20.
>>>> Your other option is to get an iSCSI LUN from your quorum system and use
>>>> that for SBD.
>>>> For the watchdog, you can use the 'softdog' kernel module or you can use
>>>> KVM to present one to the VMs.
>>>> You can also check the '-P' flag for SBD.
>>> With KVM, please use the qemu watchdog and try to
>>> avoid using softdog with SBD,
>>> especially if you are aiming for a production cluster ...
>>>
>>> Adding something like this to the libvirt XML should do the trick:
>>> <watchdog model='i6300esb' action='reset'>
>>>        <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
>>> function='0x0'/>
>>> </watchdog>
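Untested from my side, but to get that into the existing guest definitions (domain names taken from your virsh list; drop the <address> line and libvirt will pick a free PCI slot itself):

# virsh edit kvm101
and paste the <watchdog> element into <devices>, or, with the snippet saved as watchdog.xml:
# virsh attach-device kvm101 watchdog.xml --config
# virsh shutdown kvm101
# virsh start kvm101
A cold restart is needed since watchdog devices usually can't be hot-plugged; afterwards /dev/watchdog should show up inside the guest for SBD. Same procedure for kvm102 on the other host.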
>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>>>
>>>> On 7 July 2020 10:11:38 GMT+03:00, "stefan.schmitz at farmpartner-tec.com" <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>>> What does 'virsh list'
>>>>>> give you on the 2 hosts? Hopefully different names for
>>>>>> the VMs ...
>>>>> Yes, each host shows its own
>>>>>
>>>>> # virsh list
>>>>>   Id    Name                           State
>>>>> ----------------------------------------------------
>>>>>   2     kvm101                         running
>>>>>
>>>>> # virsh list
>>>>>   Id    Name                           State
>>>>> ----------------------------------------------------
>>>>>   1     kvm102                         running
>>>>>
>>>>>
>>>>>
>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>>> guests as well?
>>>>> fence_xvm sadly does not work on the Ubuntu guests. The howto said to
>>>>> install "yum install fence-virt fence-virtd", which do not exist as such
>>>>> in Ubuntu 18.04. After trying to find the appropriate packages we
>>>>> installed "libvirt-clients" and "multipath-tools". Is there maybe
>>>>> something missing or completely wrong?
>>>>> Though we can connect to both hosts using "nc -z -v -u 192.168.1.21
>>>>> 1229"; that just works fine.
>>>>>
>>> Without fence-virt you can't expect the whole thing to work.
>>> Maybe you can build it for your Ubuntu version from the sources of
>>> a package for another Ubuntu version if it doesn't exist yet.
>>> Btw. which pacemaker version are you using?
>>> There was a convenience fix on the master branch for at least
>>> a couple of days (sometime during the 2.0.4 release cycle) that
>>> wasn't compatible with fence_xvm.
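In case anyone still needs it on 18.04, a build from the upstream sources could look roughly like this (untested on Ubuntu; the dependency package names are my best guess):

# apt-get install build-essential autoconf automake libtool pkg-config libnspr4-dev libnss3-dev libxml2-dev libvirt-dev
# git clone https://github.com/ClusterLabs/fence-virt.git
# cd fence-virt
# ./autogen.sh && ./configure && make && make install
Only fence_xvm is needed on the guests; fence_virtd with the libvirt backend stays on the hypervisors.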
>>>>>> Usually, the biggest problem is the multicast traffic - as in many
>>>>>> environments it can be dropped by firewalls.
>>>>> To make sure, I have asked our datacenter techs to verify that
>>>>> multicast traffic can move unhindered in our local network. In the
>>>>> past they have confirmed on multiple occasions that local traffic is
>>>>> not filtered in any way, but until now I had never specifically asked
>>>>> about multicast traffic, which I have now done. I am waiting for an
>>>>> answer to that question.
>>>>>
>>>>>
>>>>> kind regards
>>>>> Stefan Schmitz
>>>>>
>>>>> On 06.07.2020 at 11:24, Klaus Wenninger wrote:
>>>>>> On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>>>> # fence_xvm -o list
>>>>>>>>> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>>> This should show both VMs, so getting to that point will likely solve
>>>>>>>> your problem. fence_xvm relies on multicast; there could be some
>>>>>>>> obscure network configuration needed to get that working on the VMs.
>>>>>> You said you tried on both hosts. What does 'virsh list'
>>>>>> give you on the 2 hosts? Hopefully different names for
>>>>>> the VMs ...
>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>>> guests as well?
>>>>>> Did you try pinging via the physical network that is
>>>>>> connected to the bridge configured to be used for
>>>>>> fencing?
>>>>>> If I got it right, fence_xvm should support collecting
>>>>>> answers from multiple hosts, but I found a suggestion
>>>>>> to do a setup with 2 multicast addresses & keys for
>>>>>> each host.
>>>>>> Which route did you go?
>>>>>>
>>>>>> Klaus
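Regarding that two-address variant: as far as I understood it, the idea is one multicast address plus one key per hypervisor, roughly like this in /etc/fence_virt.conf on host 1 (completely untested sketch; the bridge name and key file name are made up):

fence_virtd {
        listener = "multicast";
        backend = "libvirt";
}
listeners {
        multicast {
                address = "225.0.0.12";
                port = "1229";
                interface = "br0";
                key_file = "/etc/cluster/fence_xvm_host1.key";
        }
}
backends {
        libvirt {
                uri = "qemu:///system";
        }
}

Host 2 would get the same file with e.g. 225.0.0.13 and its own key, both keys go to both guests, and the cluster then gets two stonith resources, each using the multicast address and key of the hypervisor that runs the respective VM.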
>>>>>>> Thank you for pointing me in that direction. We have tried to solve
>>>>>>> that, but with no success. We were using the howto provided here:
>>>>>>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>>>>>>
>>>>>>> The problem is that it specifically states that the tutorial does not yet
>>>>>>> support the case where guests are running on multiple hosts. There are
>>>>>>> some short hints on what might be necessary to do, but working through
>>>>>>> those sadly just did not work, nor were there any clues which would
>>>>>>> help us find a solution ourselves. So now we are completely stuck
>>>>>>> here.
>>>>>>>
>>>>>>> Does someone have the same configuration with guest VMs on multiple hosts?
>>>>>>> And how did you manage to get that to work? What do we need to do to
>>>>>>> resolve this? Is there maybe even someone who would be willing to take
>>>>>>> a closer look at our server? Any help would be greatly appreciated!
>>>>>>> Kind regards
>>>>>>> Stefan Schmitz
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>>>>>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at farmpartner-tec.com
>>>>>>>> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I hope someone can help with this problem. We are (still) trying to
>>>>>>>>> get STONITH working in order to achieve a running active/active HA
>>>>>>>>> cluster, but sadly to no avail.
>>>>>>>>>
>>>>>>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM.
>>>>>>>>> The Ubuntu VMs are the ones which should form the HA cluster.
>>>>>>>>>
>>>>>>>>> The current status is this:
>>>>>>>>>
>>>>>>>>> # pcs status
>>>>>>>>> Cluster name: pacemaker_cluster
>>>>>>>>> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
>>>>>>>>> Stack: corosync
>>>>>>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
>>>>>>>>> Last updated: Thu Jul  2 17:03:53 2020
>>>>>>>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on server4ubuntu1
>>>>>>>>>
>>>>>>>>> 2 nodes configured
>>>>>>>>> 13 resources configured
>>>>>>>>>
>>>>>>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>
>>>>>>>>> Full list of resources:
>>>>>>>>>
>>>>>>>>>      stonith_id_1   (stonith:external/libvirt):     Stopped
>>>>>>>>>      Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>>>>>>          Masters: [ server4ubuntu1 ]
>>>>>>>>>          Slaves: [ server2ubuntu1 ]
>>>>>>>>>      Master/Slave Set: WebDataClone [WebData]
>>>>>>>>>          Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>      Clone Set: dlm-clone [dlm]
>>>>>>>>>          Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>      Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>>>>>>          ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started server2ubuntu1
>>>>>>>>>          ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started server4ubuntu1
>>>>>>>>>      Clone Set: WebFS-clone [WebFS]
>>>>>>>>>          Started: [ server4ubuntu1 ]
>>>>>>>>>          Stopped: [ server2ubuntu1 ]
>>>>>>>>>      Clone Set: WebSite-clone [WebSite]
>>>>>>>>>          Started: [ server4ubuntu1 ]
>>>>>>>>>          Stopped: [ server2ubuntu1 ]
>>>>>>>>>
>>>>>>>>> Failed Actions:
>>>>>>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='',
>>>>>>>>>     last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
>>>>>>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='',
>>>>>>>>>     last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>>>>>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='',
>>>>>>>>>     last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The stonith resource is stopped and does not seem to work.
>>>>>>>>> On both hosts the command
>>>>>>>>> # fence_xvm -o list
>>>>>>>>> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>>> This should show both VMs, so getting to that point will likely solve
>>>>>>>> your problem. fence_xvm relies on multicast; there could be some
>>>>>>>> obscure network configuration needed to get that working on the VMs.
>>>>>>>>
>>>>>>>>> returns only the local VM. Apparently it connects through the
>>>>>>>>> virtualization interface, because it returns the VM name, not the
>>>>>>>>> hostname of the client VM. I do not know whether this is how it is
>>>>>>>>> supposed to work?
>>>>>>>> Yes, fence_xvm knows only about the VM names.
>>>>>>>>
>>>>>>>> To get pacemaker to be able to use it for fencing the cluster nodes,
>>>>>>>> you have to add a pcmk_host_map parameter to the fencing resource. It
>>>>>>>> looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
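Once 'fence_xvm -o list' shows both VMs on both hosts, a single fence_xvm resource along those lines might look like this (only a sketch; the resource name is made up and I am guessing which node runs in which VM, so swap the mapping if needed):

# pcs stonith create fence_virt_xvm fence_xvm \
    multicast_address=225.0.0.12 key_file=/etc/cluster/fence_xvm.key \
    pcmk_host_map="server2ubuntu1:kvm101;server4ubuntu1:kvm102"

With the per-hypervisor setup mentioned further up you would create two such resources instead, each with its own multicast_address and key_file.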
>>>>>>>>> In the local network, all traffic is allowed. No firewall is locally
>>>>>>>>> active; only connections leaving the local network are firewalled.
>>>>>>>>> Hence there are no connection problems between the hosts and clients.
>>>>>>>>> For example, we can successfully connect from the clients to the hosts:
>>>>>>>>> # nc -z -v -u 192.168.1.21 1229
>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>> Ncat: Connected to 192.168.1.21:1229.
>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>
>>>>>>>>> # nc -z -v -u 192.168.1.13 1229
>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>> Ncat: Connected to 192.168.1.13:1229.
>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>>>>>>> according to the howto provided here:
>>>>>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>>>>>>
>>>>>>>>> The actual line we used:
>>>>>>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>>>>>>>>>     hostlist="Host4,host2" hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But as you can see in the pcs status output, stonith is stopped and
>>>>>>>>> exits with an unknown error.
>>>>>>>>>
>>>>>>>>> Can somebody please advise on how to proceed, or what additional
>>>>>>>>> information is needed to solve this problem?
>>>>>>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>>>>>>
>>>>>>>>> Kind regards
>>>>>>>>> Stefan Schmitz
>>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Manage your subscription:
>>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>>
>>>>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>>>>
