[ClusterLabs] Still Beginner STONITH Problem
Klaus Wenninger
kwenning at redhat.com
Thu Jul 9 10:30:11 EDT 2020
On 7/9/20 4:01 PM, stefan.schmitz at farmpartner-tec.com wrote:
> Hello,
>
> thanks for the advice. I have worked through that list as follows:
>
> > - key deployed on the Hypervisors
> > - key deployed on the VMs
> I created the key file once, a while ago, on one host and distributed it
> to every other host and guest. Right now it resides on all 4 machines
> at the same path: /etc/cluster/fence_xvm.key
> Is there maybe a corosync/STONITH or other function that checks the
> key files for corruption or errors?
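> In the meantime I could at least compare checksums of the key on all 4
> machines to rule out a corrupted copy, e.g.:
>
> # sha256sum /etc/cluster/fence_xvm.key
>
> (the output should be identical on every host and guest)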
>
>
> > - fence_virtd running on both Hypervisors
> It is running on each host:
> # ps aux |grep fence_virtd
> root 62032 0.0 0.0 251568 4496 ? Ss Jun29 0:00 fence_virtd
>
>
> > - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests)
>
> Command on one host:
> fence_xvm -a 225.0.0.12 -o list
>
> tcpdump on the guest residing on the other host:
> host2.55179 > 225.0.0.12.1229: [udp sum ok] UDP, length 176
> host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
> 225.0.0.12 to_in { }]
> host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
> 225.0.0.12 to_in { }]
>
> At least to me it looks like the VMs are reachable by the multicast
> traffic.
> Additionally, no matter on which host I execute the fence_xvm command,
> tcpdump shows the same traffic on both guests.
> On the other hand, at the same time, tcpdump shows nothing on the
> other host. Just to be sure, I had flushed iptables beforehand on each
> host. Is there maybe a problem?
Well, the theory still holds, I would say.
I guess that the multicast traffic from the other host
or the guests doesn't get to the daemon on the host.
Can't you simply check if there are any firewall
rules configured on the host kernel?
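Something like this on each host should show whether the packets actually
arrive and whether any rules or bridge settings are in the way
(the bridge name below is just a placeholder):

# iptables -L -n -v          # or: nft list ruleset
# tcpdump -n -i any udp port 1229
# cat /sys/class/net/<bridge>/bridge/multicast_snooping

If snooping is enabled (1) and there is no IGMP querier on that bridge,
multicast can get dropped there as well.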
>
>
> > - fence_xvm on both VMs
> fence_xvm is installed on both VMs
> # which fence_xvm
> /usr/sbin/fence_xvm
>
> Could you please advise on how to proceed? Thank you in advance.
> Kind regards
> Stefan Schmitz
>
> On 08.07.2020 at 20:24, Strahil Nikolov wrote:
>> Erm...network/firewall is always "green". Run tcpdump on Host1 and
>> VM2 (not on the same host).
>> Then run again 'fence_xvm -o list' and check what is captured.
>>
>> In summary, you need:
>> - key deployed on the Hypervisors
>> - key deployed on the VMs
>> - fence_virtd running on both Hypervisors
>> - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests)
>> - fence_xvm on both VMs
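>> If a firewall were active, opening those ports would look roughly like this
>> (firewalld on the CentOS hosts, ufw on the Ubuntu guests - adjust to
>> whatever is actually in use):
>>
>> # firewall-cmd --permanent --add-port=1229/udp && firewall-cmd --reload
>> # ufw allow 1229/tcp
>>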
>>
>> In your case, the primary suspect is multicast traffic.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 8 July 2020 16:33:45 GMT+03:00,
>> "stefan.schmitz at farmpartner-tec.com"
>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>> Hello,
>>>
>>>> I can't find fence_virtd for Ubuntu18, but it is available for
>>>> Ubuntu20.
>>>
>>> We have now upgraded our Server to Ubuntu 20.04 LTS and installed the
>>> packages fence-virt and fence-virtd.
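>>> For reference, that was roughly:
>>>
>>> # apt install fence-virt fence-virtd
>>>
>>> (As far as I understand, the guests only need the fence_xvm client from
>>> fence-virt; fence-virtd is the daemon for the hosts.)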
>>>
>>> The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just
>>> returns the single local VM.
>>>
>>> The same command on both VMs results in:
>>> # fence_xvm -a 225.0.0.12 -o list
>>> Timed out waiting for response
>>> Operation failed
>>>
>>> But just as before, trying to connect from the guest to the host via nc
>>> just works fine.
>>> #nc -z -v -u 192.168.1.21 1229
>>> Connection to 192.168.1.21 1229 port [udp/*] succeeded!
>>>
>>> So the hosts and the service are basically reachable.
>>>
>>> I have spoken to our firewall tech; he has assured me that no local
>>> traffic is hindered by anything, be it multicast or not.
>>> Software firewalls are not present/active on any of our servers.
>>>
>>> Ubuntu guests:
>>> # ufw status
>>> Status: inactive
>>>
>>> CentOS hosts:
>>> systemctl status firewalld
>>> ● firewalld.service - firewalld - dynamic firewall daemon
>>> Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled;
>>> vendor preset: enabled)
>>> Active: inactive (dead)
>>> Docs: man:firewalld(1)
>>>
>>>
>>> Any hints or help on how to remedy this problem would be greatly
>>> appreciated!
>>>
>>> Kind regards
>>> Stefan Schmitz
>>>
>>>
>>> On 07.07.2020 at 10:54, Klaus Wenninger wrote:
>>>> On 7/7/20 10:33 AM, Strahil Nikolov wrote:
>>>>> I can't find fence_virtd for Ubuntu18, but it is available for
>>>>> Ubuntu20.
>>>>>
>>>>> Your other option is to get an iSCSI device from your quorum system and
>>>>> use that for SBD.
>>>>> For the watchdog, you can use the 'softdog' kernel module or you can use
>>>>> KVM to present one to the VMs.
>>>>> You can also check the '-P' flag for SBD.
>>>> With KVM please use the qemu watchdog and try to
>>>> avoid using softdog with SBD,
>>>> especially if you are aiming for a production cluster ...
>>>>
>>>> Adding something like that to libvirt-xml should do the trick:
>>>> <watchdog model='i6300esb' action='reset'>
>>>> <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
>>>> function='0x0'/>
>>>> </watchdog>
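>>>> After adding that (e.g. via 'virsh edit kvm101' on the host, followed by a
>>>> guest shutdown/start) the guest should expose /dev/watchdog for SBD to use.
>>>> A quick sanity check from inside the VM:
>>>>
>>>> # ls -l /dev/watchdog
>>>> # wdctl
>>>>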
>>>>
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>>>
>>>>> On 7 July 2020 10:11:38 GMT+03:00,
>>>>> "stefan.schmitz at farmpartner-tec.com"
>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>>>> What does 'virsh list'
>>>>>>> give you on the 2 hosts? Hopefully different names for
>>>>>>> the VMs ...
>>>>>> Yes, each host shows its own
>>>>>>
>>>>>> # virsh list
>>>>>> Id Name Status
>>>>>> ----------------------------------------------------
>>>>>> 2 kvm101 running
>>>>>>
>>>>>> # virsh list
>>>>>> Id Name State
>>>>>> ----------------------------------------------------
>>>>>> 1 kvm102 running
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>>>> guests as well?
>>>>>> fence_xvm sadly does not work on the Ubuntu guests. The howto said to
>>>>>> install "yum install fence-virt fence-virtd", which do not exist as such
>>>>>> in Ubuntu 18.04. After we tried to find the appropriate packages we
>>>>>> installed "libvirt-clients" and "multipath-tools". Is there maybe
>>>>>> something missing or completely wrong?
>>>>>> We can, though, connect to both hosts using "nc -z -v -u 192.168.1.21
>>>>>> 1229"; that just works fine.
>>>>>>
>>>> Without fence-virt you can't expect the whole thing to work.
>>>> Maybe you can build it for your ubuntu-version from the sources of
>>>> a package for another ubuntu-version if it doesn't exist yet.
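>>>> Roughly, assuming the usual autotools layout of the fence-virt sources
>>>> (https://github.com/ClusterLabs/fence-virt); configure will complain about
>>>> any missing -dev packages:
>>>>
>>>> # git clone https://github.com/ClusterLabs/fence-virt.git
>>>> # cd fence-virt && ./autogen.sh && ./configure && make && make install
>>>>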
>>>> Btw. which pacemaker-version are you using?
>>>> There was a convenience-fix on the master-branch for at least
>>>> a couple of days (sometime during the 2.0.4 release-cycle) that
>>>> wasn't compatible with fence_xvm.
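>>>> Something like this on one of the nodes should tell:
>>>>
>>>> # pacemakerd --version
>>>>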
>>>>>>> Usually, the biggest problem is the multicast traffic - as in many
>>>>>>> environments it can be dropped by firewalls.
>>>>>> To make sure, I have requested our datacenter techs to verify that
>>>>>> multicast traffic can move unhindered in our local network. In the past
>>>>>> they have confirmed on multiple occasions that local traffic is not
>>>>>> filtered in any way, but until now I had never specifically asked about
>>>>>> multicast traffic, which I have now done. I am waiting for an answer to
>>>>>> that question.
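>>>>>> In the meantime I could probably test multicast between the two hosts
>>>>>> myself with something like omping (if it is installable here, e.g. from
>>>>>> EPEL), run on both hosts at the same time:
>>>>>>
>>>>>> # omping -m 225.0.0.12 host1 host2
>>>>>>
>>>>>> (host1/host2 standing for the two hypervisors; multicast replies from the
>>>>>> other host would show that the traffic gets through)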
>>>>>>
>>>>>>
>>>>>> kind regards
>>>>>> Stefan Schmitz
>>>>>>
>>>>>> On 06.07.2020 at 11:24, Klaus Wenninger wrote:
>>>>>>> On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>>>> # fence_xvm -o list
>>>>>>>>>> kvm102               bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>>>> This should show both VMs, so getting to that point will likely solve
>>>>>>>>> your problem. fence_xvm relies on multicast, there could be some
>>>>>>>>> obscure network configuration to get that working on the VMs.
>>>>>>> You said you tried on both hosts. What does 'virsh list'
>>>>>>> give you on the 2 hosts? Hopefully different names for
>>>>>>> the VMs ...
>>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>>>> guests as well?
>>>>>>> Did you try pinging via the physical network that is
>>>>>>> connected to the bridge configured to be used for
>>>>>>> fencing?
>>>>>>> If I got it right fence_xvm should support collecting
>>>>>>> answers from multiple hosts, but I found a suggestion
>>>>>>> to do a setup with 2 multicast-addresses & keys for
>>>>>>> each host.
>>>>>>> Which route did you go?
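>>>>>>> For the single-address route, the relevant part of /etc/fence_virt.conf
>>>>>>> on each host should look roughly like this (normally written by
>>>>>>> 'fence_virtd -c'; the interface has to be the bridge that actually
>>>>>>> reaches the other host and the guests - virbr0 is just a placeholder):
>>>>>>>
>>>>>>> fence_virtd {
>>>>>>>     listener = "multicast";
>>>>>>>     backend = "libvirt";
>>>>>>> }
>>>>>>> listeners {
>>>>>>>     multicast {
>>>>>>>         key_file = "/etc/cluster/fence_xvm.key";
>>>>>>>         address = "225.0.0.12";
>>>>>>>         port = "1229";
>>>>>>>         interface = "virbr0";
>>>>>>>         family = "ipv4";
>>>>>>>     }
>>>>>>> }
>>>>>>> backends {
>>>>>>>     libvirt {
>>>>>>>         uri = "qemu:///system";
>>>>>>>     }
>>>>>>> }
>>>>>>>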
>>>>>>>
>>>>>>> Klaus
>>>>>>>> Thank you for pointing me in that direction. We have tried to solve
>>>>>>>> that, but with no success. We were using a howto provided here:
>>>>>>>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>>>>>>>
>>>>>>>> Problem is, it specifically states that the tutorial does not yet
>>>>>>>> support the case where guests are running on multiple hosts. There are
>>>>>>>> some short hints about what might be necessary to do, but working
>>>>>>>> through those sadly just did not work, nor were there any clues which
>>>>>>>> would help us find a solution ourselves. So now we are completely stuck
>>>>>>>> here.
>>>>>>>>
>>>>>>>> Does someone have the same configuration with guest VMs on multiple
>>>>>>>> hosts? And how did you manage to get that to work? What do we need to
>>>>>>>> do to resolve this? Is there maybe even someone who would be willing to
>>>>>>>> take a closer look at our server? Any help would be greatly appreciated!
>>>>>>>>
>>>>>>>> Kind regards
>>>>>>>> Stefan Schmitz
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>>>>>>>> On Thu, 2020-07-02 at 17:18 +0200,
>>>>>>>>> stefan.schmitz at farmpartner-tec.com wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I hope someone can help with this problem. We are (still) trying to
>>>>>>>>>> get Stonith to achieve a running active/active HA cluster, but sadly
>>>>>>>>>> to no avail.
>>>>>>>>>>
>>>>>>>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM.
>>>>>>>>>> The Ubuntu VMs are the ones which should form the HA cluster.
>>>>>>>>>>
>>>>>>>>>> The current status is this:
>>>>>>>>>>
>>>>>>>>>> # pcs status
>>>>>>>>>> Cluster name: pacemaker_cluster
>>>>>>>>>> WARNING: corosync and pacemaker node names do not match (IPs used in
>>>>>>>>>> setup?)
>>>>>>>>>> Stack: corosync
>>>>>>>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
>>>>>>>>>> with quorum
>>>>>>>>>> Last updated: Thu Jul 2 17:03:53 2020
>>>>>>>>>> Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on
>>>>>>>>>> server4ubuntu1
>>>>>>>>>>
>>>>>>>>>> 2 nodes configured
>>>>>>>>>> 13 resources configured
>>>>>>>>>>
>>>>>>>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>>
>>>>>>>>>> Full list of resources:
>>>>>>>>>>
>>>>>>>>>> stonith_id_1 (stonith:external/libvirt): Stopped
>>>>>>>>>> Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>>>>>>> Masters: [ server4ubuntu1 ]
>>>>>>>>>> Slaves: [ server2ubuntu1 ]
>>>>>>>>>> Master/Slave Set: WebDataClone [WebData]
>>>>>>>>>> Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>> Clone Set: dlm-clone [dlm]
>>>>>>>>>> Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>>>>>>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1
>>>>>>>>>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1
>>>>>>>>>> Clone Set: WebFS-clone [WebFS]
>>>>>>>>>> Started: [ server4ubuntu1 ]
>>>>>>>>>> Stopped: [ server2ubuntu1 ]
>>>>>>>>>> Clone Set: WebSite-clone [WebSite]
>>>>>>>>>> Started: [ server4ubuntu1 ]
>>>>>>>>>> Stopped: [ server2ubuntu1 ]
>>>>>>>>>>
>>>>>>>>>> Failed Actions:
>>>>>>>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1):
>>>>>>>>>> call=201,
>>>>>>>>>> status=Error, exitreason='',
>>>>>>>>>> last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms,
>>>>>>>>>> exec=3403ms
>>>>>>>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8):
>>>>>>>>>> call=203,
>>>>>>>>>> status=complete, exitreason='',
>>>>>>>>>> last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms
>>>>>>>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1):
>>>>>>>>>> call=202,
>>>>>>>>>> status=Error, exitreason='',
>>>>>>>>>> last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms,
>>>>>>>>>> exec=3411ms
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The stonith resource is stopped and does not seem to work.
>>>>>>>>>> On both hosts the command
>>>>>>>>>> # fence_xvm -o list
>>>>>>>>>> kvm102               bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>>>> This should show both VMs, so getting to that point will likely solve
>>>>>>>>> your problem. fence_xvm relies on multicast, there could be some
>>>>>>>>> obscure network configuration to get that working on the VMs.
>>>>>>>>>
>>>>>>>>>> returns the local VM. Apparently it connects through the
>>>>>>>>>> virtualization interface, because it returns the VM name, not the
>>>>>>>>>> hostname of the client VM. I do not know if this is how it is
>>>>>>>>>> supposed to work?
>>>>>>>>> Yes, fence_xvm knows only about the VM names.
>>>>>>>>>
>>>>>>>>> To get pacemaker to be able to use it for fencing the cluster nodes,
>>>>>>>>> you have to add a pcmk_host_map parameter to the fencing resource. It
>>>>>>>>> looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
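>>>>>>>>> For this cluster that could be something along these lines, assuming
>>>>>>>>> server2ubuntu1 runs as kvm101 and server4ubuntu1 as kvm102 (swap them
>>>>>>>>> if it is the other way around):
>>>>>>>>>
>>>>>>>>> # pcs stonith create fence_vms fence_xvm \
>>>>>>>>>     key_file=/etc/cluster/fence_xvm.key multicast_address=225.0.0.12 \
>>>>>>>>>     pcmk_host_map="server2ubuntu1:kvm101;server4ubuntu1:kvm102"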
>>>>>>>>>
>>>>>>>>>> In the local network, all traffic is allowed. No firewall is locally
>>>>>>>>>> active, just the connections leaving the local network are firewalled.
>>>>>>>>>> Hence there are no connection problems between the hosts and clients.
>>>>>>>>>> For example, we can successfully connect from the clients to the hosts:
>>>>>>>>>> # nc -z -v -u 192.168.1.21 1229
>>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>>> Ncat: Connected to 192.168.1.21:1229.
>>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>>
>>>>>>>>>> # nc -z -v -u 192.168.1.13 1229
>>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>>> Ncat: Connected to 192.168.1.13:1229.
>>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>>>>>>>> according to the howto provided here:
>>>>>>>>>>
>>>>>>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>>>>>>>
>>>>>>>>>> The actual line we used:
>>>>>>>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt \
>>>>>>>>>>     hostlist="Host4,host2" hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> But as you can see in the pcs status output, stonith is stopped and
>>>>>>>>>> exits with an unknown error.
>>>>>>>>>>
>>>>>>>>>> Can somebody please advise on how to proceed or what additional
>>>>>>>>>> information is needed to solve this problem?
>>>>>>>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>>>>>>>
>>>>>>>>>> Kind regards
>>>>>>>>>> Stefan Schmitz
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>
>