[ClusterLabs] Still Beginner STONITH Problem
stefan.schmitz at farmpartner-tec.com
Thu Jul 9 10:01:13 EDT 2020
Hello,
thanks for the advice. I have worked through that list as follows:
> - key deployed on the Hypervisors
> - key deployed on the VMs
I created the key file once, a while ago, on one host and distributed it
to every other host and guest. Right now it resides on all 4 machines in
the same path: /etc/cluster/fence_xvm.key
Is there maybe a corosync/Stonith or other function which checks the
key files for corruption or errors?
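In the meantime, comparing checksums by hand should at least rule out a
corrupt or stale copy, assuming the key is supposed to be byte-identical
on all four machines:

# sha256sum /etc/cluster/fence_xvm.key

If the hash differs on any host or guest, I would redistribute the key.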
> - fence_virtd running on both Hypervisors
It is running on each host:
# ps aux |grep fence_virtd
root 62032 0.0 0.0 251568 4496 ? Ss Jun29 0:00
fence_virtd
> - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests)
Command on one host:
fence_xvm -a 225.0.0.12 -o list
tcpdump on the guest residing on the other host:
host2.55179 > 225.0.0.12.1229: [udp sum ok] UDP, length 176
host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
225.0.0.12 to_in { }]
host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
225.0.0.12 to_in { }]
At least to me, it looks like the multicast traffic reaches the VMs.
Additionally, no matter on which host I execute the fence_xvm command,
tcpdump shows the same traffic on both guests.
But at the same time, tcpdump shows nothing on the other host. Just to
be sure, I had flushed iptables beforehand on each host. Could this be
the problem?
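For reference, this is roughly the capture I run on the hosts (the
interface name is just an example; it should be whichever bridge carries
the cluster traffic):

# tcpdump -i br0 -n 'udp port 1229 or host 225.0.0.12'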
> - fence_xvm on both VMs
fence_xvm is installed on both VMs
# which fence_xvm
/usr/sbin/fence_xvm
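If it helps with debugging, I could also run fence_virtd in the
foreground with debug output on one host while issuing the list command
on the other, to see whether the request arrives at all. I believe -F
keeps it in the foreground and -d sets the debug level:

# fence_virtd -F -d9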
Could you please advise on how to proceed? Thank you in advance.
Kind regards
Stefan Schmitz
On 08.07.2020 at 20:24, Strahil Nikolov wrote:
> Erm...network/firewall is always "green". Run tcpdump on Host1 and VM2 (not on the same host).
> Then run again 'fence_xvm -o list' and check what is captured.
>
> In summary, you need:
> - key deployed on the Hypervisors
> - key deployed on the VMs
> - fence_virtd running on both Hypervisors
> - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests; example commands below)
> - fence_xvm on both VMs
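> For example, assuming firewalld on the CentOS hosts and ufw on the
> Ubuntu guests (adjust to whatever is actually in use):
> host# firewall-cmd --permanent --add-port=1229/udp && firewall-cmd --reload
> guest# ufw allow 1229/tcp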
>
> In your case, the primary suspect is multicast traffic.
>
> Best Regards,
> Strahil Nikolov
>
> On 8 July 2020 16:33:45 GMT+03:00, "stefan.schmitz at farmpartner-tec.com" <stefan.schmitz at farmpartner-tec.com> wrote:
>> Hello,
>>
>>> I can't find fence_virtd for Ubuntu18, but it is available for
>>> Ubuntu20.
>>
>> We have now upgraded our Server to Ubuntu 20.04 LTS and installed the
>> packages fence-virt and fence-virtd.
>>
>> The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just
>> returns the single local VM.
>>
>> The same command on both VMs results in:
>> # fence_xvm -a 225.0.0.12 -o list
>> Timed out waiting for response
>> Operation failed
>>
>> But just as before, trying to connect from the guest to the host via
>> nc just works fine.
>> # nc -z -v -u 192.168.1.21 1229
>> Connection to 192.168.1.21 1229 port [udp/*] succeeded!
>>
>> So the host and its service are basically reachable.
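>>
>> Though if I read the port list right, fence_virtd delivers its answer
>> over a TCP connection back to the guest on port 1229, so the reverse
>> direction matters too. A rough test of that direction (<guest-ip> is a
>> placeholder; nc syntax may vary by variant):
>>
>> guest# nc -l 1229
>> host# nc -z -v <guest-ip> 1229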
>>
>> I have spoken to our firewall tech; he has assured me that no local
>> traffic is hindered by anything, be it multicast or not.
>> Software firewalls are not present/active on any of our servers.
>>
>> Ubuntu guests:
>> # ufw status
>> Status: inactive
>>
>> CentOS hosts:
>> # systemctl status firewalld
>> ● firewalld.service - firewalld - dynamic firewall daemon
>> Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled;
>> vendor preset: enabled)
>> Active: inactive (dead)
>> Docs: man:firewalld(1)
>>
>>
>> Any hints or help on how to remedy this problem would be greatly
>> appreciated!
>>
>> Kind regards
>> Stefan Schmitz
>>
>>
>> On 07.07.2020 at 10:54, Klaus Wenninger wrote:
>>> On 7/7/20 10:33 AM, Strahil Nikolov wrote:
>>>> I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20.
>>>>
>>>> Your other option is to get an iSCSI disk from your quorum system and
>>>> use that for SBD.
>>>> For watchdog, you can use the 'softdog' kernel module, or you can have
>>>> KVM present one to the VMs.
>>>> You can also check the '-P' flag for SBD.
>>> With KVM, please use the qemu watchdog and try to
>>> avoid using softdog with SBD,
>>> especially if you are aiming for a production cluster ...
>>>
>>> Adding something like this to the libvirt XML should do the trick:
>>> <watchdog model='i6300esb' action='reset'>
>>>   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
>>> </watchdog>
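>>>
>>> To verify from inside the guest afterwards: something like
>>> 'lspci | grep -i watchdog' should list the 6300ESB device, and
>>> /dev/watchdog should appear once the i6300esb module is loaded.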
>>>
>>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>>>
>>>> On 7 July 2020 10:11:38 GMT+03:00,
>>>> "stefan.schmitz at farmpartner-tec.com"
>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>>> What does 'virsh list'
>>>>>> give you on the 2 hosts? Hopefully different names for
>>>>>> the VMs ...
>>>>> Yes, each host shows its own VM:
>>>>>
>>>>> # virsh list
>>>>> Id Name State
>>>>> ----------------------------------------------------
>>>>> 2 kvm101 running
>>>>>
>>>>> # virsh list
>>>>> Id Name State
>>>>> ----------------------------------------------------
>>>>> 1 kvm102 running
>>>>>
>>>>>
>>>>>
>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>>> guests as well?
>>>>> fence_xvm sadly does not work on the Ubuntu guests. The howto said to
>>>>> install "yum install fence-virt fence-virtd", which do not exist as
>>>>> such in Ubuntu 18.04. After we tried to find the appropriate packages,
>>>>> we installed "libvirt-clients" and "multipath-tools". Is there maybe
>>>>> something missing or completely wrong?
>>>>> Though we can connect to both hosts using "nc -z -v -u 192.168.1.21
>>>>> 1229"; that just works fine.
>>>>>
>>> Without fence-virt you can't expect the whole thing to work.
>>> Maybe you can build it for your Ubuntu version from the sources of
>>> a package for another Ubuntu version if it doesn't exist yet.
>>> Btw., which pacemaker version are you using?
>>> There was a convenience fix on the master branch for at least
>>> a couple of days (sometime during the 2.0.4 release cycle) that
>>> wasn't compatible with fence_xvm.
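>>> If you do try building from source, the usual autotools sequence
>>> should work; a rough, untested sketch (the dependency list is a
>>> guess for Ubuntu 18.04):
>>> # apt install build-essential autoconf automake libtool pkg-config
>>> # git clone https://github.com/ClusterLabs/fence-virt.git
>>> # cd fence-virt && ./autogen.sh && ./configure && make && make install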
>>>>>> Usually, the biggest problem is the multicast traffic - as in many
>>>>>> environments it can be dropped by firewalls.
>>>>> To make sure, I have asked our datacenter techs to verify that
>>>>> multicast traffic can move unhindered in our local network. In the
>>>>> past they have confirmed on multiple occasions that local traffic is
>>>>> not filtered in any way, but until now I had never specifically asked
>>>>> about multicast traffic, which I have now done. I am waiting for an
>>>>> answer to that question.
>>>>>
>>>>>
>>>>> kind regards
>>>>> Stefan Schmitz
>>>>>
>>>>> On 06.07.2020 at 11:24, Klaus Wenninger wrote:
>>>>>> On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>>>> # fence_xvm -o list
>>>>>>>>> kvm102               bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>>> This should show both VMs, so getting to that point will likely
>>>>>>>> solve your problem. fence_xvm relies on multicast; there could be
>>>>>>>> some obscure network configuration needed to get that working on
>>>>>>>> the VMs.
>>>>>> You said you tried on both hosts. What does 'virsh list'
>>>>>> give you on the 2 hosts? Hopefully different names for
>>>>>> the VMs ...
>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>>> guests as well?
>>>>>> Did you try pinging via the physical network that is
>>>>>> connected to the bridge configured to be used for
>>>>>> fencing?
>>>>>> If I got it right, fence_xvm should support collecting
>>>>>> answers from multiple hosts, but I found a suggestion
>>>>>> to do a setup with 2 multicast addresses & keys for
>>>>>> each host.
>>>>>> Which route did you go?
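>>>>>> For reference, the multicast listener is configured in
>>>>>> /etc/fence_virt.conf on each host, roughly like this (the
>>>>>> interface value is an example and must match your bridge):
>>>>>> listeners {
>>>>>>     multicast {
>>>>>>         key_file = "/etc/cluster/fence_xvm.key";
>>>>>>         address = "225.0.0.12";
>>>>>>         port = "1229";
>>>>>>         interface = "br0";
>>>>>>     }
>>>>>> }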
>>>>>>
>>>>>> Klaus
>>>>>>> Thank you for pointing me in that direction. We have tried to solve
>>>>>>> that, but with no success. We were using a howto provided here:
>>>>>>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>>>>>>
>>>>>>> Problem is, it specifically states that the tutorial does not yet
>>>>>>> support the case where guests are running on multiple hosts. There
>>>>>>> are some short hints at what might be necessary to do, but working
>>>>>>> through those sadly just did not work, nor were there any clues which
>>>>>>> would help us find a solution ourselves. So now we are completely
>>>>>>> stuck here.
>>>>>>>
>>>>>>> Does someone have the same configuration, with guest VMs on multiple
>>>>>>> hosts? How did you manage to get that to work? What do we need to do
>>>>>>> to resolve this? Is there maybe even someone who would be willing to
>>>>>>> take a closer look at our server? Any help would be greatly
>>>>>>> appreciated!
>>>>>>>
>>>>>>> Kind regards
>>>>>>> Stefan Schmitz
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>>>>>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at farmpartner-tec.com wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I hope someone can help with this problem. We are (still) trying
>>>>>>>>> to get Stonith working, to achieve a running active/active HA
>>>>>>>>> cluster, but sadly to no avail.
>>>>>>>>>
>>>>>>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM.
>>>>>>>>> The Ubuntu VMs are the ones which should form the HA cluster.
>>>>>>>>>
>>>>>>>>> The current status is this:
>>>>>>>>>
>>>>>>>>> # pcs status
>>>>>>>>> Cluster name: pacemaker_cluster
>>>>>>>>> WARNING: corosync and pacemaker node names do not match (IPs used
>>>>>>>>> in setup?)
>>>>>>>>> Stack: corosync
>>>>>>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
>>>>>>>>> with quorum
>>>>>>>>> Last updated: Thu Jul 2 17:03:53 2020
>>>>>>>>> Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on
>>>>>>>>> server4ubuntu1
>>>>>>>>>
>>>>>>>>> 2 nodes configured
>>>>>>>>> 13 resources configured
>>>>>>>>>
>>>>>>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>
>>>>>>>>> Full list of resources:
>>>>>>>>>
>>>>>>>>> stonith_id_1 (stonith:external/libvirt): Stopped
>>>>>>>>> Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>>>>>> Masters: [ server4ubuntu1 ]
>>>>>>>>> Slaves: [ server2ubuntu1 ]
>>>>>>>>> Master/Slave Set: WebDataClone [WebData]
>>>>>>>>> Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>> Clone Set: dlm-clone [dlm]
>>>>>>>>> Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>>>>>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1
>>>>>>>>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1
>>>>>>>>> Clone Set: WebFS-clone [WebFS]
>>>>>>>>> Started: [ server4ubuntu1 ]
>>>>>>>>> Stopped: [ server2ubuntu1 ]
>>>>>>>>> Clone Set: WebSite-clone [WebSite]
>>>>>>>>> Started: [ server4ubuntu1 ]
>>>>>>>>> Stopped: [ server2ubuntu1 ]
>>>>>>>>>
>>>>>>>>> Failed Actions:
>>>>>>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1):
>>>>>>>>> call=201,
>>>>>>>>> status=Error, exitreason='',
>>>>>>>>> last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms,
>>>>>>>>> exec=3403ms
>>>>>>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8):
>>>>>>>>> call=203,
>>>>>>>>> status=complete, exitreason='',
>>>>>>>>> last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms
>>>>>>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1):
>>>>>>>>> call=202,
>>>>>>>>> status=Error, exitreason='',
>>>>>>>>> last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms,
>>>>>>>>> exec=3411ms
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The stonith resource is stopped and does not seem to work.
>>>>>>>>> On both hosts the command
>>>>>>>>> # fence_xvm -o list
>>>>>>>>> kvm102               bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>>> This should show both VMs, so getting to that point will likely
>>>>>>>> solve your problem. fence_xvm relies on multicast; there could be
>>>>>>>> some obscure network configuration needed to get that working on
>>>>>>>> the VMs.
>>>>>>>>
>>>>>>>>> returns the local VM. Apparently it connects through the
>>>>>>>>> virtualization interface, because it returns the VM name, not the
>>>>>>>>> hostname of the client VM. I do not know if this is how it is
>>>>>>>>> supposed to work?
>>>>>>>> Yes, fence_xvm knows only about the VM names.
>>>>>>>>
>>>>>>>> To get pacemaker to be able to use it for fencing the cluster
>>>>>>>> nodes, you have to add a pcmk_host_map parameter to the fencing
>>>>>>>> resource. It looks like
>>>>>>>> pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
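>>>>>>>>
>>>>>>>> With the names from your status output, that would presumably be
>>>>>>>> something like the following (which VM backs which node is a guess
>>>>>>>> here; check 'virsh list' on each host first):
>>>>>>>> pcmk_host_map="server2ubuntu1:kvm101;server4ubuntu1:kvm102"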
>>>>>>>>
>>>>>>>>> In the local network, all traffic is allowed. No firewall is
>>>>>>>>> active locally; only the connections leaving the local network are
>>>>>>>>> firewalled. Hence there are no connection problems between the
>>>>>>>>> hosts and clients. For example, we can successfully connect from
>>>>>>>>> the clients to the hosts:
>>>>>>>>> # nc -z -v -u 192.168.1.21 1229
>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>> Ncat: Connected to 192.168.1.21:1229.
>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>
>>>>>>>>> # nc -z -v -u 192.168.1.13 1229
>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>> Ncat: Connected to 192.168.1.13:1229.
>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>>>>>>> according to the howto provided here:
>>>>>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>>>>>>
>>>>>>>>> The actual line we used:
>>>>>>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt \
>>>>>>>>>     hostlist="Host4,host2" \
>>>>>>>>>     hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But as you can see in the pcs status output, stonith is stopped
>>>>>>>>> and exits with an unknown error.
>>>>>>>>>
>>>>>>>>> Can somebody please advise on how to proceed, or what additional
>>>>>>>>> information is needed to solve this problem?
>>>>>>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>>>>>>
>>>>>>>>> Kind regards
>>>>>>>>> Stefan Schmitz
>>>>>>>>>