[ClusterLabs] Still Beginner STONITH Problem

Strahil Nikolov hunter86_bg at yahoo.com
Thu Jul 9 13:10:19 EDT 2020


Have you run 'fence_virtd -c'?
I made a silly mistake last time when I deployed it and the daemon was not listening on the right interface.
Netstat can verify this.
Also, as far as I know, the hosts use unicast to reply to the VMs (hence tcp/1229 and not udp/1229).
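
A quick sanity check on each hypervisor could look like this (a rough
sketch; netstat may need the net-tools package):

# netstat -ulnp | grep 1229    # fence_virtd should hold a udp listener on port 1229
# fence_virtd -c               # re-run the wizard if it is bound to the wrong interface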

If you have a developer account for Red Hat, you can check https://access.redhat.com/solutions/917833

Best Regards,
Strahil Nikolov

On 9 July 2020 at 17:01:13 GMT+03:00, "stefan.schmitz at farmpartner-tec.com" <stefan.schmitz at farmpartner-tec.com> wrote:
>Hello,
>
>thanks for the advice. I have worked through that list as follows:
>
> > -  key deployed on the hypervisors
> > -  key deployed on the VMs
>I created the key file a while ago on one host and distributed it to
>every other host and guest. Right now it resides on all 4 machines in
>the same path: /etc/cluster/fence_xvm.key
>Is there maybe a corosync/STONITH or other function which checks the
>keyfiles for any corruption or errors?
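>
>For reference, one way to rule out key corruption would be comparing
>checksums by hand on all four machines (a sketch; host2 stands in for
>the other machines):
>
># sha256sum /etc/cluster/fence_xvm.key
># ssh host2 sha256sum /etc/cluster/fence_xvm.key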
>
>
> > -  fence_virtd running on both hypervisors
>It is running on each host:
>#  ps aux |grep fence_virtd
>root      62032  0.0  0.0 251568  4496 ?        Ss   Jun29   0:00 
>fence_virtd
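>
>For reference, the listener/backend settings on each host can be
>checked with something like this (a sketch; /etc/fence_virt.conf is
>the default path on CentOS):
>
># grep -E 'listener|backend|interface' /etc/fence_virt.conf
>
>For this setup that should show listener = "multicast", backend =
>"libvirt", and the interface of the bridge the VMs are attached to.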
>
>
> > -  Firewall opened (1229/udp for the hosts, 1229/tcp for the guests)
>
>Command on one host:
>fence_xvm -a 225.0.0.12 -o list
>
>tcpdump on the guest residing on the other host:
>host2.55179 > 225.0.0.12.1229: [udp sum ok] UDP, length 176
>host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 
>225.0.0.12 to_in { }]
>host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 
>225.0.0.12 to_in { }]
>
>At least to me it looks like the VMs are reachable by the multicast
>traffic.
>Additionally, no matter on which host I execute the fence_xvm command,
>tcpdump shows the same traffic on both guests.
>But at the same time, tcpdump shows nothing on the other host. Just to
>be sure, I had flushed iptables beforehand on each host. Could this
>indicate a problem?
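>
>For completeness, a capture along these lines on the hosts should show
>both the multicast request and the TCP reply channel (the bridge name
>br0 is only a guess):
>
># tcpdump -i br0 -n 'udp port 1229 or tcp port 1229'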
>
>
> > -  fence_xvm on both VMs
>fence_xvm is installed on both VMs
># which fence_xvm
>/usr/sbin/fence_xvm
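>
>If it helps, running the query on a guest with debugging turned up
>might show where it stalls (a sketch; please check the flags against
>fence_xvm(8) on this version):
>
># fence_xvm -a 225.0.0.12 -o list -ddd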
>
>Could you please advise on how to proceed? Thank you in advance.
>Kind regards
>Stefan Schmitz
>
>On 08.07.2020 at 20:24, Strahil Nikolov wrote:
>> Erm... network/firewall is always "green". Run tcpdump on Host1 and
>> VM2 (not on the same host).
>> Then run 'fence_xvm -o list' again and check what is captured.
>> 
>> In summary,  you need:
>> -  key deployed on the hypervisors
>> -  key deployed on the VMs
>> -  fence_virtd running on both hypervisors
>> -  Firewall opened (1229/udp for the hosts, 1229/tcp for the guests)
>> -  fence_xvm on both VMs
>> 
>> In your case, the primary suspect is multicast traffic.
>> 
>> Best  Regards,
>> Strahil Nikolov
>> 
>> On 8 July 2020 at 16:33:45 GMT+03:00, "stefan.schmitz at farmpartner-tec.com"
>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>> Hello,
>>>
>>>> I can't find fence_virtd for Ubuntu18, but it is available for
>>>> Ubuntu20.
>>>
>>> We have now upgraded our server to Ubuntu 20.04 LTS and installed the
>>> packages fence-virt and fence-virtd.
>>>
>>> The command "fence_xvm -a 225.0.0.12 -o list" on the hosts still just
>>> returns the single local VM.
>>>
>>> The same command on both VMs results in:
>>> # fence_xvm -a 225.0.0.12 -o list
>>> Timed out waiting for response
>>> Operation failed
>>>
>>> But just as before, trying to connect from the guest to the host via
>>> nc just works fine.
>>> #nc -z -v -u 192.168.1.21 1229
>>> Connection to 192.168.1.21 1229 port [udp/*] succeeded!
>>>
>>> So the host and the service are basically reachable.
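>>>
>>> A possibly useful extra check (sketched here with a guessed bridge
>>> name): on each host, fence_virtd should have joined the multicast
>>> group, which can be seen with
>>>
>>> # ip maddr show dev br0 | grep 225.0.0.12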
>>>
>>> I have spoken to our firewall tech; he has assured me that no local
>>> traffic is hindered by anything, be it multicast or not.
>>> Software firewalls are not present/active on any of our servers.
>>>
>>> Ubuntu guests:
>>> # ufw status
>>> Status: inactive
>>>
>>> CentOS hosts:
>>> systemctl status firewalld
>>> ● firewalld.service - firewalld - dynamic firewall daemon
>>>    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
>>>     Active: inactive (dead)
>>>       Docs: man:firewalld(1)
>>>
>>>
>>> Any hints or help on how to remedy this problem would be greatly
>>> appreciated!
>>>
>>> Kind regards
>>> Stefan Schmitz
>>>
>>>
>>> On 07.07.2020 at 10:54, Klaus Wenninger wrote:
>>>> On 7/7/20 10:33 AM, Strahil Nikolov wrote:
>>>>> I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20.
>>>>>
>>>>> Your other option is to get an iSCSI device from your quorum system and
>>>>> use that for SBD.
>>>>> For watchdog, you can use the 'softdog' kernel module or you can use KVM
>>>>> to present one to the VMs.
>>>>> You can also check the '-P' flag for SBD.
>>>> With KVM, please use the qemu watchdog and try to
>>>> avoid using softdog with SBD.
>>>> Especially if you are aiming for a production cluster ...
>>>>
>>>> Adding something like this to the libvirt XML should do the trick:
>>>> <watchdog model='i6300esb' action='reset'>
>>>>   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
>>>> </watchdog>
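>>>> Just as a sketch: the snippet can be added with 'virsh edit <domain>'
>>>> and then verified with
>>>> # virsh dumpxml <domain> | grep -A2 watchdog
>>>> Once the i6300esb module is loaded in the guest, a /dev/watchdog
>>>> device should show up there.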
>>>>
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>>>
>>>>> On 7 July 2020 at 10:11:38 GMT+03:00, "stefan.schmitz at farmpartner-tec.com"
>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>>>> What does 'virsh list'
>>>>>>> give you on the 2 hosts? Hopefully different names for
>>>>>>> the VMs ...
>>>>>> Yes, each host shows only its own VM:
>>>>>>
>>>>>> # virsh list
>>>>>>    Id    Name                           Status
>>>>>> ----------------------------------------------------
>>>>>>    2     kvm101                         running
>>>>>>
>>>>>> # virsh list
>>>>>>    Id    Name                           State
>>>>>> ----------------------------------------------------
>>>>>>    1     kvm102                         running
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>>>> guests as well?
>>>>>> fence_xvm sadly does not work on the Ubuntu guests. The howto said to
>>>>>> install "yum install fence-virt fence-virtd", which do not exist as
>>>>>> such in Ubuntu 18.04. After we tried to find the appropriate packages
>>>>>> we installed "libvirt-clients" and "multipath-tools". Is there maybe
>>>>>> something missing or completely wrong?
>>>>>> We can, however, connect to both hosts using "nc -z -v -u
>>>>>> 192.168.1.21 1229", and that just works fine.
>>>>>>
>>>> Without fence-virt you can't expect the whole thing to work.
>>>> Maybe you can build it for your Ubuntu version from the sources of
>>>> a package for another Ubuntu version if it doesn't exist yet.
>>>> Btw. which Pacemaker version are you using?
>>>> There was a convenience fix on the master branch for at least
>>>> a couple of days (sometime during the 2.0.4 release cycle) that
>>>> wasn't compatible with fence_xvm.
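>>>> A rough sketch of building from the upstream sources instead
>>>> (assuming the usual autotools layout; the build dependencies and
>>>> their Ubuntu package names are left out here):
>>>> # git clone https://github.com/ClusterLabs/fence-virt.git
>>>> # cd fence-virt && ./autogen.sh && ./configure && make && make install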
>>>>>>> Usually, the biggest problem is the multicast traffic - as in many
>>>>>>> environments it can be dropped by firewalls.
>>>>>> To make sure, I have asked our datacenter techs to verify that
>>>>>> multicast traffic can move unhindered in our local network. In the
>>>>>> past they have confirmed on multiple occasions that local traffic is
>>>>>> not filtered in any way, but until now I had never specifically asked
>>>>>> about multicast traffic, which I have now done. I am waiting for an
>>>>>> answer to that question.
>>>>>>
>>>>>>
>>>>>> kind regards
>>>>>> Stefan Schmitz
>>>>>>
>>>>>> On 06.07.2020 at 11:24, Klaus Wenninger wrote:
>>>>>>> On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>>>> # fence_xvm -o list
>>>>>>>>>> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>>>> This should show both VMs, so getting to that point will likely
>>>>>>>>> solve your problem. fence_xvm relies on multicast; there could be
>>>>>>>>> some obscure network configuration needed to get that working on
>>>>>>>>> the VMs.
>>>>>>> You said you tried on both hosts. What does 'virsh list'
>>>>>>> give you on the 2 hosts? Hopefully different names for
>>>>>>> the VMs ...
>>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>>>> guests as well?
>>>>>>> Did you try pinging via the physical network that is
>>>>>>> connected to the bridge configured to be used for
>>>>>>> fencing?
>>>>>>> If I got it right, fence_xvm should support collecting
>>>>>>> answers from multiple hosts, but I found a suggestion
>>>>>>> to do a setup with 2 multicast addresses & keys for
>>>>>>> each host.
>>>>>>> Which route did you go?
>>>>>>>
>>>>>>> Klaus
>>>>>>>> Thank you for pointing me in that direction. We have tried to
>>>>>>>> solve that, but with no success. We were using a howto provided here:
>>>>>>>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>>>>>>>
>>>>>>>> Problem is, it specifically states that the tutorial does not yet
>>>>>>>> support the case where guests are running on multiple hosts. There
>>>>>>>> are some short hints on what might be necessary to do, but working
>>>>>>>> through those sadly just did not work, nor were there any clues
>>>>>>>> which would help us find a solution ourselves. So now we are
>>>>>>>> completely stuck here.
>>>>>>>>
>>>>>>>> Does anyone have the same configuration with guest VMs on multiple
>>>>>>>> hosts? And how did you manage to get that to work? What do we need
>>>>>>>> to do to resolve this? Is there maybe even someone who would be
>>>>>>>> willing to take a closer look at our server? Any help would be
>>>>>>>> greatly appreciated!
>>>>>>>>
>>>>>>>> Kind regards
>>>>>>>> Stefan Schmitz
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>>>>>>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at farmpartner-tec.com
>>>>>>>>> wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I hope someone can help with this problem. We are (still) trying
>>>>>>>>>> to get STONITH working to achieve a running active/active HA
>>>>>>>>>> cluster, but sadly to no avail.
>>>>>>>>>>
>>>>>>>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu
>>>>>>>>>> VM. The Ubuntu VMs are the ones which should form the HA cluster.
>>>>>>>>>>
>>>>>>>>>> The current status is this:
>>>>>>>>>>
>>>>>>>>>> # pcs status
>>>>>>>>>> Cluster name: pacemaker_cluster
>>>>>>>>>> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
>>>>>>>>>> Stack: corosync
>>>>>>>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
>>>>>>>>>> Last updated: Thu Jul  2 17:03:53 2020
>>>>>>>>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
>>>>>>>>>> server4ubuntu1
>>>>>>>>>>
>>>>>>>>>> 2 nodes configured
>>>>>>>>>> 13 resources configured
>>>>>>>>>>
>>>>>>>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>>
>>>>>>>>>> Full list of resources:
>>>>>>>>>>
>>>>>>>>>>       stonith_id_1   (stonith:external/libvirt):     Stopped
>>>>>>>>>>       Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>>>>>>>           Masters: [ server4ubuntu1 ]
>>>>>>>>>>           Slaves: [ server2ubuntu1 ]
>>>>>>>>>>       Master/Slave Set: WebDataClone [WebData]
>>>>>>>>>>           Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>>       Clone Set: dlm-clone [dlm]
>>>>>>>>>>           Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>>       Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>>>>>>>           ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started server2ubuntu1
>>>>>>>>>>           ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started server4ubuntu1
>>>>>>>>>>       Clone Set: WebFS-clone [WebFS]
>>>>>>>>>>           Started: [ server4ubuntu1 ]
>>>>>>>>>>           Stopped: [ server2ubuntu1 ]
>>>>>>>>>>       Clone Set: WebSite-clone [WebSite]
>>>>>>>>>>           Started: [ server4ubuntu1 ]
>>>>>>>>>>           Stopped: [ server2ubuntu1 ]
>>>>>>>>>>
>>>>>>>>>> Failed Actions:
>>>>>>>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='',
>>>>>>>>>>     last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
>>>>>>>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='',
>>>>>>>>>>     last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>>>>>>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='',
>>>>>>>>>>     last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The stonith resource is stopped and does not seem to work.
>>>>>>>>>> On both hosts the command
>>>>>>>>>> # fence_xvm -o list
>>>>>>>>>> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>>>> This should show both VMs, so getting to that point will likely
>>>>>>>>> solve your problem. fence_xvm relies on multicast; there could be
>>>>>>>>> some obscure network configuration needed to get that working on
>>>>>>>>> the VMs.
>>>>>>>>>
>>>>>>>>>> returns the local VM. Apparently it connects through the
>>>>>>>>>> virtualization interface, because it returns the VM name, not the
>>>>>>>>>> hostname of the client VM. I do not know if this is how it is
>>>>>>>>>> supposed to work?
>>>>>>>>> Yes, fence_xvm knows only about the VM names.
>>>>>>>>>
>>>>>>>>> To get pacemaker to be able to use it for fencing the cluster
>>>>>>>>> nodes, you have to add a pcmk_host_map parameter to the fencing
>>>>>>>>> resource. It looks like
>>>>>>>>> pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
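>>>>>>>>>
>>>>>>>>> For example, an untested sketch with the fence_xvm agent, using
>>>>>>>>> the names that appear in this thread (the resource name and the
>>>>>>>>> node-to-VM mapping are guesses; check which VM actually backs
>>>>>>>>> which cluster node before copying the map):
>>>>>>>>>
>>>>>>>>> # pcs stonith create fence_vms fence_xvm \
>>>>>>>>>     key_file=/etc/cluster/fence_xvm.key \
>>>>>>>>>     multicast_address=225.0.0.12 \
>>>>>>>>>     pcmk_host_map="server2ubuntu1:kvm101;server4ubuntu1:kvm102"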
>>>>>>>>>
>>>>>>>>>> In the local network, all traffic is allowed. No firewall is
>>>>>>>>>> active locally; only the connections leaving the local network
>>>>>>>>>> are firewalled. Hence there are no connection problems between
>>>>>>>>>> the hosts and clients.
>>>>>>>>>> For example, we can successfully connect from the clients to the
>>>>>>>>>> hosts:
>>>>>>>>>> # nc -z -v -u 192.168.1.21 1229
>>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>>> Ncat: Connected to 192.168.1.21:1229.
>>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>>
>>>>>>>>>> # nc -z -v -u 192.168.1.13 1229
>>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>>> Ncat: Connected to 192.168.1.13:1229.
>>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>>>>>>>> according to the howto provided here:
>>>>>>>>>>
>>>>>>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>>>>>>>
>>>>>>>>>> The actual line we used:
>>>>>>>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>>>>>>>>>>     hostlist="Host4,host2" hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> But as you can see in the pcs status output, stonith is stopped
>>>>>>>>>> and exits with an unknown error.
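>>>>>>>>>>
>>>>>>>>>> For reference, the corresponding error details can usually be
>>>>>>>>>> pulled from the logs with something like this (log locations vary
>>>>>>>>>> by distribution):
>>>>>>>>>>
>>>>>>>>>> # journalctl -u pacemaker | grep -i stonith
>>>>>>>>>> # grep -i 'external/libvirt' /var/log/syslog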
>>>>>>>>>>
>>>>>>>>>> Can somebody please advise on how to proceed or what additional
>>>>>>>>>> information is needed to solve this problem?
>>>>>>>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>>>>>>>
>>>>>>>>>> Kind regards
>>>>>>>>>> Stefan Schmitz
>>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Manage your subscription:
>>>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>>>
>>>>>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>>>>>
>>>>

