[ClusterLabs] Still Beginner STONITH Problem

stefan.schmitz at farmpartner-tec.com stefan.schmitz at farmpartner-tec.com
Thu Jul 9 11:17:39 EDT 2020


Hello,

 > Well, theory still holds I would say.
 >
> I guess that the multicast-traffic from the other host
> or the guests doesn't get to the daemon on the host.
> Can't you just simply check if there are any firewall
> rules configured on the host kernel?

I hope I understood you correctly and you are referring to iptables?
Here is the output of the current rules. Apart from the IP of the guest, the
output is identical on both hosts:

# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT

# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
SOLUSVM_TRAFFIC_IN  all  --  anywhere             anywhere
SOLUSVM_TRAFFIC_OUT  all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain SOLUSVM_TRAFFIC_IN (1 references)
target     prot opt source               destination
            all  --  anywhere             192.168.1.14

Chain SOLUSVM_TRAFFIC_OUT (1 references)
target     prot opt source               destination
            all  --  192.168.1.14         anywhere
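
For completeness, a minimal sketch of how I would verify on the host side
whether the multicast requests arrive at all (br0 is only a placeholder for
whatever bridge/interface fence_virtd actually listens on):

# tcpdump -i br0 -n udp port 1229 and host 225.0.0.12

or, alternatively, insert a counting rule, run "fence_xvm -a 225.0.0.12 -o
list" on the other host or on a guest, and watch the packet counter of the
newly inserted first rule:

# iptables -I INPUT 1 -d 225.0.0.12 -p udp --dport 1229 -j ACCEPT
# iptables -vnL INPUT

If that counter stays at zero while the guests see the packets, the requests
are lost before they ever reach the daemon on the host.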

kind regards
Stefan Schmitz


On 09.07.2020 at 16:30, Klaus Wenninger wrote:
> On 7/9/20 4:01 PM, stefan.schmitz at farmpartner-tec.com wrote:
>> Hello,
>>
>> thanks for the advise. I have worked through that list as follows:
>>
>>> -  key deployed on the Hypervisors
>>> -  key deployed on the VMs
>> I created the key file a while ago once on one host and distributed it
>> to every other host and guest. Right now it resides on all 4 machines
>> in the same path: /etc/cluster/fence_xvm.key
>> Is there maybe a corosync/Stonith or other function which checks the
>> keyfiles for any corruption or errors?
>>
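(Just as a sketch for ruling out a damaged or mismatched key: the checksums
on all four machines must be identical.)

# md5sum /etc/cluster/fence_xvm.key
# ls -l /etc/cluster/fence_xvm.key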
>>
>>> -  fence_virtd running on both Hypervisors
>> It is running on each host:
>> #  ps aux |grep fence_virtd
>> root      62032  0.0  0.0 251568  4496 ?        Ss   Jun29   0:00
>> fence_virtd
>>
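(Also worth checking, as a sketch: whether the multicast listener that
fence_virtd was configured with is bound to the interface that actually
carries the traffic between the two hosts, and whether the host has joined
the multicast group. The config path below is the package default and may
differ.)

# grep -A 10 listeners /etc/fence_virt.conf
# ip maddr show

225.0.0.12 should show up under the interface fence_virtd listens on.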
>>
>>> -  Firewall opened (1229/udp  for the hosts,  1229/tcp  for the guests)
>>
>> Command on one host:
>> fence_xvm -a 225.0.0.12 -o list
>>
>> tcpdump on the guest residing on the other host:
>> host2.55179 > 225.0.0.12.1229: [udp sum ok] UDP, length 176
>> host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
>> 225.0.0.12 to_in { }]
>> host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
>> 225.0.0.12 to_in { }]
>>
>> At least to me it looks like the VMs are reachable by the multicast
>> traffic.
>> Additionally, no matter on which host I execute the fence_xvm command,
>> tcpdump shows the same traffic on both guests.
>> On the other hand, at the same time, tcpdump shows nothing on the other
>> host. Just to be sure, I have flushed iptables beforehand on each host.
>> Is that maybe the problem?
> Well, theory still holds I would say.
> 
> I guess that the multicast-traffic from the other host
> or the guests doesn't get to the daemon on the host.
> Can't you just simply check if there are any firewall
> rules configured on the host kernel?
>>
>>
>>> -  fence_xvm on both VMs
>> fence_xvm is installed on both VMs
>> # which fence_xvm
>> /usr/sbin/fence_xvm
>>
>> Could you please advise on how to proceed? Thank you in advance.
>> Kind regards
>> Stefan Schmitz
>>
>> On 08.07.2020 at 20:24, Strahil Nikolov wrote:
>>> Erm... network/firewall is always "green". Run tcpdump on Host1 and
>>> VM2 (not on the same host).
>>> Then run 'fence_xvm -o list' again and check what is captured.
>>>
>>> In summary, you need:
>>> -  key deployed on the Hypervisors
>>> -  key deployed on the VMs
>>> -  fence_virtd running on both Hypervisors
>>> -  Firewall opened (1229/udp for the hosts, 1229/tcp for the guests)
>>> -  fence_xvm on both VMs
>>>
>>> In your case, the primary suspect is multicast traffic.
>>>
>>> Best  Regards,
>>> Strahil Nikolov
>>>
>>> On 8 July 2020 at 16:33:45 GMT+03:00,
>>> "stefan.schmitz at farmpartner-tec.com"
>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>> Hello,
>>>>
>>>>> I can't find fence_virtd for Ubuntu18, but it is available for
>>>>> Ubuntu20.
>>>>
>>>> We have now upgraded our Server to Ubuntu 20.04 LTS and installed the
>>>> packages fence-virt and fence-virtd.
>>>>
>>>> The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just
>>>> returns the single local VM.
>>>>
>>>> The same command on both VMs results in:
>>>> # fence_xvm -a 225.0.0.12 -o list
>>>> Timed out waiting for response
>>>> Operation failed
>>>>
>>>> But just as before, connecting from the guest to the host via nc
>>>> works fine:
>>>> # nc -z -v -u 192.168.1.21 1229
>>>> Connection to 192.168.1.21 1229 port [udp/*] succeeded!
>>>>
>>>> So the host and the service are basically reachable.
>>>>
>>>> I have spoken to our firewall tech; he has assured me that no local
>>>> traffic is hindered by anything, be it multicast or not.
>>>> Software firewalls are not present/active on any of our servers.
>>>>
>>>> Ubuntu guests:
>>>> # ufw status
>>>> Status: inactive
>>>>
>>>> CentOS hosts:
>>>> # systemctl status firewalld
>>>> ● firewalld.service - firewalld - dynamic firewall daemon
>>>>    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
>>>>    Active: inactive (dead)
>>>>      Docs: man:firewalld(1)
>>>>
>>>>
>>>> Any hints or help on how to remedy this problem would be greatly
>>>> appreciated!
>>>>
>>>> Kind regards
>>>> Stefan Schmitz
>>>>
>>>>
>>>> On 07.07.2020 at 10:54, Klaus Wenninger wrote:
>>>>> On 7/7/20 10:33 AM, Strahil Nikolov wrote:
>>>>>> I can't find fence_virtd for Ubuntu18, but it is available for
>>>>>> Ubuntu20.
>>>>>>
>>>>>> Your other option is to get an iSCSI from your quorum system and use
>>>>>> that for SBD.
>>>>>> For watchdog, you can use the 'softdog' kernel module or you can use
>>>>>> KVM to present one to the VMs.
>>>>>> You can also check the '-P' flag for SBD.
>>>>> With kvm please use the qemu-watchdog and try to
>>>>> avoid using softdog with SBD.
>>>>> Especially if you are aiming for a production cluster ...
>>>>>
>>>>> Adding something like this to the libvirt XML should do the trick:
>>>>> <watchdog model='i6300esb' action='reset'>
>>>>>          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
>>>>> </watchdog>
>>>>>
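(Just a sketch of how one could verify inside a guest that the emulated
watchdog is really present once the domain has been restarted with that XML;
the i6300ESB shows up as a PCI device and, with the i6300esb module loaded,
as /dev/watchdog:)

# lspci | grep -i watchdog
# ls -l /dev/watchdog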
>>>>>>
>>>>>> Best Regards,
>>>>>> Strahil Nikolov
>>>>>>
>>>>>> On 7 July 2020 at 10:11:38 GMT+03:00,
>>>>>> "stefan.schmitz at farmpartner-tec.com"
>>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>>>>> What does 'virsh list'
>>>>>>>> give you on the 2 hosts? Hopefully different names for
>>>>>>>> the VMs ...
>>>>>>> Yes, each host shows its own VM:
>>>>>>>
>>>>>>> # virsh list
>>>>>>>     Id    Name                           Status
>>>>>>> ----------------------------------------------------
>>>>>>>     2     kvm101                         running
>>>>>>>
>>>>>>> # virsh list
>>>>>>>     Id    Name                           State
>>>>>>> ----------------------------------------------------
>>>>>>>     1     kvm102                         running
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>>>>> guests as well?
>>>>>>> fence_xvm sadly does not work on the Ubuntu guests. The howto said to
>>>>>>> install "yum install fence-virt fence-virtd", which do not exist as
>>>>>>> such in Ubuntu 18.04. After trying to find the appropriate packages we
>>>>>>> installed "libvirt-clients" and "multipath-tools". Is there maybe
>>>>>>> something missing or completely wrong?
>>>>>>> We can, however, connect to both hosts using "nc -z -v -u 192.168.1.21
>>>>>>> 1229"; that just works fine.
>>>>>>>
>>>>> Without fence-virt you can't expect the whole thing to work.
>>>>> Maybe you can build it for your Ubuntu version from the sources of
>>>>> a package for another Ubuntu version if it doesn't exist yet.
>>>>> Btw, which pacemaker version are you using?
>>>>> There was a convenience fix on the master branch for at least
>>>>> a couple of days (sometime during the 2.0.4 release cycle) that
>>>>> wasn't compatible with fence_xvm.
>>>>>>>> Usually, the biggest problem is the multicast traffic - as in many
>>>>>>>> environments it can be dropped by firewalls.
>>>>>>> To make sure, I have asked our datacenter techs to verify that
>>>>>>> multicast traffic can move unhindered in our local network. In the
>>>>>>> past they have confirmed on multiple occasions that local traffic is
>>>>>>> not filtered in any way, but until now I had never specifically asked
>>>>>>> about multicast traffic, which I have now done. I am waiting for an
>>>>>>> answer to that question.
>>>>>>>
>>>>>>>
>>>>>>> kind regards
>>>>>>> Stefan Schmitz
>>>>>>>
>>>>>>> On 06.07.2020 at 11:24, Klaus Wenninger wrote:
>>>>>>>> On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>>>> # fence_xvm -o list
>>>>>>>>>>> kvm102                         bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>>>>> This should show both VMs, so getting to that point will likely
>>>>>>>>>> solve your problem. fence_xvm relies on multicast; there could be
>>>>>>>>>> some obscure network configuration needed to get that working on
>>>>>>>>>> the VMs.
>>>>>>>> You said you tried on both hosts. What does 'virsh list'
>>>>>>>> give you on the 2 hosts? Hopefully different names for
>>>>>>>> the VMs ...
>>>>>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>>>>>> guests as well?
>>>>>>>> Did you try pinging via the physical network that is
>>>>>>>> connected to the bridge configured to be used for
>>>>>>>> fencing?
>>>>>>>> If I got it right fence_xvm should support collecting
>>>>>>>> answers from multiple hosts but I found a suggestion
>>>>>>>> to do a setup with 2 multicast-addresses & keys for
>>>>>>>> each host.
>>>>>>>> Which route did you go?
>>>>>>>>
>>>>>>>> Klaus
>>>>>>>>> Thank you for pointing me in that direction. We have tried to solve
>>>>>>>>> that, but with no success. We were using the howto provided here:
>>>>>>>>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>>>>>>>>
>>>>>>>>> The problem is, it specifically states that the tutorial does not yet
>>>>>>>>> support the case where guests are running on multiple hosts. There
>>>>>>>>> are some short hints about what might be necessary, but working
>>>>>>>>> through those sadly did not work, nor were there any clues that would
>>>>>>>>> help us find a solution ourselves. So now we are completely stuck
>>>>>>>>> here.
>>>>>>>>>
>>>>>>>>> Does someone have the same configuration with guest VMs on multiple
>>>>>>>>> hosts? And how did you manage to get that to work? What do we need
>>>>>>>>> to do to resolve this? Is there maybe even someone who would be
>>>>>>>>> willing to take a closer look at our server? Any help would be
>>>>>>>>> greatly appreciated!
>>>>>>>>>
>>>>>>>>> Kind regards
>>>>>>>>> Stefan Schmitz
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>>>>>>>>> On Thu, 2020-07-02 at 17:18 +0200,
>>>>>>>>>> stefan.schmitz at farmpartner-tec.com
>>>>>>>>>> wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I hope someone can help with this problem. We are (still) trying
>>>>>>>>>>> to get Stonith running to achieve an active/active HA cluster,
>>>>>>>>>>> but sadly to no avail.
>>>>>>>>>>>
>>>>>>>>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu
>>>>>>>>>>> VM. The Ubuntu VMs are the ones which should form the HA cluster.
>>>>>>>>>>>
>>>>>>>>>>> The current status is this:
>>>>>>>>>>>
>>>>>>>>>>> # pcs status
>>>>>>>>>>> Cluster name: pacemaker_cluster
>>>>>>>>>>> WARNING: corosync and pacemaker node names do not match (IPs used
>>>>>>>>>>> in setup?)
>>>>>>>>>>> Stack: corosync
>>>>>>>>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
>>>>>>>>>>> with quorum
>>>>>>>>>>> Last updated: Thu Jul  2 17:03:53 2020
>>>>>>>>>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
>>>>>>>>>>> server4ubuntu1
>>>>>>>>>>>
>>>>>>>>>>> 2 nodes configured
>>>>>>>>>>> 13 resources configured
>>>>>>>>>>>
>>>>>>>>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>>>
>>>>>>>>>>> Full list of resources:
>>>>>>>>>>>
>>>>>>>>>>>        stonith_id_1   (stonith:external/libvirt):     Stopped
>>>>>>>>>>>        Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>>>>>>>>            Masters: [ server4ubuntu1 ]
>>>>>>>>>>>            Slaves: [ server2ubuntu1 ]
>>>>>>>>>>>        Master/Slave Set: WebDataClone [WebData]
>>>>>>>>>>>            Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>>>        Clone Set: dlm-clone [dlm]
>>>>>>>>>>>            Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>>>>>        Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>>>>>>>>            ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started server2ubuntu1
>>>>>>>>>>>            ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started server4ubuntu1
>>>>>>>>>>>        Clone Set: WebFS-clone [WebFS]
>>>>>>>>>>>            Started: [ server4ubuntu1 ]
>>>>>>>>>>>            Stopped: [ server2ubuntu1 ]
>>>>>>>>>>>        Clone Set: WebSite-clone [WebSite]
>>>>>>>>>>>            Started: [ server4ubuntu1 ]
>>>>>>>>>>>            Stopped: [ server2ubuntu1 ]
>>>>>>>>>>>
>>>>>>>>>>> Failed Actions:
>>>>>>>>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1):
>>>>>>>>>>> call=201,
>>>>>>>>>>> status=Error, exitreason='',
>>>>>>>>>>>           last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms,
>>>>>>>>>>> exec=3403ms
>>>>>>>>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8):
>>>>>>>>>>> call=203,
>>>>>>>>>>> status=complete, exitreason='',
>>>>>>>>>>>           last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms,
>>>>>>>>>>> exec=0ms
>>>>>>>>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1):
>>>>>>>>>>> call=202,
>>>>>>>>>>> status=Error, exitreason='',
>>>>>>>>>>>           last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms,
>>>>>>>>>>> exec=3411ms
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The stonith resource is stopped and does not seem to work.
>>>>>>>>>>> On both hosts the command
>>>>>>>>>>> # fence_xvm -o list
>>>>>>>>>>> kvm102                         bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>>>>>> This should show both VMs, so getting to that point will likely
>>>>>>>>>> solve your problem. fence_xvm relies on multicast; there could be
>>>>>>>>>> some obscure network configuration needed to get that working on
>>>>>>>>>> the VMs.
>>>>>>>>>>
>>>>>>>>>>> returns the local VM. Apparently it connects through the
>>>>>>>>>>> virtualization interface, because it returns the VM name, not the
>>>>>>>>>>> hostname of the client VM. I do not know if this is how it is
>>>>>>>>>>> supposed to work?
>>>>>>>>>> Yes, fence_xvm knows only about the VM names.
>>>>>>>>>>
>>>>>>>>>> To get pacemaker to be able to use it for fencing the cluster
>>>>>>>>>> nodes, you have to add a pcmk_host_map parameter to the fencing
>>>>>>>>>> resource. It looks like
>>>>>>>>>> pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
>>>>>>>>>>
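(A sketch of what such a fence_xvm resource could look like with the names
from this thread; the node-to-VM mapping is only a guess and the option names
should be checked against the local fence_xvm metadata:)

# pcs stonith create fence_kvm fence_xvm \
      multicast_address=225.0.0.12 key_file=/etc/cluster/fence_xvm.key \
      pcmk_host_map="server2ubuntu1:kvm101;server4ubuntu1:kvm102"
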
>>>>>>>>>>> In the local network, all traffic is allowed. No firewall is
>>>>>>>>>>> active locally; only the connections leaving the local network are
>>>>>>>>>>> firewalled. Hence there are no connection problems between the
>>>>>>>>>>> hosts and clients. For example, we can successfully connect from
>>>>>>>>>>> the clients to the hosts:
>>>>>>>>>>> # nc -z -v -u 192.168.1.21 1229
>>>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>>>> Ncat: Connected to 192.168.1.21:1229.
>>>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>>>
>>>>>>>>>>> # nc -z -v -u 192.168.1.13 1229
>>>>>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>>>>>> Ncat: Connected to 192.168.1.13:1229.
>>>>>>>>>>> Ncat: UDP packet sent successfully
>>>>>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>>>>>>>>> according to the howto provided here:
>>>>>>>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>>>>>>>>
>>>>>>>>>>> The actual line we used:
>>>>>>>>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>>>>>>>>>>> hostlist="Host4,host2"
>>>>>>>>>>> hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> But as you can see in the pcs status output, stonith is stopped
>>>>>>>>>>> and exits with an unknown error.
>>>>>>>>>>>
>>>>>>>>>>> Can somebody please advise on how to proceed or what additional
>>>>>>>>>>> information is needed to solve this problem?
>>>>>>>>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>>>>>>>>
>>>>>>>>>>> Kind regards
>>>>>>>>>>> Stefan Schmitz
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Manage your subscription:
>>>>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>>>>
>>>>>>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>>>>>>
>>>>>
>>
> 


More information about the Users mailing list