[ClusterLabs] Still Beginner STONITH Problem

stefan.schmitz at farmpartner-tec.com stefan.schmitz at farmpartner-tec.com
Wed Jul 8 09:33:45 EDT 2020


Hello,

 > I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20.

We have now upgraded our server to Ubuntu 20.04 LTS and installed the 
packages fence-virt and fence-virtd.

The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just 
returns the single local VM.
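
For reference, a minimal /etc/fence_virt.conf on a host would look
roughly like the following; the interface name (br0) and the key path
are assumptions based on typical setups, not necessarily our values:

fence_virtd {
        listener = "multicast";
        backend = "libvirt";
}

listeners {
        multicast {
                key_file = "/etc/cluster/fence_xvm.key";
                address = "225.0.0.12";
                interface = "br0";
                family = "ipv4";
                port = "1229";
        }
}

backends {
        libvirt {
                uri = "qemu:///system";
        }
}

With working multicast between the hosts, "fence_xvm -o list" should
aggregate the answers of both fence_virtd daemons and list both VMs.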

The same command on both VMs results in:
# fence_xvm -a 225.0.0.12 -o list
Timed out waiting for response
Operation failed

But just as before, connecting from the guest to the host via nc 
works fine:
# nc -z -v -u 192.168.1.21 1229
Connection to 192.168.1.21 1229 port [udp/*] succeeded!

So the host and the service are basically reachable, at least via 
unicast UDP.
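
Since nc only tests unicast UDP, one way to verify whether the multicast
requests actually arrive would be to capture on a host while a guest
sends the query (the bridge name br0 is again an assumption):

On the host:
# tcpdump -n -i br0 udp port 1229

In parallel on a guest:
# fence_xvm -a 225.0.0.12 -o list

If the request from a guest never shows up in the capture on the other
host, the multicast packets are being dropped somewhere, despite the
unicast test succeeding.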

I have spoken to our firewall tech; he has assured me that no local 
traffic is hindered by anything, be it multicast or not.
Software firewalls are not present or active on any of our servers.

Ubuntu guests:
# ufw status
Status: inactive

CentOS hosts:
# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
    Active: inactive (dead)
      Docs: man:firewalld(1)


Any hints or help on how to remedy this problem would be greatly 
appreciated!

Kind regards
Stefan Schmitz


On 07.07.2020 at 10:54, Klaus Wenninger wrote:
> On 7/7/20 10:33 AM, Strahil Nikolov wrote:
>> I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20.
>>
>> Your other option is to get an iSCSI device from your quorum system and use that for SBD.
>> For watchdog, you can use 'softdog' kernel module or you can use KVM to present one to the VMs.
>> You can also check the '-P' flag for SBD.
> With kvm please use the qemu-watchdog and try to
> avoid using softdog with SBD.
> Especially if you are aiming for a production-cluster ...
> 
> Adding something like this to the libvirt XML should do the trick:
> <watchdog model='i6300esb' action='reset'>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
> </watchdog>
> 
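For reference, a sketch of how this could be applied (kvm101 is just one
of the domain names from this thread, used as an example):

# virsh edit kvm101
(paste the <watchdog> block into the domain definition)
# virsh shutdown kvm101 && virsh start kvm101

After that the guest should expose a /dev/watchdog device for SBD to use.
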
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 7 July 2020 at 10:11:38 GMT+03:00, "stefan.schmitz at farmpartner-tec.com" <stefan.schmitz at farmpartner-tec.com> wrote:
>>>> What does 'virsh list'
>>>> give you on the 2 hosts? Hopefully different names for
>>>> the VMs ...
>>> Yes, each host shows its own
>>>
>>> # virsh list
>>>   Id    Name                           State
>>> ----------------------------------------------------
>>>   2     kvm101                         running
>>>
>>> # virsh list
>>>   Id    Name                           State
>>> ----------------------------------------------------
>>>   1     kvm102                         running
>>>
>>>
>>>
>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>> guests as well?
>>> fence_xvm sadly does not work on the Ubuntu guests. The howto said to
>>> install "yum install fence-virt fence-virtd", which do not exist as
>>> such in Ubuntu 18.04. After we tried to find the appropriate packages,
>>> we installed "libvirt-clients" and "multipath-tools". Is there maybe
>>> something missing or completely wrong?
>>> Though we can connect to both hosts using "nc -z -v -u 192.168.1.21
>>> 1229"; that just works fine.
>>>
> Without fence-virt you can't expect the whole thing to work.
> Maybe you can build it for your ubuntu-version from the sources of
> a package for another ubuntu-version if it doesn't exist yet.
> Btw. which pacemaker-version are you using?
> There was a convenience-fix on the master-branch for at least
> a couple of days (sometime during the 2.0.4 release-cycle) that
> wasn't compatible with fence_xvm.
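
For reference, the running version can be checked on the cluster nodes
with either of:

# pacemakerd --version
# crm_mon --version

(The pcs status output quoted further down already shows 1.1.18 in its
"Current DC" line.)
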
>>>> Usually, the biggest problem is the multicast traffic - as in many
>>>> environments it can be dropped by firewalls.
>>> To make sure, I have asked our datacenter techs to verify that
>>> multicast traffic can move unhindered in our local network. In the
>>> past they have confirmed on multiple occasions that local traffic is
>>> not filtered in any way, but until now I had never specifically asked
>>> about multicast traffic, which I have now done. I am waiting for an
>>> answer to that question.
>>>
>>>
>>> kind regards
>>> Stefan Schmitz
>>>
>>> On 06.07.2020 at 11:24, Klaus Wenninger wrote:
>>>> On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>> Hello,
>>>>>
>>>>>>> # fence_xvm -o list
>>>>>>> kvm102                       bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>> This should show both VMs, so getting to that point will likely
>>>>>> solve your problem. fence_xvm relies on multicast, there could be
>>>>>> some obscure network configuration to get that working on the VMs.
>>>> You said you tried on both hosts. What does 'virsh list'
>>>> give you on the 2 hosts? Hopefully different names for
>>>> the VMs ...
>>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>>> guests as well?
>>>> Did you try pinging via the physical network that is
>>>> connected to the bridge configured to be used for
>>>> fencing?
>>>> If I got it right, fence_xvm should support collecting
>>>> answers from multiple hosts, but I found a suggestion
>>>> to do a setup with 2 multicast-addresses & keys for
>>>> each host.
>>>> Which route did you go?
>>>>
>>>> Klaus
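
As a sketch of the two-address variant mentioned above: each host's
fence_virtd gets its own multicast address and key, and the cluster then
carries one fence_xvm device per host. The addresses, key paths and the
node-to-VM mapping below are assumptions for illustration only:

In /etc/fence_virt.conf on host2:  address = "225.0.0.12";
In /etc/fence_virt.conf on Host4:  address = "225.0.0.13";
(each listener with its own key_file)

Then, on one of the cluster nodes:

# pcs stonith create fence_host2 fence_xvm \
      multicast_address=225.0.0.12 key_file=/etc/cluster/fence_xvm_host2.key \
      pcmk_host_map="server2ubuntu1:kvm101"
# pcs stonith create fence_host4 fence_xvm \
      multicast_address=225.0.0.13 key_file=/etc/cluster/fence_xvm_host4.key \
      pcmk_host_map="server4ubuntu1:kvm102"
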
>>>>> Thank you for pointing me in that direction. We have tried to solve
>>>>> that but with no success. We were using a howto provided here:
>>>>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>>>>
>>>>> Problem is, it specifically states that the tutorial does not yet
>>>>> support the case where guests are running on multiple hosts. There
>>>>> are some short hints on what might be necessary to do, but working
>>>>> through those sadly did not work, nor were there any clues which
>>>>> would help us find a solution ourselves. So now we are completely
>>>>> stuck here.
>>>>>
>>>>> Does someone have the same configuration with guest VMs on multiple
>>>>> hosts? How did you manage to get that to work? What do we need to do
>>>>> to resolve this? Is there maybe even someone who would be willing to
>>>>> take a closer look at our servers? Any help would be greatly
>>>>> appreciated!
>>>>>
>>>>> Kind regards
>>>>> Stefan Schmitz
>>>>>
>>>>>
>>>>>
>>>>> Am 03.07.2020 um 02:39 schrieb Ken Gaillot:
>>>>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at
>>>>>> farmpartner-tec.com wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I hope someone can help with this problem. We are (still) trying
>>>>>>> to get Stonith to achieve a running active/active HA cluster, but
>>>>>>> sadly to no avail.
>>>>>>>
>>>>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu
>>>>>>> VM. The Ubuntu VMs are the ones which should form the HA cluster.
>>>>>>>
>>>>>>> The current status is this:
>>>>>>>
>>>>>>> # pcs status
>>>>>>> Cluster name: pacemaker_cluster
>>>>>>> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
>>>>>>> Stack: corosync
>>>>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
>>>>>>> Last updated: Thu Jul  2 17:03:53 2020
>>>>>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on server4ubuntu1
>>>>>>>
>>>>>>> 2 nodes configured
>>>>>>> 13 resources configured
>>>>>>>
>>>>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>
>>>>>>> Full list of resources:
>>>>>>>
>>>>>>>      stonith_id_1   (stonith:external/libvirt):     Stopped
>>>>>>>      Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>>>>          Masters: [ server4ubuntu1 ]
>>>>>>>          Slaves: [ server2ubuntu1 ]
>>>>>>>      Master/Slave Set: WebDataClone [WebData]
>>>>>>>          Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>      Clone Set: dlm-clone [dlm]
>>>>>>>          Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>>      Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>>>>          ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started
>>>>>>> server2ubuntu1
>>>>>>>          ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started
>>>>>>> server4ubuntu1
>>>>>>>      Clone Set: WebFS-clone [WebFS]
>>>>>>>          Started: [ server4ubuntu1 ]
>>>>>>>          Stopped: [ server2ubuntu1 ]
>>>>>>>      Clone Set: WebSite-clone [WebSite]
>>>>>>>          Started: [ server4ubuntu1 ]
>>>>>>>          Stopped: [ server2ubuntu1 ]
>>>>>>>
>>>>>>> Failed Actions:
>>>>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201,
>>>>>>> status=Error, exitreason='',
>>>>>>>     last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
>>>>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203,
>>>>>>> status=complete, exitreason='',
>>>>>>>     last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>>>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202,
>>>>>>> status=Error, exitreason='',
>>>>>>>     last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
>>>>>>>
>>>>>>>
>>>>>>> The stonith resource is stopped and does not seem to work.
>>>>>>> On both hosts the command
>>>>>>> # fence_xvm -o list
>>>>>>> kvm102                       bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>>> This should show both VMs, so getting to that point will likely
>>>>>> solve your problem. fence_xvm relies on multicast, there could be
>>>>>> some obscure network configuration to get that working on the VMs.
>>>>>>
>>>>>>> returns only the local VM. Apparently it connects through the
>>>>>>> virtualization interface, because it returns the VM name, not the
>>>>>>> hostname of the client VM. I do not know if this is how it is
>>>>>>> supposed to work?
>>>>>> Yes, fence_xvm knows only about the VM names.
>>>>>>
>>>>>> To get pacemaker to be able to use it for fencing the cluster
>>>>>> nodes, you have to add a pcmk_host_map parameter to the fencing
>>>>>> resource. It looks like
>>>>>> pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
>>>>>>
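
Concretely, with the names from this thread that would be something like
the command below; which VM runs on which host is an assumption here and
would need to match reality:

# pcs stonith create fence_kvm fence_xvm \
      multicast_address=225.0.0.12 key_file=/etc/cluster/fence_xvm.key \
      pcmk_host_map="server2ubuntu1:kvm101;server4ubuntu1:kvm102"
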
>>>>>>> In the local network, all traffic is allowed. No firewall is
>>>>>>> active locally; just the connections leaving the local network
>>>>>>> are firewalled. Hence there are no connection problems between
>>>>>>> the hosts and clients. For example, we can successfully connect
>>>>>>> from the clients to the hosts:
>>>>>>> # nc -z -v -u 192.168.1.21 1229
>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>> Ncat: Connected to 192.168.1.21:1229.
>>>>>>> Ncat: UDP packet sent successfully
>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>
>>>>>>> # nc -z -v -u 192.168.1.13 1229
>>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>>> Ncat: Connected to 192.168.1.13:1229.
>>>>>>> Ncat: UDP packet sent successfully
>>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>>
>>>>>>>
>>>>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>>>>> according to the howto provided here:
>>>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>>>>
>>>>>>> The actual line we used:
>>>>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>>>>>>> hostlist="Host4,host2"
>>>>>>> hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>>>>
>>>>>>>
>>>>>>> But as you can see in the pcs status output, stonith is stopped
>>>>>>> and exits with an unknown error.
>>>>>>>
>>>>>>> Can somebody please advise on how to proceed or what additional
>>>>>>> information is needed to solve this problem?
>>>>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>>>>
>>>>>>> Kind regards
>>>>>>> Stefan Schmitz
>>>>>>>
>>>>> _______________________________________________
>>>>> Manage your subscription:
>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>
>>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>>
> 

