[ClusterLabs] Still Beginner STONITH Problem
Klaus Wenninger
kwenning at redhat.com
Tue Jul 7 04:54:08 EDT 2020
On 7/7/20 10:33 AM, Strahil Nikolov wrote:
> I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20.
>
> Your other option is to get an iSCSI LUN from your quorum system and use that for SBD.
> For a watchdog, you can use the 'softdog' kernel module or you can use KVM to present one to the VMs.
> You can also check the '-P' flag for SBD.
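> E.g. something along these lines (only a rough sketch; the device path
> is a placeholder for your iSCSI LUN and the config file location differs
> between distributions):
>
> sbd -d /dev/disk/by-id/<your-iscsi-lun> create
> # then, in /etc/sysconfig/sbd (or /etc/default/sbd):
> # SBD_DEVICE=/dev/disk/by-id/<your-iscsi-lun>
> # SBD_WATCHDOG_DEV=/dev/watchdog
> # SBD_PACEMAKER=yes    <- this is what the '-P' flag is about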
With KVM please use the qemu watchdog and try to
avoid using softdog with SBD.
Especially if you are aiming for a production cluster ...
Adding something like that to the libvirt XML should do the trick:
<watchdog model='i6300esb' action='reset'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</watchdog>
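After adding that and restarting the VM, a quick check inside the guest
could look roughly like this (just a sketch; module and device names are
what I'd expect for the emulated i6300ESB):

# lspci | grep -i 6300        # the emulated watchdog should show up
# modprobe i6300esb           # in case the driver is not loaded automatically
# ls -l /dev/watchdog         # the device SBD would then be pointed at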
>
> Best Regards,
> Strahil Nikolov
>
> On 7 July 2020 10:11:38 GMT+03:00, "stefan.schmitz at farmpartner-tec.com" <stefan.schmitz at farmpartner-tec.com> wrote:
>>> What does 'virsh list'
>>> give you on the 2 hosts? Hopefully different names for
>>> the VMs ...
>> Yes, each host shows its own
>>
>> # virsh list
>> Id Name State
>> ----------------------------------------------------
>> 2 kvm101 running
>>
>> # virsh list
>> Id Name State
>> ----------------------------------------------------
>> 1 kvm102 running
>>
>>
>>
>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>> guests as well?
>> fence_xvm sadly does not work on the Ubuntu guests. The howto says to
>> run "yum install fence-virt fence-virtd", but those packages do not
>> exist as such in Ubuntu 18.04. After trying to find the appropriate
>> packages we installed "libvirt-clients" and "multipath-tools". Is
>> there maybe something missing or completely wrong?
>> We can, however, connect to both hosts using "nc -z -v -u 192.168.1.21
>> 1229"; that works just fine.
>>
Without fence-virt you can't expect the whole thing to work.
Maybe you can build it for your Ubuntu version from the sources of
a package for another Ubuntu version if it doesn't exist yet.
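Roughly like this, maybe (an untested sketch; it assumes a deb-src entry
for a newer Ubuntu release is enabled and that the source package there
is called fence-virt):

apt-get source fence-virt
sudo apt-get build-dep fence-virt
cd fence-virt-*/ && dpkg-buildpackage -us -uc
sudo dpkg -i ../fence-virt*.deb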
Btw. which pacemaker version are you using?
There was a convenience fix on the master branch for at least
a couple of days (sometime during the 2.0.4 release cycle) that
wasn't compatible with fence_xvm.
>>> Usually, the biggest problem is the multicast traffic - as in many
>>> environments it can be dropped by firewalls.
>> To make sure, I have asked our datacenter techs to verify that
>> multicast traffic can move unhindered in our local network. In the
>> past they have confirmed on multiple occasions that local traffic is
>> not filtered in any way, but until now I had never specifically asked
>> about multicast traffic, which I have now done. I am waiting for an
>> answer to that question.
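>> (If it helps, we could also test multicast between the machines
>> ourselves, e.g. by installing omping and running something like
>> "omping -c 30 192.168.1.13 192.168.1.21" on both machines at the same
>> time, then checking whether the multicast replies get through.)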
>>
>>
>> kind regards
>> Stefan Schmitz
>>
>> On 06.07.2020 at 11:24, Klaus Wenninger wrote:
>>> On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>> Hello,
>>>>
>>>>>> # fence_xvm -o list
>>>>>> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>> This should show both VMs, so getting to that point will likely solve
>>>>> your problem. fence_xvm relies on multicast, there could be some
>>>>> obscure network configuration to get that working on the VMs.
>>> You said you tried on both hosts. What does 'virsh list'
>>> give you on the 2 hosts? Hopefully different names for
>>> the VMs ...
>>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>>> guests as well?
>>> Did you try pinging via the physical network that is
>>> connected to the bridge configured to be used for
>>> fencing?
>>> If I got it right fence_xvm should support collecting
>>> answers from multiple hosts but I found a suggestion
>>> to do a setup with 2 multicast-addresses & keys for
>>> each host.
>>> Which route did you go?
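>>> That variant would roughly look like this (just a sketch; the second
>>> multicast address and the key-file paths are made up):
>>> host 1 /etc/fence_virt.conf: multicast address 225.0.0.12, key file A
>>> host 2 /etc/fence_virt.conf: multicast address 225.0.1.12, key file B
>>> and from a guest you would then query each host separately, e.g.
>>> fence_xvm -a 225.0.0.12 -k /etc/cluster/fence_xvm_host1.key -o list
>>> fence_xvm -a 225.0.1.12 -k /etc/cluster/fence_xvm_host2.key -o list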
>>>
>>> Klaus
>>>> Thank you for pointing me in that direction. We have tried to solve
>>>> that but with no success. We were using a howto provided here
>>>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>>>
>>>> Problem is, it specifically states that the tutorial does not yet
>>>> support the case where guests are running on multiple hosts. There
>>>> are some short hints at what might be necessary to do, but working
>>>> through those sadly did not work, nor were there any clues that
>>>> would help us find a solution ourselves. So now we are completely
>>>> stuck here.
>>>>
>>>> Does anyone have the same configuration with guest VMs on multiple
>>>> hosts? And how did you manage to get that to work? What do we need to
>>>> do to resolve this? Is there maybe even someone who would be willing
>>>> to take a closer look at our server? Any help would be greatly
>>>> appreciated!
>>>>
>>>> Kind regards
>>>> Stefan Schmitz
>>>>
>>>>
>>>>
>>>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at farmpartner-tec.com
>>>>> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I hope someone can help with this problem. We are (still) trying to
>>>>>> get Stonith to achieve a running active/active HA cluster, but
>>>>>> sadly to no avail.
>>>>>>
>>>>>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM.
>>>>>> The Ubuntu VMs are the ones which should form the HA cluster.
>>>>>>
>>>>>> The current status is this:
>>>>>>
>>>>>> # pcs status
>>>>>> Cluster name: pacemaker_cluster
>>>>>> WARNING: corosync and pacemaker node names do not match (IPs used
>>>>>> in setup?)
>>>>>> Stack: corosync
>>>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
>>>>>> with quorum
>>>>>> Last updated: Thu Jul 2 17:03:53 2020
>>>>>> Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on
>>>>>> server4ubuntu1
>>>>>>
>>>>>> 2 nodes configured
>>>>>> 13 resources configured
>>>>>>
>>>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>>>
>>>>>> Full list of resources:
>>>>>>
>>>>>> stonith_id_1 (stonith:external/libvirt): Stopped
>>>>>> Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>>> Masters: [ server4ubuntu1 ]
>>>>>> Slaves: [ server2ubuntu1 ]
>>>>>> Master/Slave Set: WebDataClone [WebData]
>>>>>> Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>>> Clone Set: dlm-clone [dlm]
>>>>>> Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1
>>>>>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1
>>>>>> Clone Set: WebFS-clone [WebFS]
>>>>>> Started: [ server4ubuntu1 ]
>>>>>> Stopped: [ server2ubuntu1 ]
>>>>>> Clone Set: WebSite-clone [WebSite]
>>>>>> Started: [ server4ubuntu1 ]
>>>>>> Stopped: [ server2ubuntu1 ]
>>>>>>
>>>>>> Failed Actions:
>>>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201,
>>>>>>     status=Error, exitreason='',
>>>>>>     last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, exec=3403ms
>>>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203,
>>>>>>     status=complete, exitreason='',
>>>>>>     last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms
>>>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202,
>>>>>>     status=Error, exitreason='',
>>>>>>     last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, exec=3411ms
>>>>>>
>>>>>>
>>>>>> The stonith resource is stopped and does not seem to work.
>>>>>> On both hosts the command
>>>>>> # fence_xvm -o list
>>>>>> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>> This should show both VMs, so getting to that point will likely solve
>>>>> your problem. fence_xvm relies on multicast, there could be some
>>>>> obscure network configuration to get that working on the VMs.
>>>>>
>>>>>> returns the local VM. Apparently it connects through the
>>>>>> virtualization interface, because it returns the VM name, not the
>>>>>> hostname of the client VM. I do not know if this is how it is
>>>>>> supposed to work?
>>>>> Yes, fence_xvm knows only about the VM names.
>>>>>
>>>>> To get pacemaker to be able to use it for fencing the cluster nodes,
>>>>> you have to add a pcmk_host_map parameter to the fencing resource. It
>>>>> looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
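>>>>> With a fence_xvm-based device that could look roughly like this (a
>>>>> sketch only -- the resource name is made up and the node-to-VM
>>>>> mapping is a guess, it may well be the other way around):
>>>>>
>>>>> # pcs stonith create fence_vms fence_xvm \
>>>>>       pcmk_host_map="server2ubuntu1:kvm101;server4ubuntu1:kvm102"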
>>>>>
>>>>>> In the local network, all traffic is allowed. No firewall is active
>>>>>> locally; only the connections leaving the local network are
>>>>>> firewalled. Hence there are no connection problems between the
>>>>>> hosts and clients. For example, we can successfully connect from
>>>>>> the clients to the hosts:
>>>>>> # nc -z -v -u 192.168.1.21 1229
>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>> Ncat: Connected to 192.168.1.21:1229.
>>>>>> Ncat: UDP packet sent successfully
>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>
>>>>>> # nc -z -v -u 192.168.1.13 1229
>>>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>>>> Ncat: Connected to 192.168.1.13:1229.
>>>>>> Ncat: UDP packet sent successfully
>>>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>>>
>>>>>>
>>>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>>>> according to the howto provided here:
>>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>>>
>>>>>> The actual line we used:
>>>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>>>>>> hostlist="Host4,host2"
>>>>>> hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>>>
>>>>>>
>>>>>> But as you can see in the pcs status output, stonith is stopped and
>>>>>> exits with an unknown error.
>>>>>>
>>>>>> Can somebody please advise on how to proceed or what additional
>>>>>> information is needed to solve this problem?
>>>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>>>
>>>>>> Kind regards
>>>>>> Stefan Schmitz
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>