[ClusterLabs] Still Beginner STONITH Problem

Strahil Nikolov hunter86_bg at yahoo.com
Mon Jul 6 07:52:16 EDT 2020


As far as I know, fence_xvm supports multiple hosts, but you need to open the port on both the hypervisor (UDP) and the guest (TCP). 'fence_xvm -o list' should provide a list of VMs from all hosts that responded (and have the key).
Usually the biggest problem is the multicast traffic, as in many environments it is dropped by firewalls.
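
For example, just a sketch, assuming firewalld on the CentOS hosts, iptables on the Ubuntu guests, and the fence_virt defaults (multicast address 225.0.0.12, port 1229):

On each hypervisor (accept the guests' multicast requests):
# firewall-cmd --permanent --add-port=1229/udp
# firewall-cmd --reload

On each guest (accept the TCP connection the host opens to send its answer):
# iptables -A INPUT -p tcp --dport 1229 -j ACCEPT

If that is in place and 'fence_xvm -o list' still shows only the local VM, the multicast itself is probably dropped somewhere on the path (e.g. by IGMP snooping on a switch).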

Best Regards,
Strahil Nikolov

On 6 July 2020 at 12:24:08 GMT+03:00, Klaus Wenninger <kwenning at redhat.com> wrote:
>On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
>> Hello,
>>
>> >> # fence_xvm -o list
>> >> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>
>> >This should show both VMs, so getting to that point will likely solve
>> >your problem. fence_xvm relies on multicast, there could be some
>> >obscure network configuration to get that working on the VMs.
>You said you tried on both hosts. What does 'virsh list'
>give you on the 2 hosts? Hopefully different names for
>the VMs ...
>Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>guests as well?
>Did you try pinging via the physical network that is
>connected to the bridge configured to be used for
>fencing?
>If I got it right, fence_xvm should support collecting
>answers from multiple hosts, but I found a suggestion
>to do a setup with 2 multicast-addresses & keys for
>each host.
>Which route did you go?
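>
>In case you go for the 2-address variant, a rough sketch of what
>I mean (untested; interface name, addresses and key paths are
>just assumptions): give each host its own listener section in
>/etc/fence_virt.conf, e.g. on the first host
>
>listeners {
>        multicast {
>                address = "225.0.0.12";
>                port = "1229";
>                interface = "br0";
>                key_file = "/etc/cluster/fence_xvm_host1.key";
>        }
>}
>
>and the same on the second host with address 225.0.0.13 and its
>own key. Then distribute both keys to both guests and configure
>one fence_xvm stonith-device per host, each pointing at the
>matching multicast_address and key_file.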
>
>Klaus
>>
>> Thank you for pointing me in that direction. We have tried to solve
>> that but with no success. We were using a howto provided here:
>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>
>> Problem is, it specifically states that the tutorial does not yet
>> support the case where guests are running on multiple hosts. There are
>> some short hints about what might be necessary to do, but working
>> through those sadly just did not work, nor were there any clues which
>> would help us find a solution ourselves. So now we are completely
>> stuck here.
>>
>> Does anyone have the same configuration, with guest VMs on multiple
>> hosts? How did you manage to get that to work? What do we need to do
>> to resolve this? Is there maybe even someone who would be willing to
>> take a closer look at our server? Any help would be greatly
>> appreciated!
>>
>> Kind regards
>> Stefan Schmitz
>>
>>
>>
>> On 03.07.2020 at 02:39, Ken Gaillot wrote:
>>> On Thu, 2020-07-02 at 17:18 +0200,
>>> stefan.schmitz at farmpartner-tec.com wrote:
>>>> Hello,
>>>>
>>>> I hope someone can help with this problem. We are (still) trying
>>>> to get STONITH working to achieve a running active/active HA
>>>> cluster, but sadly to no avail.
>>>>
>>>> There are 2 CentOS hosts. On each one there is an Ubuntu VM. The
>>>> Ubuntu VMs are the ones which should form the HA cluster.
>>>>
>>>> The current status is this:
>>>>
>>>> # pcs status
>>>> Cluster name: pacemaker_cluster
>>>> WARNING: corosync and pacemaker node names do not match (IPs used
>>>> in setup?)
>>>> Stack: corosync
>>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
>>>> with quorum
>>>> Last updated: Thu Jul  2 17:03:53 2020
>>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
>>>> server4ubuntu1
>>>>
>>>> 2 nodes configured
>>>> 13 resources configured
>>>>
>>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>>
>>>> Full list of resources:
>>>>
>>>>    stonith_id_1   (stonith:external/libvirt):     Stopped
>>>>    Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>>        Masters: [ server4ubuntu1 ]
>>>>        Slaves: [ server2ubuntu1 ]
>>>>    Master/Slave Set: WebDataClone [WebData]
>>>>        Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>>    Clone Set: dlm-clone [dlm]
>>>>        Started: [ server2ubuntu1 server4ubuntu1 ]
>>>>    Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>>        ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started
>>>> server2ubuntu1
>>>>        ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started
>>>> server4ubuntu1
>>>>    Clone Set: WebFS-clone [WebFS]
>>>>        Started: [ server4ubuntu1 ]
>>>>        Stopped: [ server2ubuntu1 ]
>>>>    Clone Set: WebSite-clone [WebSite]
>>>>        Started: [ server4ubuntu1 ]
>>>>        Stopped: [ server2ubuntu1 ]
>>>>
>>>> Failed Actions:
>>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1):
>>>>       call=201, status=Error, exitreason='',
>>>>       last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms, exec=3403ms
>>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8):
>>>>       call=203, status=complete, exitreason='',
>>>>       last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1):
>>>>       call=202, status=Error, exitreason='',
>>>>       last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms, exec=3411ms
>>>>
>>>>
>>>> The stonith resource is stopped and does not seem to work.
>>>> On both hosts the command
>>>> # fence_xvm -o list
>>>> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>
>>> This should show both VMs, so getting to that point will likely solve
>>> your problem. fence_xvm relies on multicast, there could be some
>>> obscure network configuration to get that working on the VMs.
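>>>
>>> One quick check, a sketch assuming the fence_virt defaults (multicast
>>> address 225.0.0.12, port 1229): run the listing directly against the
>>> multicast address from each guest,
>>>
>>> # fence_xvm -a 225.0.0.12 -o list
>>>
>>> and confirm on each host that fence_virtd listens on the bridge the
>>> guests are attached to (the 'interface' setting in
>>> /etc/fence_virt.conf; 'fence_virtd -c' walks you through it).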
>>>
>>>> returns only the local VM. Apparently it connects through the
>>>> virtualization interface, because it returns the VM name, not the
>>>> hostname of the client VM. I do not know if this is how it is
>>>> supposed to work?
>>>
>>> Yes, fence_xvm knows only about the VM names.
>>>
>>> To get pacemaker to be able to use it for fencing the cluster nodes,
>>> you have to add a pcmk_host_map parameter to the fencing resource. It
>>> looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
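>>>
>>> For example, something along these lines might work (just a sketch,
>>> untested -- kvm102 is taken from your output; replace the other VM
>>> name with whatever 'virsh list' shows on the second host, and adjust
>>> the address/key if you don't use the defaults):
>>>
>>> # pcs stonith create fence_vms fence_xvm \
>>>       multicast_address=225.0.0.12 \
>>>       key_file=/etc/cluster/fence_xvm.key \
>>>       pcmk_host_map="server2ubuntu1:kvm102;server4ubuntu1:<vm-on-host4>"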
>>>
>>>> In the local network, all traffic is allowed. No firewall is
>>>> locally active; only connections leaving the local network are
>>>> firewalled. Hence there are no connection problems between the
>>>> hosts and clients. For example, we can successfully connect from
>>>> the clients to the hosts:
>>>>
>>>> # nc -z -v -u 192.168.1.21 1229
>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>> Ncat: Connected to 192.168.1.21:1229.
>>>> Ncat: UDP packet sent successfully
>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>
>>>> # nc -z -v -u 192.168.1.13 1229
>>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>> Ncat: Connected to 192.168.1.13:1229.
>>>> Ncat: UDP packet sent successfully
>>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>>
>>>>
>>>> On the Ubuntu VMs we created and configured the stonith resource
>>>> according to the howto provided here:
>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>>
>>>>
>>>> The actual line we used:
>>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>>>> hostlist="Host4,host2"
>>>> hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>>
>>>>
>>>> But as you can see in the pcs status output, stonith is stopped
>>>> and exits with an unknown error.
>>>>
>>>> Can somebody please advise on how to proceed or what additional
>>>> information is needed to solve this problem?
>>>> Any help would be greatly appreciated! Thank you in advance.
>>>>
>>>> Kind regards
>>>> Stefan Schmitz
>>>>
>
>_______________________________________________
>Manage your subscription:
>https://lists.clusterlabs.org/mailman/listinfo/users
>
>ClusterLabs home: https://www.clusterlabs.org/


More information about the Users mailing list