[ClusterLabs] Still Beginner STONITH Problem

Klaus Wenninger kwenning at redhat.com
Mon Jul 6 05:24:08 EDT 2020


On 7/6/20 10:10 AM, stefan.schmitz at farmpartner-tec.com wrote:
> Hello,
>
> >> # fence_xvm -o list
> >> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
> >> on
>
> >This should show both VMs, so getting to that point will likely solve
> >your problem. fence_xvm relies on multicast, there could be some
> >obscure network configuration to get that working on the VMs.
You said you tried on both hosts. What does 'virsh list'
give you on the 2 hosts? Hopefully different names for
the VMs ...
Did you try 'fence_xvm -a {mcast-ip} -o list' on the
guests as well?
Did you try pinging via the physical network that is
connected to the bridge configured to be used for
fencing?
If I got it right, fence_xvm should support collecting
answers from multiple hosts, but I found a suggestion
to do a setup with 2 multicast addresses & keys, one for
each host.
Which route did you go?
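If you go the two-address route, a rough sketch would look like this
(the VM names, key file names and the second multicast address below
are only placeholders, use whatever 'virsh list' and your own naming
give you): give each host's fence_virtd its own multicast address and
key in /etc/fence_virt.conf, e.g. one host keeps the default
225.0.0.12 with its own key file and the other uses 225.0.0.13 with a
second key, copy both keys to both guests, and then define one stonith
resource per host, something like:

# pcs stonith create fence_host2 fence_xvm \
      multicast_address=225.0.0.12 \
      key_file=/etc/cluster/fence_xvm_host2.key \
      pcmk_host_map="server2ubuntu1:kvm102"
# pcs stonith create fence_host4 fence_xvm \
      multicast_address=225.0.0.13 \
      key_file=/etc/cluster/fence_xvm_host4.key \
      pcmk_host_map="server4ubuntu1:kvm104"

That way each stonith resource only ever talks to the fence_virtd
instance that actually controls the VM it is supposed to fence.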

Klaus
>
> Thank you for pointing me in that direction. We have tried to solve
> that, but with no success. We were using a howto provided here:
> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>
> The problem is that it specifically states that the tutorial does not
> yet support the case where guests are running on multiple hosts. There
> are some short hints on what might be necessary, but working through
> those sadly did not work, nor were there any clues that would help us
> find a solution ourselves. So now we are completely stuck here.
>
> Does anyone have the same configuration with guest VMs on multiple
> hosts? And how did you manage to get that to work? What do we need to
> do to resolve this? Is there maybe even someone who would be willing
> to take a closer look at our servers? Any help would be greatly
> appreciated!
>
> Kind regards
> Stefan Schmitz
>
>
>
> Am 03.07.2020 um 02:39 schrieb Ken Gaillot:
>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schmitz at farmpartner-tec.com
>> wrote:
>>> Hello,
>>>
>>> I hope someone can help with this problem. We are (still) trying to
>>> get STONITH working to achieve a running active/active HA cluster,
>>> but sadly to no avail.
>>>
>>> There are 2 CentOS hosts. On each one there is an Ubuntu VM.
>>> The Ubuntu VMs are the ones that should form the HA cluster.
>>>
>>> The current status is this:
>>>
>>> # pcs status
>>> Cluster name: pacemaker_cluster
>>> WARNING: corosync and pacemaker node names do not match (IPs used in
>>> setup?)
>>> Stack: corosync
>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
>>> with
>>> quorum
>>> Last updated: Thu Jul  2 17:03:53 2020
>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
>>> server4ubuntu1
>>>
>>> 2 nodes configured
>>> 13 resources configured
>>>
>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>
>>> Full list of resources:
>>>
>>>    stonith_id_1   (stonith:external/libvirt):     Stopped
>>>    Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>        Masters: [ server4ubuntu1 ]
>>>        Slaves: [ server2ubuntu1 ]
>>>    Master/Slave Set: WebDataClone [WebData]
>>>        Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>    Clone Set: dlm-clone [dlm]
>>>        Started: [ server2ubuntu1 server4ubuntu1 ]
>>>    Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>        ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started
>>> server2ubuntu1
>>>        ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started
>>> server4ubuntu1
>>>    Clone Set: WebFS-clone [WebFS]
>>>        Started: [ server4ubuntu1 ]
>>>        Stopped: [ server2ubuntu1 ]
>>>    Clone Set: WebSite-clone [WebSite]
>>>        Started: [ server4ubuntu1 ]
>>>        Stopped: [ server2ubuntu1 ]
>>>
>>> Failed Actions:
>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1):
>>> call=201,
>>> status=Error, exitreason='',
>>>       last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms,
>>> exec=3403ms
>>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8):
>>> call=203,
>>> status=complete, exitreason='',
>>>       last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1):
>>> call=202,
>>> status=Error, exitreason='',
>>>       last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms,
>>> exec=3411ms
>>>
>>>
>>> The stonith resource is stopped and does not seem to work.
>>> On both hosts the command
>>> # fence_xvm -o list
>>> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
>>> on
>>
>> This should show both VMs, so getting to that point will likely solve
>> your problem. fence_xvm relies on multicast, there could be some
>> obscure network configuration to get that working on the VMs.
>>
>>> returns only the local VM. Apparently it connects through the
>>> virtualization interface, because it returns the VM name, not the
>>> hostname of the client VM. I do not know if this is how it is
>>> supposed to work?
>>
>> Yes, fence_xvm knows only about the VM names.
>>
>> To get pacemaker to be able to use it for fencing the cluster nodes,
>> you have to add a pcmk_host_map parameter to the fencing resource. It
>> looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
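>>
>> For example (just a sketch, the VM names are placeholders, use the
>> names that 'virsh list' reports on each host):
>>
>> # pcs stonith create fence_vms fence_xvm \
>>       key_file=/etc/cluster/fence_xvm.key \
>>       pcmk_host_map="server2ubuntu1:kvm102;server4ubuntu1:kvm104"
>>
>> Pacemaker then translates the node name it wants to fence into the VM
>> name that fence_xvm understands.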
>>
>>> In the local network, all traffic is allowed. No firewall is active
>>> locally; only connections leaving the local network are firewalled.
>>> Hence there are no connection problems between the hosts and clients.
>>> For example, we can successfully connect from the clients to the hosts:
>>>
>>> # nc -z -v -u 192.168.1.21 1229
>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>> Ncat: Connected to 192.168.1.21:1229.
>>> Ncat: UDP packet sent successfully
>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>
>>> # nc -z -v -u 192.168.1.13 1229
>>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>>> Ncat: Connected to 192.168.1.13:1229.
>>> Ncat: UDP packet sent successfully
>>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>>
>>>
>>> On the Ubuntu VMs we created and configured the stonith resource
>>> according to the howto provided here:
>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>>
>>>
>>> The actual line we used:
>>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>>> hostlist="Host4,host2"
>>> hypervisor_uri="qemu+ssh://192.168.1.21/system"
>>>
>>>
>>> But as you can see in the pcs status output, stonith is stopped and
>>> exits with an unknown error.
>>>
>>> Can somebody please advise on how to proceed or what additional
>>> information is needed to solve this problem?
>>> Any help would be greatly appreciated! Thank you in advance.
>>>
>>> Kind regards
>>> Stefan Schmitz
>>>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>


