[ClusterLabs] Still Beginner STONITH Problem
Klaus Wenninger
kwenning at redhat.com
Thu Jul 9 16:12:51 EDT 2020
On 7/9/20 8:18 PM, Vladislav Bogdanov wrote:
> Hi.
>
> This thread is getting too long.
>
> First, you need to ensure that your switch (or all switches in the
> path) has IGMP snooping enabled on the host ports (and probably on the
> interconnects along the path between your hosts).
>
> Second, you need an IGMP querier enabled somewhere nearby (preferably
> on the switch itself). Please verify that you see its queries on the
> hosts.
>
> Next, you probably need to make your hosts use IGMPv2 (not v3), as
> many switches still cannot understand v3. This can be done via sysctl;
> there are many articles about it online.
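Forcing IGMPv2 via sysctl, as suggested above, might look like this (a sketch using the standard Linux `force_igmp_version` keys; persist them under /etc/sysctl.d/ if they help):

```shell
# Force IGMPv2 on all interfaces (0 = kernel default, 2 = IGMPv2 only)
sysctl -w net.ipv4.conf.all.force_igmp_version=2
sysctl -w net.ipv4.conf.default.force_igmp_version=2

# Verify the current setting
sysctl net.ipv4.conf.all.force_igmp_version
```

Note these commands must run on each host and guest that participates in the multicast traffic, and a reboot or interface bounce may be needed before group memberships are re-announced as v2.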
Switch configuration might be in the way as well, but since the problem
exists in the communication between a host and the guest running on that
host, I would rather suspect firewall rules on the host(s).
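A quick way to check the host-firewall theory on the CentOS hosts might be the following (a sketch; port 1229 and multicast group 225.0.0.12 are the fence_virt defaults, adjust to your fence_virt.conf):

```shell
# Inspect the active zone for anything that would drop fence traffic
firewall-cmd --list-all

# Open the fence_virtd listener port for guest-to-host requests
firewall-cmd --permanent --add-port=1229/udp
firewall-cmd --permanent --add-port=1229/tcp
firewall-cmd --reload
```

If iptables is used directly instead of firewalld, the equivalent check is `iptables -L -n -v` on the bridge the guests are attached to.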
>
> This advice also applies to running corosync itself in multicast
> mode.
>
> Best,
> Vladislav
>
> Thu, 02/07/2020 в 17:18 +0200, stefan.schmitz at farmpartner-tec.com
> wrote:
>> Hello,
>>
>> I hope someone can help with this problem. We are (still) trying to
>> get STONITH working to achieve a running active/active HA cluster,
>> but sadly to no avail.
>>
>> There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM.
>> The Ubuntu VMs are the ones which should form the HA cluster.
>>
>> The current status is this:
>>
>> # pcs status
>> Cluster name: pacemaker_cluster
>> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
>> Stack: corosync
>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
>> Last updated: Thu Jul 2 17:03:53 2020
>> Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on server4ubuntu1
>>
>> 2 nodes configured
>> 13 resources configured
>>
>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>
>> Full list of resources:
>>
>> stonith_id_1 (stonith:external/libvirt): Stopped
>> Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>> Masters: [ server4ubuntu1 ]
>> Slaves: [ server2ubuntu1 ]
>> Master/Slave Set: WebDataClone [WebData]
>> Masters: [ server2ubuntu1 server4ubuntu1 ]
>> Clone Set: dlm-clone [dlm]
>> Started: [ server2ubuntu1 server4ubuntu1 ]
>> Clone Set: ClusterIP-clone [ClusterIP] (unique)
>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1
>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1
>> Clone Set: WebFS-clone [WebFS]
>> Started: [ server4ubuntu1 ]
>> Stopped: [ server2ubuntu1 ]
>> Clone Set: WebSite-clone [WebSite]
>> Started: [ server4ubuntu1 ]
>> Stopped: [ server2ubuntu1 ]
>>
>> Failed Actions:
>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201,
>> status=Error, exitreason='',
>> last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, exec=3403ms
>> * r0_pacemaker_monitor_60000 on server2ubuntu1 'master' (8): call=203,
>> status=complete, exitreason='',
>> last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms
>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202,
>> status=Error, exitreason='',
>> last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, exec=3411ms
>>
>>
>> The stonith resource is stopped and does not seem to work.
>> On both hosts the command
>> # fence_xvm -o list
>> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
>> on
>>
>> returns the local VM. Apparently it connects through the
>> virtualization interface, because it returns the VM name, not the
>> hostname of the client VM. I do not know if this is how it is
>> supposed to work?
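(Listing libvirt domain names rather than guest hostnames is indeed how fence_xvm reports domains. A sketch of how to test further, using the domain name kvm102 from the output above; for fencing to work, the crucial test is that this succeeds from inside the *guest*, not just on the host:)

```shell
# Query the state of a specific domain by its libvirt name
fence_xvm -o status -H kvm102

# From the guest: if this hangs or times out, the multicast request
# is not reaching fence_virtd on the host
fence_xvm -o list
```

If the guest-side `fence_xvm -o list` times out while the host-side one works, the problem is almost certainly multicast delivery or a firewall between guest and host, which matches the suspicion voiced earlier in this thread.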
>>
>> In the local network, all traffic is allowed. No firewall is active
>> locally; only connections leaving the local network are firewalled.
>> Hence there are no connection problems between the hosts and clients.
>> For example, we can successfully connect from the clients to the
>> hosts:
>>
>> # nc -z -v -u 192.168.1.21 1229
>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>> Ncat: Connected to 192.168.1.21:1229.
>> Ncat: UDP packet sent successfully
>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>
>> # nc -z -v -u 192.168.1.13 1229
>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>> Ncat: Connected to 192.168.1.13:1229.
>> Ncat: UDP packet sent successfully
>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
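(A caveat on the nc test above: "UDP packet sent successfully" only means the packet left the sender; it does not confirm fence_virtd received anything. Since fence_virt's request channel is multicast, a more telling check is to watch on the host for the guest's requests. A sketch, assuming the fence_virt defaults of group 225.0.0.12 and port 1229:)

```shell
# On the CentOS host, capture the guest's multicast fencing requests
# while running "fence_xvm -o list" inside the guest
tcpdump -i any -n host 225.0.0.12 and port 1229
```

If nothing shows up while the guest sends requests, the multicast traffic is being dropped before it reaches the host's fence_virtd.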
>>
>>
>> On the Ubuntu VMs we created and configured the stonith resource
>> according to the howto provided here:
>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>>
>>
>> The actual line we used:
>> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt \
>>     hostlist="Host4,host2" \
>>     hypervisor_uri="qemu+ssh://192.168.1.21/system"
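(One thing that stands out in the command above: hostlist="Host4,host2" mixes case and uses the physical host names, while the cluster nodes are called server2ubuntu1 and server4ubuntu1. The external/libvirt agent matches hostlist entries against the node names Pacemaker asks it to fence, and entries can carry the libvirt domain name after a colon. A sketch of what that might look like; the domain-name placeholders below are hypothetical and must be replaced with the real domain of each VM as shown by `fence_xvm -o list` or `virsh list` on its host:)

```shell
# Sketch: map each Pacemaker node name to its libvirt domain name
# ("node:domain" tuples; placeholders in angle brackets are not real names)
pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt \
    hostlist="server2ubuntu1:<domain-on-host2>,server4ubuntu1:<domain-on-host4>" \
    hypervisor_uri="qemu+ssh://192.168.1.21/system"
```

Note also that a single hypervisor_uri pointing at one host can only fence the VM running there; with one VM per host, a common pattern is one stonith resource per host, each with its own hypervisor_uri and a location constraint keeping it off the node it fences.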
>>
>>
>> But as you can see in the pcs status output, stonith is stopped and
>> exits with an unknown error.
>>
>> Can somebody please advise on how to proceed, or what additional
>> information is needed to solve this problem?
>> Any help would be greatly appreciated! Thank you in advance.
>>
>> Kind regards
>> Stefan Schmitz