[ClusterLabs] Still Beginner STONITH Problem

stefan.schmitz at farmpartner-tec.com stefan.schmitz at farmpartner-tec.com
Fri Jul 17 09:56:44 EDT 2020


Hello,

I have now managed to get "# fence_xvm -a 225.0.0.12 -o list" to list
at least its local guest again. It seems fence_virtd was no longer
working properly.
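
For the record, this is roughly how I check and restart the daemon when
that happens (just a sketch of the obvious checks, assuming the systemd
unit is named fence_virtd, as it is on our CentOS hosts):

# systemctl status fence_virtd
# journalctl -u fence_virtd --since today
# systemctl restart fence_virtd
# fence_xvm -a 225.0.0.12 -o list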

Regarding the Network XML config

# cat default.xml
  <network>
      <name>default</name>
      <bridge name="virbr0"/>
      <forward/>
      <ip address="192.168.122.1" netmask="255.255.255.0">
        <dhcp>
          <range start="192.168.122.2" end="192.168.122.254"/>
        </dhcp>
      </ip>
  </network>

I have used "virsh net-edit default" to test other network devices on
the hosts, but this did not change anything.
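
For reference, these commands can be used to confirm which bridge a
guest's interface is actually enslaved to on a host ("kvm101" standing
in for the local guest's domain name; per the "ip link show" output
quoted further down, on our hosts that bridge is br0, not virbr0):

# virsh domiflist kvm101
# ip link show master br0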

Regarding the statement

 > If it is created by libvirt - this is NAT and you will never
 > receive  output  from the other  host.

I am at a loss and do not know why this is NAT. I am aware of what NAT
means, but what am I supposed to reconfigure here to solve the problem?
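
From what I gather from the libvirt documentation, a non-NAT setup
would mean defining the network as a plain bridge instead of the NATed
"default" network, roughly like this (only a sketch, assuming br0 is
the existing host bridge; the name "host-bridge" is just an example and
I have not applied this yet):

  <network>
      <name>host-bridge</name>
      <forward mode="bridge"/>
      <bridge name="br0"/>
  </network>

If I understand it correctly, that definition would then be loaded with
"virsh net-define host-bridge.xml", started with "virsh net-start
host-bridge", and the guests' <interface> definitions pointed at that
network instead of "default".
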
Any help would be greatly appreciated.
Thank you in advance.

Kind regards
Stefan Schmitz


On 15.07.2020 at 16:48, stefan.schmitz at farmpartner-tec.com wrote:
> 
> On 15.07.2020 at 16:29, Klaus Wenninger wrote:
>> On 7/15/20 4:21 PM, stefan.schmitz at farmpartner-tec.com wrote:
>>> Hello,
>>>
>>>
>>> On 15.07.2020 at 15:30, Klaus Wenninger wrote:
>>>> On 7/15/20 3:15 PM, Strahil Nikolov wrote:
>>>>> If it is created by libvirt - this is NAT and you will never
>>>>> receive  output  from the other  host.
>>>> And twice the same subnet behind NAT is probably giving
>>>> issues at other places as well.
>>>> And if using DHCP you have to at least enforce that both sides
>>>> don't go for the same IP.
>>>> But all no explanation why it doesn't work on the same host.
>>>> Which is why I was asking for running the service on the
>>>> bridge to check if that would work at least. So that we
>>>> can go forward step by step.
>>>
>>> I just now finished trying and testing it on both hosts.
>>> I ran # fence_virtd -c on both hosts and entered different network
>>> devices. On both I tried br0 and the kvm10x.0.
>> According to your libvirt-config I would have expected
>> the bridge to be virbr0.
> 
> I understand that, but a "virbr0" device does not seem to exist on
> either of the two hosts.
> 
> # ip link show
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode 
> DEFAULT group default qlen 1000
>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq 
> master bond0 state UP mode DEFAULT group default qlen 1000
>      link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
> 3: enp216s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode 
> DEFAULT group default qlen 1000
>      link/ether ac:1f:6b:26:69:dc brd ff:ff:ff:ff:ff:ff
> 4: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq 
> master bond0 state UP mode DEFAULT group default qlen 1000
>      link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
> 5: enp216s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode 
> DEFAULT group default qlen 1000
>      link/ether ac:1f:6b:26:69:dd brd ff:ff:ff:ff:ff:ff
> 6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc 
> noqueue master br0 state UP mode DEFAULT group default qlen 1000
>      link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
> 7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state 
> UP mode DEFAULT group default qlen 1000
>      link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
> 8: kvm101.0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast 
> master br0 state UNKNOWN mode DEFAULT group default qlen 1000
>      link/ether fe:16:3c:ba:10:6c brd ff:ff:ff:ff:ff:ff
> 
> 
> 
>>>
>>> After each reconfiguration I ran #fence_xvm -a 225.0.0.12 -o list
>>> On the second server it worked with each device. After that I
>>> reconfigured back to the normal device, bond0, on which it had
>>> previously stopped working - and now it worked again!
>>> #  fence_xvm -a 225.0.0.12 -o list
>>> kvm102                           bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>
>>> On the first server, however, it did not work with any device.
>>> #  fence_xvm -a 225.0.0.12 -o list always resulted in
>>> Timed out waiting for response
>>> Operation failed
>>>
>>>
>>>
>>> On 15.07.2020 at 15:15, Strahil Nikolov wrote:
>>>> If it is created by libvirt - this is NAT and you will never receive
>>> output  from the other  host.
>>>>
>>> To my knowledge this is configured by libvirt. At least I am not
>>> aware of having changed or configured it in any way. Up until today
>>> I did not
>>> even know that file existed. Could you please advise on what I need to
>>> do to fix this issue?
>>>
>>> Kind regards
>>>
>>>
>>>
>>>
>>>> Is pacemaker/corosync/knet btw. using the same interfaces/IPs?
>>>>
>>>> Klaus
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>>>
>>>>> On 15 July 2020 15:05:48 GMT+03:00,
>>>>> "stefan.schmitz at farmpartner-tec.com"
>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On 15.07.2020 at 13:42, Strahil Nikolov wrote:
>>>>>>> By default libvirt is using NAT and not routed network - in such
>>>>>> case, vm1 won't receive data from host2.
>>>>>>> Can you provide the Networks' xml ?
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Strahil Nikolov
>>>>>>>
>>>>>> # cat default.xml
>>>>>> <network>
>>>>>>     <name>default</name>
>>>>>>     <bridge name="virbr0"/>
>>>>>>     <forward/>
>>>>>>     <ip address="192.168.122.1" netmask="255.255.255.0">
>>>>>>       <dhcp>
>>>>>>         <range start="192.168.122.2" end="192.168.122.254"/>
>>>>>>       </dhcp>
>>>>>>     </ip>
>>>>>> </network>
>>>>>>
>>>>>> I just checked this and the file is identical on both hosts.
>>>>>>
>>>>>> kind regards
>>>>>> Stefan Schmitz
>>>>>>
>>>>>>
>>>>>>> On 15 July 2020 13:19:59 GMT+03:00, Klaus Wenninger
>>>>>>> <kwenning at redhat.com> wrote:
>>>>>>>> On 7/15/20 11:42 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 15.07.2020 at 06:32, Strahil Nikolov wrote:
>>>>>>>>>> How  did you configure the network on your ubuntu 20.04 Hosts ? I
>>>>>>>>>> tried  to setup bridged connection for the test setup , but
>>>>>>>> obviously
>>>>>>>>>> I'm missing something.
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Strahil Nikolov
>>>>>>>>>>
>>>>>>>>> On the hosts (CentOS) the bridge config looks like this. The
>>>>>>>>> bridging and configuration is handled by the virtualization
>>>>>>>>> software:
>>>>>>>>>
>>>>>>>>> # cat ifcfg-br0
>>>>>>>>> DEVICE=br0
>>>>>>>>> TYPE=Bridge
>>>>>>>>> BOOTPROTO=static
>>>>>>>>> ONBOOT=yes
>>>>>>>>> IPADDR=192.168.1.21
>>>>>>>>> NETMASK=255.255.0.0
>>>>>>>>> GATEWAY=192.168.1.1
>>>>>>>>> NM_CONTROLLED=no
>>>>>>>>> IPV6_AUTOCONF=yes
>>>>>>>>> IPV6_DEFROUTE=yes
>>>>>>>>> IPV6_PEERDNS=yes
>>>>>>>>> IPV6_PEERROUTES=yes
>>>>>>>>> IPV6_FAILURE_FATAL=no
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 15.07.2020 at 09:50, Klaus Wenninger wrote:
>>>>>>>>>> Guess it is not easy to have your servers connected
>>>>>>>>>> physically for a try.
>>>>>>>>>> But maybe you can at least try on one host to have
>>>>>>>>>> virt_fenced & VM on the same bridge - just to see if that
>>>>>>>>>> basic pattern is working.
>>>>>>>>> I am not sure if I understand you correctly. What do you mean
>>>>>>>>> by having them on the same bridge? The bridge device is
>>>>>>>>> configured on the host by the virtualization software.
>>>>>>>> I meant to check out which bridge the interface of the VM is
>>>>>>>> enslaved to and to use that bridge as interface in
>>>>>>>> /etc/fence_virt.conf. Get me right - just for now - just to see
>>>>>>>> if it is working for this one host and the corresponding guest.
>>>>>>>>>
>>>>>>>>>> Well maybe still somebody in the middle playing IGMPv3, or
>>>>>>>>>> the request for a certain source is needed to shoot open some
>>>>>>>>>> firewall or switch-tables.
>>>>>>>>> I am still waiting for the final report from our Data Center
>>>>>>>>> techs. I hope that will clear up some things.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Additionally, I have just noticed that apparently, since
>>>>>>>>> switching from IGMPv3 to IGMPv2 and back, the command
>>>>>>>>> "fence_xvm -a 225.0.0.12 -o list" is now completely broken.
>>>>>>>>> Before that switch this command at least returned the local
>>>>>>>>> VM. Now it returns:
>>>>>>>>> Timed out waiting for response
>>>>>>>>> Operation failed
>>>>>>>>>
>>>>>>>>> I am a bit confused by that, because all we did was run
>>>>>>>>> commands like "sysctl -w net.ipv4.conf.all.force_igmp_version ="
>>>>>>>>> with the different version numbers, and "cat /proc/net/igmp"
>>>>>>>>> shows that V3 is used again on every device just like
>>>>>>>>> before...?!
>>>>>>>>>
>>>>>>>>> kind regards
>>>>>>>>> Stefan Schmitz
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 14 July 2020 11:06:42 GMT+03:00,
>>>>>>>>>> "stefan.schmitz at farmpartner-tec.com"
>>>>>>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 09.07.2020 at 19:10, Strahil Nikolov wrote:
>>>>>>>>>>>> Have  you  run 'fence_virtd  -c' ?
>>>>>>>>>>> Yes I had run that on both Hosts. The current config looks like
>>>>>>>> that
>>>>>>>>>>> and
>>>>>>>>>>> is identical on both.
>>>>>>>>>>>
>>>>>>>>>>> cat fence_virt.conf
>>>>>>>>>>> fence_virtd {
>>>>>>>>>>>             listener = "multicast";
>>>>>>>>>>>             backend = "libvirt";
>>>>>>>>>>>             module_path = "/usr/lib64/fence-virt";
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> listeners {
>>>>>>>>>>>             multicast {
>>>>>>>>>>>                     key_file = "/etc/cluster/fence_xvm.key";
>>>>>>>>>>>                     address = "225.0.0.12";
>>>>>>>>>>>                     interface = "bond0";
>>>>>>>>>>>                     family = "ipv4";
>>>>>>>>>>>                     port = "1229";
>>>>>>>>>>>             }
>>>>>>>>>>>
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> backends {
>>>>>>>>>>>             libvirt {
>>>>>>>>>>>                     uri = "qemu:///system";
>>>>>>>>>>>             }
>>>>>>>>>>>
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The situation is still that no matter on which host I issue
>>>>>>>>>>> the "fence_xvm -a 225.0.0.12 -o list" command, both guest
>>>>>>>>>>> systems receive the traffic. The local guest, but also the
>>>>>>>>>>> guest on the other host. I reckon that means the traffic is
>>>>>>>>>>> not filtered by any network device, like switches or
>>>>>>>>>>> firewalls. Since the guest on the other host receives the
>>>>>>>>>>> packets, the traffic must reach the physical server and
>>>>>>>>>>> network device and is then routed to the VM on that host.
>>>>>>>>>>> But still, the traffic is not shown on the host itself.
>>>>>>>>>>>
>>>>>>>>>>> Further, the local firewalls on both hosts are set to let
>>>>>>>>>>> any and all traffic pass - ACCEPT for anything and
>>>>>>>>>>> everything. Well, at least as far as I can see.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 09.07.2020 at 22:34, Klaus Wenninger wrote:
>>>>>>>>>>>> makes me believe that
>>>>>>>>>>>> the whole setup doesn't look as I would have
>>>>>>>>>>>> expected (bridges on each host where the guest
>>>>>>>>>>>> has a connection to and where ethernet interfaces
>>>>>>>>>>>> that connect the 2 hosts are part of as well
>>>>>>>>>>> On each physical server the network cards are bonded to
>>>>>>>>>>> achieve failure safety (bond0). The guests are connected over
>>>>>>>>>>> a bridge (br0), but apparently our virtualization software
>>>>>>>>>>> creates its own device named after the guest (kvm101.0).
>>>>>>>>>>> There is no direct connection between the servers, but as I
>>>>>>>>>>> said earlier, the multicast traffic does reach the VMs, so I
>>>>>>>>>>> assume there is no problem with that.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 09.07.2020 at 20:18, Vladislav Bogdanov wrote:
>>>>>>>>>>>> First, you need to ensure that your switch (or all switches
>>>>>>>>>>>> in the path) have igmp snooping enabled on host ports (and
>>>>>>>>>>>> probably interconnects along the path between your hosts).
>>>>>>>>>>>>
>>>>>>>>>>>> Second, you need an igmp querier to be enabled somewhere
>>>>>>>>>>>> near (better to have it enabled on a switch itself). Please
>>>>>>>>>>>> verify that you see its queries on hosts.
>>>>>>>>>>>>
>>>>>>>>>>>> Next, you probably need to make your hosts use IGMPv2 (not
>>>>>>>>>>>> 3) as many switches still can not understand v3. This is
>>>>>>>>>>>> doable by sysctl; find on the internet, there are many
>>>>>>>>>>>> articles.
>>>>>>>>>>>
>>>>>>>>>>> I have sent a query to our data center techs, who are
>>>>>>>>>>> analyzing this and were already checking whether multicast
>>>>>>>>>>> traffic is somewhere blocked or hindered. So far the answer
>>>>>>>>>>> is, "multicast is explicitly allowed in the local network and
>>>>>>>>>>> no packets are filtered or dropped". I am still waiting for a
>>>>>>>>>>> final report though.
>>>>>>>>>>>
>>>>>>>>>>> In the meantime I have switched from IGMPv3 to IGMPv2 on
>>>>>>>>>>> every involved server, hosts and guests, via the mentioned
>>>>>>>>>>> sysctl. The switch itself was successful according to
>>>>>>>>>>> "cat /proc/net/igmp", but sadly it did not improve the
>>>>>>>>>>> behavior. It actually led to no VM receiving the multicast
>>>>>>>>>>> traffic anymore at all.
>>>>>>>>>>>
>>>>>>>>>>> kind regards
>>>>>>>>>>> Stefan Schmitz
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 09.07.2020 at 22:34, Klaus Wenninger wrote:
>>>>>>>>>>>> On 7/9/20 5:17 PM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, theory still holds I would say.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I guess that the multicast-traffic from the other host
>>>>>>>>>>>>>> or the guests doesn't get to the daemon on the host.
>>>>>>>>>>>>>> Can't you just simply check if there are any firewall
>>>>>>>>>>>>>> rules configured on the host kernel?
>>>>>>>>>>>>> I hope I did understand you correctly and you are
>>>>>>>>>>>>> referring to iptables?
>>>>>>>>>>>> I didn't say iptables because it might have been
>>>>>>>>>>>> nftables - but yes, that is what I was referring to.
>>>>>>>>>>>> Guess to understand the config the output is
>>>>>>>>>>>> lacking verbosity, but it makes me believe that
>>>>>>>>>>>> the whole setup doesn't look as I would have
>>>>>>>>>>>> expected (bridges on each host where the guest
>>>>>>>>>>>> has a connection to and where ethernet interfaces
>>>>>>>>>>>> that connect the 2 hosts are part of as well -
>>>>>>>>>>>> everything connected via layer 2 basically).
>>>>>>>>>>>>> Here is the output of the current rules. Besides the IP of the
>>>>>>>> guest
>>>>>>>>>>>>> the output is identical on both hosts:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # iptables -S
>>>>>>>>>>>>> -P INPUT ACCEPT
>>>>>>>>>>>>> -P FORWARD ACCEPT
>>>>>>>>>>>>> -P OUTPUT ACCEPT
>>>>>>>>>>>>>
>>>>>>>>>>>>> # iptables -L
>>>>>>>>>>>>> Chain INPUT (policy ACCEPT)
>>>>>>>>>>>>> target     prot opt source               destination
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chain FORWARD (policy ACCEPT)
>>>>>>>>>>>>> target     prot opt source               destination
>>>>>>>>>>>>> SOLUSVM_TRAFFIC_IN  all  --  anywhere             anywhere
>>>>>>>>>>>>> SOLUSVM_TRAFFIC_OUT  all  --  anywhere             anywhere
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chain OUTPUT (policy ACCEPT)
>>>>>>>>>>>>> target     prot opt source               destination
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chain SOLUSVM_TRAFFIC_IN (1 references)
>>>>>>>>>>>>> target     prot opt source               destination
>>>>>>>>>>>>>                 all  --  anywhere             192.168.1.14
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chain SOLUSVM_TRAFFIC_OUT (1 references)
>>>>>>>>>>>>> target     prot opt source               destination
>>>>>>>>>>>>>                 all  --  192.168.1.14         anywhere
>>>>>>>>>>>>>
>>>>>>>>>>>>> kind regards
>>>>>>>>>>>>> Stefan Schmitz
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>
>>>
>>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

