[ClusterLabs] Still Beginner STONITH Problem
Klaus Wenninger
kwenning at redhat.com
Tue Jul 21 02:32:57 EDT 2020
On 7/20/20 5:05 PM, Stefan Schmitz wrote:
> Hello,
>
> I have now deleted the previous stonith resource and added two new
> ones, one for each server. The commands I used for that:
>
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
> hostlist="host2" pcmk_host_list="server2ubuntu1,server4ubuntu1"
> hypervisor_uri="qemu+ssh://192.168.1.13/system"
>
> # pcs -f stonith_cfg stonith create stonith_id_2 external/libvirt
> hostlist="Host4" pcmk_host_list="server2ubuntu1,server4ubuntu1"
> hypervisor_uri="qemu+ssh://192.168.1.21/system"
As already mentioned, external/libvirt is the wrong fence agent.
You have to use fence_xvm, as you've already tried on the cmdline.
You don't need hostlist, and probably not pcmk_host_list either,
as fence_xvm is gonna query fence_virtd for the possible targets.
If you don't have a 1:1 match between the guest names in libvirt
and the node names you will definitely need a
pcmk_host_map="{pacemaker-node1}:{guest-name1};..."
And you have to give the attribute multicast_address=... .
The reference to libvirt/the hypervisor lives solely in fence_virtd.
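Roughly as a sketch - untested, and the resource names, the guest
names (kvm101/kvm102) and the multicast addresses below are only
placeholders that have to match what you actually configured with
fence_virtd -c on each host:

# pcs -f stonith_cfg stonith create fence_host2 fence_xvm \
      multicast_address="225.0.0.12" key_file="/etc/cluster/fence_xvm.key" \
      pcmk_host_map="server2ubuntu1:kvm101" pcmk_host_list="server2ubuntu1"
# pcs -f stonith_cfg stonith create fence_host4 fence_xvm \
      multicast_address="225.0.0.13" key_file="/etc/cluster/fence_xvm.key" \
      pcmk_host_map="server4ubuntu1:kvm102" pcmk_host_list="server4ubuntu1"

So one fence_xvm resource per host, each one responsible for fencing
exactly the node that runs as a guest on that host, and each one
pointing at the multicast address the fence_virtd on that host
listens on.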
If you intended to switch over to a solution without
the fence_virtd service and a fence agent that talks directly
to the hypervisor, forget my comments - I haven't done that with
libvirt so far.
Klaus
>
>
> The behaviour is now somewhat different but it still does not work. I
> guess I am doing something completely wrong in setting up the
> stonith resource?
>
> The pcs status command shows two running stonith resources on one
> server but two stopped ones on the other. Additionally there is a
> failed fencing action. The server showing the running stonith resources is
> marked as unclean and fencing wants to reboot it but fails doing so.
>
> Any advice on how to proceed would be greatly appreciated.
>
>
> The shortened pcs status outputs of each of the VMs:
>
>
> # pcs status of server2ubuntu1
> [...]
> Node List:
> * Node server2ubuntu1: UNCLEAN (online)
> * Online: [ server4ubuntu1 ]
>
> Full List of Resources:
> [...]
> * stonith_id_1 (stonith:external/libvirt): Started
> server2ubuntu1
> * stonith_id_2 (stonith:external/libvirt): Started
> server2ubuntu1
>
> Failed Resource Actions:
> * stonith_id_1_start_0 on server4ubuntu1 'error' (1): call=228,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:35:45
> +01:00', queued=4391ms, exec=2890ms
> * stonith_id_2_start_0 on server4ubuntu1 'error' (1): call=229,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:35:45
> +01:00', queued=2230ms, exec=-441815ms
> * r0_pacemaker_stop_0 on server2ubuntu1 'error' (1): call=198,
> status='Timed Out', exitreason='', last-rc-change='1970-01-08 01:33:54
> +01:00', queued=321ms, exec=115529ms
> * stonith_id_1_start_0 on server2ubuntu1 'error' (1): call=196,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:33:54
> +01:00', queued=161ms, exec=-443582ms
> * stonith_id_2_start_0 on server2ubuntu1 'error' (1): call=197,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:33:54
> +01:00', queued=69ms, exec=-444042ms
>
> Failed Fencing Actions:
> * reboot of server2ubuntu1 failed: delegate=,
> client=pacemaker-controld.2002, origin=server4ubuntu1,
> last-failed='2020-07-20 16:51:49 +02:00'
>
>
>
> # pcs status of server4ubuntu1
> [...]
> Node List:
> * Node server2ubuntu1: UNCLEAN (online)
> * Online: [ server4ubuntu1 ]
>
> Full List of Resources:
> [...]
> * stonith_id_1 (stonith:external/libvirt): FAILED
> server4ubuntu1
> * stonith_id_2 (stonith:external/libvirt): FAILED
> server4ubuntu1
>
> Failed Resource Actions:
> * stonith_id_1_start_0 on server4ubuntu1 'error' (1): call=248,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:45:07
> +01:00', queued=350ms, exec=516901ms
> * stonith_id_2_start_0 on server4ubuntu1 'error' (1): call=249,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:45:07
> +01:00', queued=149ms, exec=515438ms
> * stonith_id_1_start_0 on server2ubuntu1 'error' (1): call=215,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:44:53
> +01:00', queued=189ms, exec=534334ms
> * stonith_id_2_start_0 on server2ubuntu1 'error' (1): call=216,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:44:53
> +01:00', queued=82ms, exec=564228ms
>
> Failed Fencing Actions:
> * reboot of server2ubuntu1 failed: delegate=,
> client=pacemaker-controld.2002, origin=server4ubuntu1,
> last-failed='2020-07-20 16:51:49 +02:00'
>
>
>
> kind regards
> Stefan Schmitz
>
>
>
>
> On 20.07.2020 at 13:51, Stefan Schmitz wrote:
>>
>>
>>
>> On 20.07.2020 at 13:36, Klaus Wenninger wrote:
>>> On 7/20/20 1:10 PM, Stefan Schmitz wrote:
>>>> Hello,
>>>>
>>>> thank you all very much for your help so far!
>>>>
>>>> We have now managed to capture the multicast traffic originating from
>>>> one host when issuing the command "fence_xvm -o list" on the other
>>>> host. Now the tcpdump at least looks exactly the same on all 4
>>>> servers, hosts and guests. I cannot tell how and why this just started
>>>> working, but I got our datacenter techs' final report this morning
>>>> that there are no problems present.
>>>>
>>>>
>>>>
>>>> On 19.07.2020 at 09:32, Andrei Borzenkov wrote:
>>>>> external/libvirt is unrelated to fence_xvm
>>>>
>>>> Could you please explain that a bit more? Do you mean that the current
>>>> problem of the dysfunctional Stonith/fencing is unrelated to libvirt?
>>> Hadn't spotted that ... sry
>>> What he meant is that if you are using the fence_virtd service on
>>> the host(s), then the matching fencing resource is based
>>> on fence_xvm and not external/libvirt.
>>> The libvirt side of things is handled by the daemon running on your host.
>>>>
>>>>> fence_xvm opens TCP listening socket, sends request and waits for
>>>>> connection to this socket (from fence_virtd) which is used to submit
>>>>> actual fencing operation. Only the first connection request is
>>>>> handled.
>>>>> So the first host that responds will be processed. The local host is
>>>>> likely always faster to respond than the remote host.
>>>>
>>>> Thank you for the explanation, I get that. But what would you suggest
>>>> to remedy this situation? We have been using libvirt and fence_xvm
>>>> because of the clusterlabs wiki articles and the suggestions in this
>>>> mailing list. Is there anything you suggest we need to change to make
>>>> this Cluster finally work?
>>> Guess what he meant - what I've already suggested before,
>>> and what is described in the linked article as well - is
>>> having totally separate configurations for
>>> each host. Whether you are using different multicast addresses
>>> or unicast - as Andrei is suggesting, which I haven't used
>>> before - probably doesn't matter. (Unless of course something
>>> is really blocking multicast ...)
>>> And you have to set up one fencing resource per host
>>> (fence_xvm) that has the address you've configured
>>> on each of the hosts.
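>>> Just to sketch it (the addresses are only examples - use whatever
>>> you picked with fence_virtd -c on each host): host 1 gets e.g.
>>> listeners { multicast { address = "225.0.0.12"; ... } } in
>>> /etc/fence_virt.conf, host 2 gets address = "225.0.0.13", and then
>>> the cluster gets one fence_xvm stonith resource per host, each with
>>> multicast_address= set to the address of that host.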
>>
>> Thank you for the explanation. I sadly cannot access the articles. I
>> take it totally separate configurations means having a stonith resource
>> configured in the cluster for each host. So for now I will delete the
>> current resource and try to configure two new ones.
>>
>>>>
>>>>
>>>> On 18.07.2020 at 02:36, Reid Wahl wrote:
>>>>> However, when users want to configure fence_xvm for multiple hosts
>>>> with the libvirt backend, I have typically seen them configure
>>>> multiple fence_xvm devices (one per host) and configure a different
>>>> multicast address on each host.
>>>>
>>>> I do have a Red Hat account but not a paid subscription, which sadly
>>>> is needed to access the articles you have linked.
>>>>
>>>> We have installed fence_virt on both hosts since the beginning, if
>>>> that is what you mean by "multiple fence_xvm devices (one per host)".
>>>> They were however both configured to use the same multicast IP address,
>>>> which we now changed so that each host's fence_xvm install uses a
>>>> different multicast IP. Sadly this does not seem to change anything in
>>>> the behaviour.
>>>> What is interesting though is that I ran fence_xvm -c again and changed
>>>> the multicast IP to 225.0.0.13 (from .12). I killed and restarted the
>>>> daemon multiple times after that.
>>>> When I now run #fence_xvm -o list without specifying an IP address,
>>>> tcpdump on the other host still shows the old IP as the one being
>>>> used.
>>>> tcpdump on the other host:
>>>> Host4.54001 > 225.0.0.12.zented: [udp sum ok] UDP, length 176
>>>> Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
>>>> 225.0.0.12 to_in { }]
>>>> Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
>>>> 225.0.0.12 to_in { }]
>>>> Only when I specify the other IP does it apparently really get used:
>>>> # fence_xvm -a 225.0.0.13 -o list
>>>> tcpdump on the other host:
>>>> Host4.46011 > 225.0.0.13.zented: [udp sum ok] UDP, length 176
>>>> Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
>>>> 225.0.0.13 to_in { }]
>>>> Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
>>>> 225.0.0.13 to_in { }]
>>>>
>>>>
>>>>
>>>>
>>>> On 17.07.2020 at 16:49, Strahil Nikolov wrote:
>>>>> The simplest way to check if the libvirt's network is NAT (or not)
>>>> is to try to ssh from the first VM to the second one.
>>>> That does work without any issue. I can ssh to any server in our
>>>> network, host or guest, without a problem. Does that mean there is no
>>>> natting involved?
>>>>
>>>>
>>>>
>>>> On 17.07.2020 at 16:41, Klaus Wenninger wrote:
>>>>> How does your VM part of the network-config look like?
>>>> # cat ifcfg-br0
>>>> DEVICE=br0
>>>> TYPE=Bridge
>>>> BOOTPROTO=static
>>>> ONBOOT=yes
>>>> IPADDR=192.168.1.13
>>>> NETMASK=255.255.0.0
>>>> GATEWAY=192.168.1.1
>>>> NM_CONTROLLED=no
>>>> IPV6_AUTOCONF=yes
>>>> IPV6_DEFROUTE=yes
>>>> IPV6_PEERDNS=yes
>>>> IPV6_PEERROUTES=yes
>>>> IPV6_FAILURE_FATAL=no
>>>>
>>>>
>>>>>> I am at a loss and do not know why this is NAT. I am aware what NAT
>>>>>> means, but what am I supposed to reconfigure here to solve the
>>>> problem?
>>>>> As long as you stay within the subnet you are running on your bridge
>>>>> you won't get natted but once it starts to route via the host the
>>>> libvirt
>>>>> default bridge will be natted.
>>>>> What you can do is connect the bridges on your 2 hosts via layer 2.
>>>>> Possible ways should be OpenVPN, knet, VLAN on your switches ...
>>>>> (and yes - a cable )
>>>>> If your guests are using DHCP you should probably configure
>>>>> fixed IPs for those MACs.
>>>> All our servers have fixed IPs; DHCP is not used anywhere in our
>>>> network for dynamic IP assignment.
>>>> Regarding the "check if VMs are natted", is this solved by the ssh
>>>> test suggested by Strahil Nikolov? Can I assume natting is not a
>>>> problem here or do we still have to take measures?
>>>>
>>>>
>>>>
>>>> kind regards
>>>> Stefan Schmitz
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 18.07.2020 at 02:36, Reid Wahl wrote:
>>>>> I'm not sure that the libvirt backend is intended to be used in this
>>>>> way, with multiple hosts using the same multicast address. From the
>>>>> fence_virt.conf man page:
>>>>>
>>>>> ~~~
>>>>> BACKENDS
>>>>> libvirt
>>>>> The libvirt plugin is the simplest plugin. It is used in
>>>>> environments where routing fencing requests between multiple hosts is
>>>>> not required, for example by a user running a cluster of virtual
>>>>> machines on a single desktop computer.
>>>>> libvirt-qmf
>>>>> The libvirt-qmf plugin acts as a QMFv2 Console to the
>>>>> libvirt-qmf daemon in order to route fencing requests over AMQP to
>>>>> the appropriate computer.
>>>>> cpg
>>>>> The cpg plugin uses corosync CPG and libvirt to track virtual
>>>>> machines and route fencing requests to the appropriate computer.
>>>>> ~~~
>>>>>
>>>>> I'm not an expert on fence_xvm or libvirt. It's possible that this
>>>>> is a
>>>>> viable configuration with the libvirt backend.
>>>>>
>>>>> However, when users want to configure fence_xvm for multiple hosts
>>>>> with
>>>>> the libvirt backend, I have typically seen them configure multiple
>>>>> fence_xvm devices (one per host) and configure a different multicast
>>>>> address on each host.
>>>>>
>>>>> If you have a Red Hat account, see also:
>>>>> - https://access.redhat.com/solutions/2386421#comment-1209661
>>>>> - https://access.redhat.com/solutions/2386421#comment-1209801
>>>>>
>>>>> On Fri, Jul 17, 2020 at 7:49 AM Strahil Nikolov
>>>>> <hunter86_bg at yahoo.com> wrote:
>>>>>
>>>>> The simplest way to check if the libvirt's network is NAT
>>>>> (or not)
>>>>> is to try to ssh from the first VM to the second one.
>>>>>
>>>>> I should admit that I was lost when I tried to create a
>>>>> routed
>>>>> network in KVM, so I can't help with that.
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>>>
>>>>> On 17 July 2020 at 16:56:44 GMT+03:00,
>>>>> "stefan.schmitz at farmpartner-tec.com"
>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>> >Hello,
>>>>> >
>>>>> >I have now managed to get # fence_xvm -a 225.0.0.12 -o list to
>>>>> >list at least its local guest again. It seems the fence_virtd was
>>>>> >not working properly anymore.
>>>>> >
>>>>> >Regarding the Network XML config
>>>>> >
>>>>> ># cat default.xml
>>>>> > <network>
>>>>> > <name>default</name>
>>>>> > <bridge name="virbr0"/>
>>>>> > <forward/>
>>>>> > <ip address="192.168.122.1" netmask="255.255.255.0">
>>>>> > <dhcp>
>>>>> > <range start="192.168.122.2"
>>>>> end="192.168.122.254"/>
>>>>> > </dhcp>
>>>>> > </ip>
>>>>> > </network>
>>>>> >
>>>>> >I have used "virsh net-edit default" to test other network
>>>> Devices on
>>>>> >the hosts but this did not change anything.
>>>>> >
>>>>> >Regarding the statement
>>>>> >
>>>>> > > If it is created by libvirt - this is NAT and you will
>>>>> > > never receive output from the other host.
>>>>> >
>>>>> >I am at a loss and do not know why this is NAT. I am aware
>>>>> >what NAT means, but what am I supposed to reconfigure here
>>>>> >to solve the problem?
>>>>> >Any help would be greatly appreciated.
>>>>> >Thank you in advance.
>>>>> >
>>>>> >Kind regards
>>>>> >Stefan Schmitz
>>>>> >
>>>>> >
>>>>> >On 15.07.2020 at 16:48, stefan.schmitz at farmpartner-tec.com wrote:
>>>>> >>
>>>>> >> On 15.07.2020 at 16:29, Klaus Wenninger wrote:
>>>>> >>> On 7/15/20 4:21 PM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>> >>>> Hello,
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On 15.07.2020 at 15:30, Klaus Wenninger wrote:
>>>>> >>>>> On 7/15/20 3:15 PM, Strahil Nikolov wrote:
>>>>> >>>>>> If it is created by libvirt - this is NAT and you will
>>>> never
>>>>> >>>>>> receive output from the other host.
>>>>> >>>>> And twice the same subnet behind NAT is probably giving
>>>>> >>>>> issues at other places as well.
>>>>> >>>>> And if using DHCP you have to at least enforce that both
>>>>> >>>>> sides don't go for the same IP.
>>>>> >>>>> But all that is no explanation for why it doesn't work on the
>>>>> >>>>> same host.
>>>>> >>>>> Which is why I was asking for running the service on the
>>>>> >>>>> bridge to check if that would work at least. So that we
>>>>> >>>>> can go forward step by step.
>>>>> >>>>
>>>>> >>>> I just now finished trying and testing it on both hosts.
>>>>> >>>> I ran # fence_virtd -c on both hosts and entered
>>>>> different
>>>> network
>>>>> >>>> devices. On both I tried br0 and the kvm10x.0.
>>>>> >>> According to your libvirt-config I would have expected
>>>>> >>> the bridge to be virbr0.
>>>>> >>
>>>>> >> I understand that, but a "virbr0" device does not seem to
>>>>> >> exist on either of the two hosts.
>>>>> >>
>>>>> >> # ip link show
>>>>> >> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state
>>>> UNKNOWN
>>>>> >mode
>>>>> >> DEFAULT group default qlen 1000
>>>>> >> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>>> >> 2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500
>>>> qdisc mq
>>>>> >> master bond0 state UP mode DEFAULT group default qlen 1000
>>>>> >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
>>>>> >> 3: enp216s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
>>>> state DOWN
>>>>> >mode
>>>>> >> DEFAULT group default qlen 1000
>>>>> >> link/ether ac:1f:6b:26:69:dc brd ff:ff:ff:ff:ff:ff
>>>>> >> 4: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500
>>>> qdisc mq
>>>>> >> master bond0 state UP mode DEFAULT group default qlen 1000
>>>>> >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
>>>>> >> 5: enp216s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
>>>> state DOWN
>>>>> >mode
>>>>> >> DEFAULT group default qlen 1000
>>>>> >> link/ether ac:1f:6b:26:69:dd brd ff:ff:ff:ff:ff:ff
>>>>> >> 6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500
>>>> qdisc
>>>>> >> noqueue master br0 state UP mode DEFAULT group default qlen
>>>> 1000
>>>>> >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
>>>>> >> 7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
>>>> noqueue
>>>>> >state
>>>>> >> UP mode DEFAULT group default qlen 1000
>>>>> >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
>>>>> >> 8: kvm101.0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
>>>>> qdisc
>>>>> >pfifo_fast
>>>>> >> master br0 state UNKNOWN mode DEFAULT group default qlen
>>>>> 1000
>>>>> >> link/ether fe:16:3c:ba:10:6c brd ff:ff:ff:ff:ff:ff
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>>>
>>>>> >>>> After each reconfiguration I ran #fence_xvm -a 225.0.0.12 -o list
>>>>> >>>> On the second server it worked with each device. After that I
>>>>> >>>> reconfigured back to the normal device, bond0, on which it
>>>>> >>>> had not worked before, and now it worked again!
>>>>> >>>> # fence_xvm -a 225.0.0.12 -o list
>>>>> >>>> kvm102
>>>>> >bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>> >>>>
>>>>> >>>> But anyhow not on the first server; there it did not work
>>>>> >>>> with any device.
>>>>> >>>> # fence_xvm -a 225.0.0.12 -o list always resulted in
>>>>> >>>> Timed out waiting for response
>>>>> >>>> Operation failed
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On 15.07.2020 at 15:15, Strahil Nikolov wrote:
>>>>> >>>>> If it is created by libvirt - this is NAT and you
>>>>> will never
>>>>> >receive
>>>>> >>>> output from the other host.
>>>>> >>>>>
>>>>> >>>> To my knowledge this is configured by libvirt. At least I
>>>>> >>>> am not aware of having changed or configured it in any way.
>>>>> >>>> Up until today I did not even know that file existed. Could
>>>>> >>>> you please advise on what I need to do to fix this issue?
>>>>> >>>>
>>>>> >>>> Kind regards
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>> Is pacemaker/corosync/knet btw. using the same
>>>> interfaces/IPs?
>>>>> >>>>>
>>>>> >>>>> Klaus
>>>>> >>>>>>
>>>>> >>>>>> Best Regards,
>>>>> >>>>>> Strahil Nikolov
>>>>> >>>>>>
>>>>> >>>>>> On 15 July 2020 at 15:05:48 GMT+03:00,
>>>>> >>>>>> "stefan.schmitz at farmpartner-tec.com"
>>>>> >>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>> >>>>>>> Hello,
>>>>> >>>>>>>
>>>>> >>>>>>> On 15.07.2020 at 13:42, Strahil Nikolov wrote:
>>>>> >>>>>>>> By default libvirt is using NAT and not routed
>>>>> network
>>>> - in
>>>>> >such
>>>>> >>>>>>> case, vm1 won't receive data from host2.
>>>>> >>>>>>>> Can you provide the Networks' xml ?
>>>>> >>>>>>>>
>>>>> >>>>>>>> Best Regards,
>>>>> >>>>>>>> Strahil Nikolov
>>>>> >>>>>>>>
>>>>> >>>>>>> # cat default.xml
>>>>> >>>>>>> <network>
>>>>> >>>>>>> <name>default</name>
>>>>> >>>>>>> <bridge name="virbr0"/>
>>>>> >>>>>>> <forward/>
>>>>> >>>>>>> <ip address="192.168.122.1"
>>>>> netmask="255.255.255.0">
>>>>> >>>>>>> <dhcp>
>>>>> >>>>>>> <range start="192.168.122.2"
>>>> end="192.168.122.254"/>
>>>>> >>>>>>> </dhcp>
>>>>> >>>>>>> </ip>
>>>>> >>>>>>> </network>
>>>>> >>>>>>>
>>>>> >>>>>>> I just checked this and the file is identical on both
>>>> hosts.
>>>>> >>>>>>>
>>>>> >>>>>>> kind regards
>>>>> >>>>>>> Stefan Schmitz
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>> On 15 July 2020 at 13:19:59 GMT+03:00, Klaus Wenninger
>>>>> >>>>>>>> <kwenning at redhat.com> wrote:
>>>>> >>>>>>>>> On 7/15/20 11:42 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>> >>>>>>>>>> Hello,
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> On 15.07.2020 at 06:32, Strahil Nikolov wrote:
>>>>> >>>>>>>>>>> How did you configure the network on your Ubuntu 20.04
>>>>> >>>>>>>>>>> hosts? I tried to set up a bridged connection for the
>>>>> >>>>>>>>>>> test setup, but obviously I'm missing something.
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>>> Best Regards,
>>>>> >>>>>>>>>>> Strahil Nikolov
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>> On the hosts (CentOS) the bridge config looks like
>>>>> >>>>>>>>>> that. The bridging and configuration is handled by the
>>>>> >>>>>>>>>> virtualization software:
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> # cat ifcfg-br0
>>>>> >>>>>>>>>> DEVICE=br0
>>>>> >>>>>>>>>> TYPE=Bridge
>>>>> >>>>>>>>>> BOOTPROTO=static
>>>>> >>>>>>>>>> ONBOOT=yes
>>>>> >>>>>>>>>> IPADDR=192.168.1.21
>>>>> >>>>>>>>>> NETMASK=255.255.0.0
>>>>> >>>>>>>>>> GATEWAY=192.168.1.1
>>>>> >>>>>>>>>> NM_CONTROLLED=no
>>>>> >>>>>>>>>> IPV6_AUTOCONF=yes
>>>>> >>>>>>>>>> IPV6_DEFROUTE=yes
>>>>> >>>>>>>>>> IPV6_PEERDNS=yes
>>>>> >>>>>>>>>> IPV6_PEERROUTES=yes
>>>>> >>>>>>>>>> IPV6_FAILURE_FATAL=no
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> On 15.07.2020 at 09:50, Klaus Wenninger wrote:
>>>>> >>>>>>>>>>> Guess it is not easy to have your servers connected
>>>>> >>>>>>>>>>> physically for a try.
>>>>> >>>>>>>>>>> But maybe you can at least try on one host to have
>>>>> >>>>>>>>>>> virt_fenced & VM on the same bridge - just to see if
>>>>> >>>>>>>>>>> that basic pattern is working.
>>>>> >>>>>>>>>> I am not sure if I understand you correctly. What do
>>>>> >>>>>>>>>> you mean by having them on the same bridge? The bridge
>>>>> >>>>>>>>>> device is configured on the host by the virtualization
>>>>> >>>>>>>>>> software.
>>>>> >>>>>>>>> I meant to check out which bridge the interface of
>>>>> >>>>>>>>> the VM is enslaved to and to use that bridge as the
>>>>> >>>>>>>>> interface in /etc/fence_virt.conf.
>>>>> >>>>>>>>> Get me right - just for now - just to see if it is
>>>>> >>>>>>>>> working for this one host and the corresponding guest.
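>>>>> >>>>>>>>> (To double-check which bridge that is - device names here
>>>>> >>>>>>>>> are just examples - "ip link show kvm101.0" should show a
>>>>> >>>>>>>>> "master br0" part, and "brctl show" lists which interfaces
>>>>> >>>>>>>>> are enslaved to which bridge.)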
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>> Well, maybe there is still somebody in the middle playing
>>>>> >>>>>>>>>>> IGMPv3, or the request for a certain source is needed to
>>>>> >>>>>>>>>>> shoot open some firewall or switch tables.
>>>>> >>>>>>>>>> I am still waiting for the final report from our data
>>>>> >>>>>>>>>> center techs. I hope that will clear up some things.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Additionally I have just noticed that apparently since
>>>>> >>>>>>>>>> switching from IGMPv3 to IGMPv2 and back, the command
>>>>> >>>>>>>>>> "fence_xvm -a 225.0.0.12 -o list" is now completely broken.
>>>>> >>>>>>>>>> Before that switch this command at least returned the
>>>>> >>>>>>>>>> local VM. Now it returns:
>>>>> >>>>>>>>>> Timed out waiting for response
>>>>> >>>>>>>>>> Operation failed
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> I am a bit confused by that, because all we did was
>>>>> >>>>>>>>>> running commands like
>>>>> >>>>>>>>>> "sysctl -w net.ipv4.conf.all.force_igmp_version=" with the
>>>>> >>>>>>>>>> different version numbers, and #cat /proc/net/igmp shows
>>>>> >>>>>>>>>> that V3 is used again on every device just like before...?!
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> kind regards
>>>>> >>>>>>>>>> Stefan Schmitz
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>> On 14 July 2020 at 11:06:42 GMT+03:00,
>>>>> >>>>>>>>>>> "stefan.schmitz at farmpartner-tec.com"
>>>>> >>>>>>>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>> >>>>>>>>>>>> Hello,
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> On 09.07.2020 at 19:10, Strahil Nikolov wrote:
>>>>> >>>>>>>>>>>>> Have you run 'fence_virtd -c' ?
>>>>> >>>>>>>>>>>> Yes, I had run that on both hosts. The current config
>>>>> >>>>>>>>>>>> looks like that and is identical on both.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> cat fence_virt.conf
>>>>> >>>>>>>>>>>> fence_virtd {
>>>>> >>>>>>>>>>>> listener = "multicast";
>>>>> >>>>>>>>>>>> backend = "libvirt";
>>>>> >>>>>>>>>>>> module_path =
>>>>> "/usr/lib64/fence-virt";
>>>>> >>>>>>>>>>>> }
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> listeners {
>>>>> >>>>>>>>>>>> multicast {
>>>>> >>>>>>>>>>>> key_file =
>>>>> >"/etc/cluster/fence_xvm.key";
>>>>> >>>>>>>>>>>> address = "225.0.0.12";
>>>>> >>>>>>>>>>>> interface = "bond0";
>>>>> >>>>>>>>>>>> family = "ipv4";
>>>>> >>>>>>>>>>>> port = "1229";
>>>>> >>>>>>>>>>>> }
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> }
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> backends {
>>>>> >>>>>>>>>>>> libvirt {
>>>>> >>>>>>>>>>>> uri = "qemu:///system";
>>>>> >>>>>>>>>>>> }
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> }
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> The situation is still that no matter on which host I
>>>>> >>>>>>>>>>>> issue the "fence_xvm -a 225.0.0.12 -o list" command,
>>>>> >>>>>>>>>>>> both guest systems receive the traffic. The local
>>>>> >>>>>>>>>>>> guest, but also the guest on the other host. I reckon
>>>>> >>>>>>>>>>>> that means the traffic is not filtered by any network
>>>>> >>>>>>>>>>>> device, like switches or firewalls. Since the guest on
>>>>> >>>>>>>>>>>> the other host receives the packets, the traffic must
>>>>> >>>>>>>>>>>> reach the physical server and network device and is
>>>>> >>>>>>>>>>>> then routed to the VM on that host.
>>>>> >>>>>>>>>>>> But still, the traffic is not shown on the host itself.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> Further, the local firewalls on both hosts are set to
>>>>> >>>>>>>>>>>> let each and every kind of traffic pass - accept for
>>>>> >>>>>>>>>>>> anything and everything. Well, at least as far as I
>>>>> >>>>>>>>>>>> can see.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> On 09.07.2020 at 22:34, Klaus Wenninger wrote:
>>>>> >>>>>>>>>>>>> makes me believe that
>>>>> >>>>>>>>>>>>> the whole setup doesn't look as I would have
>>>>> >>>>>>>>>>>>> expected (bridges on each host where the guest
>>>>> >>>>>>>>>>>>> has a connection to and where ethernet interfaces
>>>>> >>>>>>>>>>>>> that connect the 2 hosts are part of as well
>>>>> >>>>>>>>>>>> On each physical server the network cards are bonded
>>>>> >>>>>>>>>>>> to achieve failure safety (bond0). The guests are
>>>>> >>>>>>>>>>>> connected over a bridge (br0), but apparently our
>>>>> >>>>>>>>>>>> virtualization software creates its own device named
>>>>> >>>>>>>>>>>> after the guest (kvm101.0).
>>>>> >>>>>>>>>>>> There is no direct connection between the
>>>>> servers,
>>>> but
>>>>> as I
>>>>> >said
>>>>> >>>>>>>>>>>> earlier, the multicast traffic does reach the VMs
>>>> so I
>>>>> >assume
>>>>> >>>>>>> there
>>>>> >>>>>>>>> is
>>>>> >>>>>>>>>>>> no problem with that.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> On 09.07.2020 at 20:18, Vladislav Bogdanov wrote:
>>>>> >>>>>>>>>>>>> First, you need to ensure that your switch
>>>>> (or all
>>>>> >switches in
>>>>> >>>>>>> the
>>>>> >>>>>>>>>>>>> path) have igmp snooping enabled on host
>>>>> ports (and
>>>>> >probably
>>>>> >>>>>>>>>>>>> interconnects along the path between your
>>>>> hosts).
>>>>> >>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>> Second, you need an igmp querier to be enabled
>>>> somewhere
>>>>> >near
>>>>> >>>>>>>>> (better
>>>>> >>>>>>>>>>>>> to have it enabled on a switch itself). Please
>>>> verify
>>>>> that
>>>>> >you
>>>>> >>>>>>> see
>>>>> >>>>>>>>>>>> its
>>>>> >>>>>>>>>>>>> queries on hosts.
>>>>> >>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>> Next, you probably need to make your hosts use IGMPv2
>>>>> >>>>>>>>>>>>> (not 3), as many switches still cannot understand v3.
>>>>> >>>>>>>>>>>>> This is doable by sysctl; there are many articles on
>>>>> >>>>>>>>>>>>> the internet.
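>>>>> >>>>>>>>>>>>> (Just as an illustration, that would be something like
>>>>> >>>>>>>>>>>>> "sysctl -w net.ipv4.conf.all.force_igmp_version=2" on
>>>>> >>>>>>>>>>>>> each host, plus the same setting in /etc/sysctl.conf
>>>>> >>>>>>>>>>>>> to make it persistent.)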
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> I have sent a query to our data center techs, who are
>>>>> >>>>>>>>>>>> analyzing this and were already checking whether
>>>>> >>>>>>>>>>>> multicast traffic is somewhere blocked or hindered. So
>>>>> >>>>>>>>>>>> far the answer is, "multicast is explicitly allowed in
>>>>> >>>>>>>>>>>> the local network and no packets are filtered or
>>>>> >>>>>>>>>>>> dropped". I am still waiting for a final report though.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> In the meantime I have switched from IGMPv3 to IGMPv2
>>>>> >>>>>>>>>>>> on every involved server, hosts and guests, via the
>>>>> >>>>>>>>>>>> mentioned sysctl. The switching itself was successful,
>>>>> >>>>>>>>>>>> according to "cat /proc/net/igmp", but sadly did not
>>>>> >>>>>>>>>>>> improve the behavior. It actually led to no VM
>>>>> >>>>>>>>>>>> receiving the multicast traffic anymore at all.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> kind regards
>>>>> >>>>>>>>>>>> Stefan Schmitz
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> On 09.07.2020 at 22:34, Klaus Wenninger wrote:
>>>>> >>>>>>>>>>>>> On 7/9/20 5:17 PM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>> >>>>>>>>>>>>>> Hello,
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>> Well, theory still holds I would say.
>>>>> >>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> I guess that the multicast traffic from the other host
>>>>> >>>>>>>>>>>>>> or the guests doesn't get to the daemon on the host.
>>>>> >>>>>>>>>>>>>> Can't you just simply check if there are any firewall
>>>>> >>>>>>>>>>>>>> rules configured on the host kernel?
>>>>> >>>>>>>>>>>>>> I hope I did understand you correctly and you are
>>>>> >>>>>>>>>>>>>> referring to iptables?
>>>>> >>>>>>>>>>>>> I didn't say iptables because it might have been
>>>>> >>>>>>>>>>>>> nftables - but yes, that is what I was referring to.
>>>>> >>>>>>>>>>>>> Guess to understand the config the output is
>>>>> >>>>>>>>>>>>> lacking verbosity, but it makes me believe that
>>>>> >>>>>>>>>>>>> the whole setup doesn't look as I would have
>>>>> >>>>>>>>>>>>> expected (bridges on each host where the guest
>>>>> >>>>>>>>>>>>> has a connection to and where ethernet interfaces
>>>>> >>>>>>>>>>>>> that connect the 2 hosts are part of as well -
>>>>> >>>>>>>>>>>>> everything connected via layer 2 basically).
>>>>> >>>>>>>>>>>>>> Here is the output of the current rules. Besides
>>>>> >>>>>>>>>>>>>> the IP of the guest, the output is identical on both
>>>>> >>>>>>>>>>>>>> hosts:
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> # iptables -S
>>>>> >>>>>>>>>>>>>> -P INPUT ACCEPT
>>>>> >>>>>>>>>>>>>> -P FORWARD ACCEPT
>>>>> >>>>>>>>>>>>>> -P OUTPUT ACCEPT
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> # iptables -L
>>>>> >>>>>>>>>>>>>> Chain INPUT (policy ACCEPT)
>>>>> >>>>>>>>>>>>>> target prot opt source
>>>> destination
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> Chain FORWARD (policy ACCEPT)
>>>>> >>>>>>>>>>>>>> target prot opt source
>>>> destination
>>>>> >>>>>>>>>>>>>> SOLUSVM_TRAFFIC_IN all -- anywhere
>>>>> >anywhere
>>>>> >>>>>>>>>>>>>> SOLUSVM_TRAFFIC_OUT all -- anywhere
>>>>> >anywhere
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> Chain OUTPUT (policy ACCEPT)
>>>>> >>>>>>>>>>>>>> target prot opt source
>>>> destination
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> Chain SOLUSVM_TRAFFIC_IN (1 references)
>>>>> >>>>>>>>>>>>>> target prot opt source
>>>> destination
>>>>> >>>>>>>>>>>>>> all -- anywhere
>>>>> >192.168.1.14
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> Chain SOLUSVM_TRAFFIC_OUT (1 references)
>>>>> >>>>>>>>>>>>>> target prot opt source
>>>> destination
>>>>> >>>>>>>>>>>>>> all -- 192.168.1.14 anywhere
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> kind regards
>>>>> >>>>>>>>>>>>>> Stefan Schmitz
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>
>>>>> >>>>
>>>>> >>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>>
>>>>> Reid Wahl, RHCA
>>>>> Software Maintenance Engineer, Red Hat
>>>>> CEE - Platform Support Delivery - ClusterHA
>>>>
>>>
>