[ClusterLabs] Still Beginner STONITH Problem
Klaus Wenninger
kwenning at redhat.com
Tue Jul 21 02:32:57 EDT 2020
On 7/20/20 5:05 PM, Stefan Schmitz wrote:
> Hello,
>
> I have now deleted the previous stonith resource and added two new
> ones, one for each server. The commands I used for that:
>
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
> hostlist="host2" pcmk_host_list="server2ubuntu1,server4ubuntu1"
> hypervisor_uri="qemu+ssh://192.168.1.13/system"
>
> # pcs -f stonith_cfg stonith create stonith_id_2 external/libvirt
> hostlist="Host4" pcmk_host_list="server2ubuntu1,server4ubuntu1"
> hypervisor_uri="qemu+ssh://192.168.1.21/system"
As already mentioned, external/libvirt is the wrong fence agent.
You have to use fence_xvm, as you've already tried on the cmdline.
You don't need hostlist, and probably not pcmk_host_list either,
as fence_xvm is gonna query fence_virtd for the possible targets.
If you don't have a 1:1 match between the guest names in libvirt
and the node names you will definitely need a
pcmk_host_map="{pacemaker-node1}:{guest-name1};..."
And you have to give the attribute multicast_address=... .
The reference to libvirt/the hypervisor lives solely in fence_virtd.
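Roughly as a sketch - untested, and the resource names, the guest
names (kvm101/kvm102) and the multicast addresses below are only
placeholders that have to match what you actually configured with
fence_virtd -c on each host:

# pcs -f stonith_cfg stonith create fence_host2 fence_xvm \
      multicast_address="225.0.0.12" key_file="/etc/cluster/fence_xvm.key" \
      pcmk_host_map="server2ubuntu1:kvm101" pcmk_host_list="server2ubuntu1"
# pcs -f stonith_cfg stonith create fence_host4 fence_xvm \
      multicast_address="225.0.0.13" key_file="/etc/cluster/fence_xvm.key" \
      pcmk_host_map="server4ubuntu1:kvm102" pcmk_host_list="server4ubuntu1"

So one fence_xvm resource per host, each one responsible for fencing
exactly the node that runs as a guest on that host, and each one
pointing at the multicast address the fence_virtd on that host
listens on.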
If you intended to switch over to a solution without
the fence_virtd service and a fence agent that talks directly
to the hypervisor, forget my comments - I haven't done that with
libvirt so far.
Klaus
>
>
> The behaviour is now somewhat different but it still does not work. I
> guess I am doing something completely wrong in setting up the
> stonith resource?
>
> The pcs status command shows two running stonith resources on one
> server but two stopped ones on the other. Additionally there is a
> failed fencing action. The server showing the running stonith resources is
> marked as unclean and fencing wants to reboot it but fails doing so.
>
> Any advice on how to proceed would be greatly appreciated.
>
>
> The shortened pcs status outputs of each of the VMs:
>
>
> # pcs status of server2ubuntu1
> [...]
> Node List:
> * Node server2ubuntu1: UNCLEAN (online)
> * Online: [ server4ubuntu1 ]
>
> Full List of Resources:
> [...]
> * stonith_id_1 (stonith:external/libvirt): Started
> server2ubuntu1
> * stonith_id_2 (stonith:external/libvirt): Started
> server2ubuntu1
>
> Failed Resource Actions:
> * stonith_id_1_start_0 on server4ubuntu1 'error' (1): call=228,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:35:45
> +01:00', queued=4391ms, exec=2890ms
> * stonith_id_2_start_0 on server4ubuntu1 'error' (1): call=229,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:35:45
> +01:00', queued=2230ms, exec=-441815ms
> * r0_pacemaker_stop_0 on server2ubuntu1 'error' (1): call=198,
> status='Timed Out', exitreason='', last-rc-change='1970-01-08 01:33:54
> +01:00', queued=321ms, exec=115529ms
> * stonith_id_1_start_0 on server2ubuntu1 'error' (1): call=196,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:33:54
> +01:00', queued=161ms, exec=-443582ms
> * stonith_id_2_start_0 on server2ubuntu1 'error' (1): call=197,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:33:54
> +01:00', queued=69ms, exec=-444042ms
>
> Failed Fencing Actions:
> * reboot of server2ubuntu1 failed: delegate=,
> client=pacemaker-controld.2002, origin=server4ubuntu1,
> last-failed='2020-07-20 16:51:49 +02:00'
>
>
>
> # pcs status of server4ubuntu1
> [...]
> Node List:
> * Node server2ubuntu1: UNCLEAN (online)
> * Online: [ server4ubuntu1 ]
>
> Full List of Resources:
> [...]
> * stonith_id_1 (stonith:external/libvirt): FAILED
> server4ubuntu1
> * stonith_id_2 (stonith:external/libvirt): FAILED
> server4ubuntu1
>
> Failed Resource Actions:
> * stonith_id_1_start_0 on server4ubuntu1 'error' (1): call=248,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:45:07
> +01:00', queued=350ms, exec=516901ms
> * stonith_id_2_start_0 on server4ubuntu1 'error' (1): call=249,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:45:07
> +01:00', queued=149ms, exec=515438ms
> * stonith_id_1_start_0 on server2ubuntu1 'error' (1): call=215,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:44:53
> +01:00', queued=189ms, exec=534334ms
> * stonith_id_2_start_0 on server2ubuntu1 'error' (1): call=216,
> status='complete', exitreason='', last-rc-change='1970-01-08 01:44:53
> +01:00', queued=82ms, exec=564228ms
>
> Failed Fencing Actions:
> * reboot of server2ubuntu1 failed: delegate=,
> client=pacemaker-controld.2002, origin=server4ubuntu1,
> last-failed='2020-07-20 16:51:49 +02:00'
>
>
>
> kind regards
> Stefan Schmitz
>
>
>
>
> On 20.07.2020 at 13:51, Stefan Schmitz wrote:
>>
>>
>>
>> On 20.07.2020 at 13:36, Klaus Wenninger wrote:
>>> On 7/20/20 1:10 PM, Stefan Schmitz wrote:
>>>> Hello,
>>>>
>>>> thank you all very much for your help so far!
>>>>
>>>> We have now managed to capture the multicast traffic originating from
>>>> one host when issuing the command "fence_xvm -o list" on the other
>>>> host. Now the tcpdump at least looks exactly the same on all 4
>>>> servers, hosts and guests. I cannot tell how and why this just started
>>>> working, but I got our datacenter techs' final report this morning
>>>> that there are no problems present.
>>>>
>>>>
>>>>
>>>> On 19.07.2020 at 09:32, Andrei Borzenkov wrote:
>>>>> external/libvirt is unrelated to fence_xvm
>>>>
>>>> Could you please explain that a bit more? Do you mean that the current
>>>> problem of the dysfunctional Stonith/fencing is unrelated to libvirt?
>>> Hadn't spotted that ... sry
>>> What he meant is that if you are using the fence_virtd service on
>>> the host(s), then the matching fencing resource is based
>>> on fence_xvm and not external/libvirt.
>>> The libvirt side of things is handled by the daemon running on your host.
>>>>
>>>>> fence_xvm opens TCP listening socket, sends request and waits for
>>>>> connection to this socket (from fence_virtd) which is used to submit
>>>>> actual fencing operation. Only the first connection request is
>>>>> handled.
>>>>> So the first host that responds will be processed. The local host is
>>>>> likely always faster to respond than the remote host.
>>>>
>>>> Thank you for the explanation, I get that. But what would you suggest
>>>> to remedy this situation? We have been using libvirt and fence_xvm
>>>> because of the clusterlabs wiki articles and the suggestions in this
>>>> mailing list. Is there anything you suggest we need to change to make
>>>> this Cluster finally work?
>>> Guess what he meant - what I've already suggested before,
>>> and what is described in the linked article as well - is
>>> having totally separate configurations for
>>> each host. Whether you are using different multicast addresses
>>> or unicast - as Andrei is suggesting, which I haven't used
>>> before - probably doesn't matter. (Unless of course something
>>> is really blocking multicast ...)
>>> And you have to set up one fencing resource per host
>>> (fence_xvm) that has the address you've configured
>>> on each of the hosts.
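>>> Just to sketch it (the addresses are only examples - use whatever
>>> you picked with fence_virtd -c on each host): host 1 gets e.g.
>>> listeners { multicast { address = "225.0.0.12"; ... } } in
>>> /etc/fence_virt.conf, host 2 gets address = "225.0.0.13", and then
>>> the cluster gets one fence_xvm stonith resource per host, each with
>>> multicast_address= set to the address of that host.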
>>
>> Thank you for the explanation. I sadly cannot access the articles. I
>> take it totally separate configurations means having a stonith resource
>> configured in the cluster for each host. So for now I will delete the
>> current resource and try to configure two new ones.
>>
>>>>
>>>>
>>>> On 18.07.2020 at 02:36, Reid Wahl wrote:
>>>>> However, when users want to configure fence_xvm for multiple hosts
>>>> with the libvirt backend, I have typically seen them configure
>>>> multiple fence_xvm devices (one per host) and configure a different
>>>> multicast address on each host.
>>>>
>>>> I do have a Red Hat account but not a paid subscription, which sadly
>>>> is needed to access the articles you have linked.
>>>>
>>>> We have installed fence_virt on both hosts since the beginning, if
>>>> that is what you mean by "multiple fence_xvm devices (one per host)".
>>>> They were however both configured to use the same multicast IP address,
>>>> which we now changed so that each host's fence_xvm install uses a
>>>> different multicast IP. Sadly this does not seem to change anything in
>>>> the behaviour.
>>>> What is interesting though is that I ran fence_xvm -c again and changed
>>>> the multicast IP to 225.0.0.13 (from .12). I killed and restarted the
>>>> daemon multiple times after that.
>>>> When I now run #fence_xvm -o list without specifying an IP address,
>>>> tcpdump on the other host still shows the old IP as the one being
>>>> used.
>>>> tcpdump on the other host:
>>>> Host4.54001 > 225.0.0.12.zented: [udp sum ok] UDP, length 176
>>>> Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
>>>> 225.0.0.12 to_in { }]
>>>> Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
>>>> 225.0.0.12 to_in { }]
>>>> Only when I specify the other IP does it apparently really get used:
>>>> # fence_xvm -a 225.0.0.13 -o list
>>>> tcpdump on the other host:
>>>> Host4.46011 > 225.0.0.13.zented: [udp sum ok] UDP, length 176
>>>> Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
>>>> 225.0.0.13 to_in { }]
>>>> Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr
>>>> 225.0.0.13 to_in { }]
>>>>
>>>>
>>>>
>>>>
>>>> On 17.07.2020 at 16:49, Strahil Nikolov wrote:
>>>>> The simplest way to check if the libvirt's network is NAT (or not)
>>>> is to try to ssh from the first VM to the second one.
>>>> That does work without any issue. I can ssh to any server in our
>>>> network, host or guest, without a problem. Does that mean there is no
>>>> natting involved?
>>>>
>>>>
>>>>
>>>> On 17.07.2020 at 16:41, Klaus Wenninger wrote:
>>>>> How does your VM part of the network-config look like?
>>>> # cat ifcfg-br0
>>>> DEVICE=br0
>>>> TYPE=Bridge
>>>> BOOTPROTO=static
>>>> ONBOOT=yes
>>>> IPADDR=192.168.1.13
>>>> NETMASK=255.255.0.0
>>>> GATEWAY=192.168.1.1
>>>> NM_CONTROLLED=no
>>>> IPV6_AUTOCONF=yes
>>>> IPV6_DEFROUTE=yes
>>>> IPV6_PEERDNS=yes
>>>> IPV6_PEERROUTES=yes
>>>> IPV6_FAILURE_FATAL=no
>>>>
>>>>
>>>>>> I am at a loss and do not know why this is NAT. I am aware what NAT
>>>>>> means, but what am I supposed to reconfigure here to solve the
>>>> problem?
>>>>> As long as you stay within the subnet you are running on your bridge
>>>>> you won't get natted but once it starts to route via the host the
>>>> libvirt
>>>>> default bridge will be natted.
>>>>> What you can do is connect the bridges on your 2 hosts via layer 2.
>>>>> Possible ways should be OpenVPN, knet, VLAN on your switches ...
>>>>> (and yes - a cable )
>>>>> If your guests are using DHCP you should probably configure
>>>>> fixed IPs for those MACs.
>>>> All our servers have fixed IPs; DHCP is not used anywhere in our
>>>> network for dynamic IP assignment.
>>>> Regarding the "check if VMs are natted", is this solved by the ssh
>>>> test suggested by Strahil Nikolov? Can I assume natting is not a
>>>> problem here or do we still have to take measures?
>>>>
>>>>
>>>>
>>>> kind regards
>>>> Stefan Schmitz
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 18.07.2020 at 02:36, Reid Wahl wrote:
>>>>> I'm not sure that the libvirt backend is intended to be used in this
>>>>> way, with multiple hosts using the same multicast address. From the
>>>>> fence_virt.conf man page:
>>>>>
>>>>> ~~~
>>>>> BACKENDS
>>>>> libvirt
>>>>> The libvirt plugin is the simplest plugin. It is used in
>>>>> environments where routing fencing requests between multiple hosts is
>>>>> not required, for example by a user running a cluster of virtual
>>>>> machines on a single desktop computer.
>>>>> libvirt-qmf
>>>>> The libvirt-qmf plugin acts as a QMFv2 Console to the
>>>>> libvirt-qmf daemon in order to route fencing requests over AMQP to
>>>>> the appropriate computer.
>>>>> cpg
>>>>> The cpg plugin uses corosync CPG and libvirt to track virtual
>>>>> machines and route fencing requests to the appropriate computer.
>>>>> ~~~
>>>>>
>>>>> I'm not an expert on fence_xvm or libvirt. It's possible that this
>>>>> is a
>>>>> viable configuration with the libvirt backend.
>>>>>
>>>>> However, when users want to configure fence_xvm for multiple hosts
>>>>> with
>>>>> the libvirt backend, I have typically seen them configure multiple
>>>>> fence_xvm devices (one per host) and configure a different multicast
>>>>> address on each host.
>>>>>
>>>>> If you have a Red Hat account, see also:
>>>>> - https://access.redhat.com/solutions/2386421#comment-1209661
>>>>> - https://access.redhat.com/solutions/2386421#comment-1209801
>>>>>
>>>>> On Fri, Jul 17, 2020 at 7:49 AM Strahil Nikolov
>>>>> <hunter86_bg at yahoo.com> wrote:
>>>>>
>>>>> The simplest way to check if the libvirt's network is NAT
>>>>> (or not)
>>>>> is to try to ssh from the first VM to the second one.
>>>>>
>>>>> I should admit that I was lost when I tried to create a
>>>>> routed
>>>>> network in KVM, so I can't help with that.
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>>>
>>>>> On 17 July 2020 at 16:56:44 GMT+03:00,
>>>>> "stefan.schmitz at farmpartner-tec.com"
>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>> >Hello,
>>>>> >
>>>>> >I have now managed to get # fence_xvm -a 225.0.0.12 -o list to
>>>>> >list at least its local guest again. It seems the fence_virtd was
>>>>> >not working properly anymore.
>>>>> >
>>>>> >Regarding the Network XML config
>>>>> >
>>>>> ># cat default.xml
>>>>> > <network>
>>>>> > <name>default</name>
>>>>> > <bridge name="virbr0"/>
>>>>> > <forward/>
>>>>> > <ip address="192.168.122.1" netmask="255.255.255.0">
>>>>> > <dhcp>
>>>>> > <range start="192.168.122.2"
>>>>> end="192.168.122.254"/>
>>>>> > </dhcp>
>>>>> > </ip>
>>>>> > </network>
>>>>> >
>>>>> >I have used "virsh net-edit default" to test other network
>>>> Devices on
>>>>> >the hosts but this did not change anything.
>>>>> >
>>>>> >Regarding the statement
>>>>> >
>>>>> > > If it is created by libvirt - this is NAT and you will
>>>>> > > never receive output from the other host.
>>>>> >
>>>>> >I am at a loss and do not know why this is NAT. I am aware
>>>>> >what NAT means, but what am I supposed to reconfigure here
>>>>> >to solve the problem?
>>>>> >Any help would be greatly appreciated.
>>>>> >Thank you in advance.
>>>>> >
>>>>> >Kind regards
>>>>> >Stefan Schmitz
>>>>> >
>>>>> >
>>>>> >On 15.07.2020 at 16:48, stefan.schmitz at farmpartner-tec.com wrote:
>>>>> >>
>>>>> >> On 15.07.2020 at 16:29, Klaus Wenninger wrote:
>>>>> >>> On 7/15/20 4:21 PM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>> >>>> Hello,
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On 15.07.2020 at 15:30, Klaus Wenninger wrote:
>>>>> >>>>> On 7/15/20 3:15 PM, Strahil Nikolov wrote:
>>>>> >>>>>> If it is created by libvirt - this is NAT and you will
>>>> never
>>>>> >>>>>> receive output from the other host.
>>>>> >>>>> And twice the same subnet behind NAT is probably giving
>>>>> >>>>> issues at other places as well.
>>>>> >>>>> And if using DHCP you have to at least enforce that both
>>>>> >>>>> sides don't go for the same IP.
>>>>> >>>>> But all that is no explanation for why it doesn't work on the
>>>>> >>>>> same host.
>>>>> >>>>> Which is why I was asking for running the service on the
>>>>> >>>>> bridge to check if that would work at least. So that we
>>>>> >>>>> can go forward step by step.
>>>>> >>>>
>>>>> >>>> I just now finished trying and testing it on both hosts.
>>>>> >>>> I ran # fence_virtd -c on both hosts and entered
>>>>> different
>>>> network
>>>>> >>>> devices. On both I tried br0 and the kvm10x.0.
>>>>> >>> According to your libvirt-config I would have expected
>>>>> >>> the bridge to be virbr0.
>>>>> >>
>>>>> >> I understand that, but a "virbr0" device does not seem to
>>>>> >> exist on either of the two hosts.
>>>>> >>
>>>>> >> # ip link show
>>>>> >> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state
>>>> UNKNOWN
>>>>> >mode
>>>>> >> DEFAULT group default qlen 1000
>>>>> >> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>>> >> 2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500
>>>> qdisc mq
>>>>> >> master bond0 state UP mode DEFAULT group default qlen 1000
>>>>> >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
>>>>> >> 3: enp216s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
>>>> state DOWN
>>>>> >mode
>>>>> >> DEFAULT group default qlen 1000
>>>>> >> link/ether ac:1f:6b:26:69:dc brd ff:ff:ff:ff:ff:ff
>>>>> >> 4: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500
>>>> qdisc mq
>>>>> >> master bond0 state UP mode DEFAULT group default qlen 1000
>>>>> >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
>>>>> >> 5: enp216s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
>>>> state DOWN
>>>>> >mode
>>>>> >> DEFAULT group default qlen 1000
>>>>> >> link/ether ac:1f:6b:26:69:dd brd ff:ff:ff:ff:ff:ff
>>>>> >> 6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500
>>>> qdisc
>>>>> >> noqueue master br0 state UP mode DEFAULT group default qlen
>>>> 1000
>>>>> >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
>>>>> >> 7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
>>>> noqueue
>>>>> >state
>>>>> >> UP mode DEFAULT group default qlen 1000
>>>>> >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff
>>>>> >> 8: kvm101.0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
>>>>> qdisc
>>>>> >pfifo_fast
>>>>> >> master br0 state UNKNOWN mode DEFAULT group default qlen
>>>>> 1000
>>>>> >> link/ether fe:16:3c:ba:10:6c brd ff:ff:ff:ff:ff:ff
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>>>
>>>>> >>>> After each reconfiguration I ran #fence_xvm -a 225.0.0.12 -o list
>>>>> >>>> On the second server it worked with each device. After that I
>>>>> >>>> reconfigured back to the normal device, bond0, on which it
>>>>> >>>> had not worked before, and now it worked again!
>>>>> >>>> # fence_xvm -a 225.0.0.12 -o list
>>>>> >>>> kvm102
>>>>> >bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on
>>>>> >>>>
>>>>> >>>> But anyhow not on the first server; there it did not work
>>>>> >>>> with any device.
>>>>> >>>> # fence_xvm -a 225.0.0.12 -o list always resulted in
>>>>> >>>> Timed out waiting for response
>>>>> >>>> Operation failed
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On 15.07.2020 at 15:15, Strahil Nikolov wrote:
>>>>> >>>>> If it is created by libvirt - this is NAT and you
>>>>> will never
>>>>> >receive
>>>>> >>>> output from the other host.
>>>>> >>>>>
>>>>> >>>> To my knowledge this is configured by libvirt. At least I
>>>>> >>>> am not aware of having changed or configured it in any way.
>>>>> >>>> Up until today I did not even know that file existed. Could
>>>>> >>>> you please advise on what I need to do to fix this issue?
>>>>> >>>>
>>>>> >>>> Kind regards
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>> Is pacemaker/corosync/knet btw. using the same
>>>> interfaces/IPs?
>>>>> >>>>>
>>>>> >>>>> Klaus
>>>>> >>>>>>
>>>>> >>>>>> Best Regards,
>>>>> >>>>>> Strahil Nikolov
>>>>> >>>>>>
>>>>> >>>>>> On 15 July 2020 at 15:05:48 GMT+03:00,
>>>>> >>>>>> "stefan.schmitz at farmpartner-tec.com"
>>>>> >>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>> >>>>>>> Hello,
>>>>> >>>>>>>
>>>>> >>>>>>> On 15.07.2020 at 13:42, Strahil Nikolov wrote:
>>>>> >>>>>>>> By default libvirt is using NAT and not routed
>>>>> network
>>>> - in
>>>>> >such
>>>>> >>>>>>> case, vm1 won't receive data from host2.
>>>>> >>>>>>>> Can you provide the Networks' xml ?
>>>>> >>>>>>>>
>>>>> >>>>>>>> Best Regards,
>>>>> >>>>>>>> Strahil Nikolov
>>>>> >>>>>>>>
>>>>> >>>>>>> # cat default.xml
>>>>> >>>>>>> <network>
>>>>> >>>>>>> <name>default</name>
>>>>> >>>>>>> <bridge name="virbr0"/>
>>>>> >>>>>>> <forward/>
>>>>> >>>>>>> <ip address="192.168.122.1"
>>>>> netmask="255.255.255.0">
>>>>> >>>>>>> <dhcp>
>>>>> >>>>>>> <range start="192.168.122.2"
>>>> end="192.168.122.254"/>
>>>>> >>>>>>> </dhcp>
>>>>> >>>>>>> </ip>
>>>>> >>>>>>> </network>
>>>>> >>>>>>>
>>>>> >>>>>>> I just checked this and the file is identical on both
>>>> hosts.
>>>>> >>>>>>>
>>>>> >>>>>>> kind regards
>>>>> >>>>>>> Stefan Schmitz
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>> On 15 July 2020 at 13:19:59 GMT+03:00, Klaus Wenninger
>>>>> >>>>>>>> <kwenning at redhat.com> wrote:
>>>>> >>>>>>>>> On 7/15/20 11:42 AM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>> >>>>>>>>>> Hello,
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> On 15.07.2020 at 06:32, Strahil Nikolov wrote:
>>>>> >>>>>>>>>>> How did you configure the network on your Ubuntu 20.04
>>>>> >>>>>>>>>>> hosts? I tried to set up a bridged connection for the
>>>>> >>>>>>>>>>> test setup, but obviously I'm missing something.
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>>> Best Regards,
>>>>> >>>>>>>>>>> Strahil Nikolov
>>>>> >>>>>>>>>>>
>>>>> >>>>>>>>>> On the hosts (CentOS) the bridge config looks like
>>>>> >>>>>>>>>> that. The bridging and configuration is handled by the
>>>>> >>>>>>>>>> virtualization software:
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> # cat ifcfg-br0
>>>>> >>>>>>>>>> DEVICE=br0
>>>>> >>>>>>>>>> TYPE=Bridge
>>>>> >>>>>>>>>> BOOTPROTO=static
>>>>> >>>>>>>>>> ONBOOT=yes
>>>>> >>>>>>>>>> IPADDR=192.168.1.21
>>>>> >>>>>>>>>> NETMASK=255.255.0.0
>>>>> >>>>>>>>>> GATEWAY=192.168.1.1
>>>>> >>>>>>>>>> NM_CONTROLLED=no
>>>>> >>>>>>>>>> IPV6_AUTOCONF=yes
>>>>> >>>>>>>>>> IPV6_DEFROUTE=yes
>>>>> >>>>>>>>>> IPV6_PEERDNS=yes
>>>>> >>>>>>>>>> IPV6_PEERROUTES=yes
>>>>> >>>>>>>>>> IPV6_FAILURE_FATAL=no
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> On 15.07.2020 at 09:50, Klaus Wenninger wrote:
>>>>> >>>>>>>>>>> Guess it is not easy to have your servers connected
>>>>> >>>>>>>>>>> physically for a try.
>>>>> >>>>>>>>>>> But maybe you can at least try on one host to have
>>>>> >>>>>>>>>>> virt_fenced & VM on the same bridge - just to see if
>>>>> >>>>>>>>>>> that basic pattern is working.
>>>>> >>>>>>>>>> I am not sure if I understand you correctly. What do
>>>>> >>>>>>>>>> you mean by having them on the same bridge? The bridge
>>>>> >>>>>>>>>> device is configured on the host by the virtualization
>>>>> >>>>>>>>>> software.
>>>>> >>>>>>>>> I meant to check out which bridge the interface of
>>>>> >>>>>>>>> the VM is enslaved to and to use that bridge as the
>>>>> >>>>>>>>> interface in /etc/fence_virt.conf.
>>>>> >>>>>>>>> Get me right - just for now - just to see if it is
>>>>> >>>>>>>>> working for this one host and the corresponding guest.
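>>>>> >>>>>>>>> (To double-check which bridge that is - device names here
>>>>> >>>>>>>>> are just examples - "ip link show kvm101.0" should show a
>>>>> >>>>>>>>> "master br0" part, and "brctl show" lists which interfaces
>>>>> >>>>>>>>> are enslaved to which bridge.)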
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>> Well, maybe there is still somebody in the middle playing
>>>>> >>>>>>>>>>> IGMPv3, or the request for a certain source is needed to
>>>>> >>>>>>>>>>> shoot open some firewall or switch tables.
>>>>> >>>>>>>>>> I am still waiting for the final report from our data
>>>>> >>>>>>>>>> center techs. I hope that will clear up some things.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Additionally I have just noticed that apparently since
>>>>> >>>>>>>>>> switching from IGMPv3 to IGMPv2 and back, the command
>>>>> >>>>>>>>>> "fence_xvm -a 225.0.0.12 -o list" is now completely broken.
>>>>> >>>>>>>>>> Before that switch this command at least returned the
>>>>> >>>>>>>>>> local VM. Now it returns:
>>>>> >>>>>>>>>> Timed out waiting for response
>>>>> >>>>>>>>>> Operation failed
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> I am a bit confused by that, because all we did was
>>>>> >>>>>>>>>> running commands like
>>>>> >>>>>>>>>> "sysctl -w net.ipv4.conf.all.force_igmp_version=" with the
>>>>> >>>>>>>>>> different version numbers, and #cat /proc/net/igmp shows
>>>>> >>>>>>>>>> that V3 is used again on every device just like before...?!
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> kind regards
>>>>> >>>>>>>>>> Stefan Schmitz
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>> On 14 July 2020 at 11:06:42 GMT+03:00,
>>>>> >>>>>>>>>>> "stefan.schmitz at farmpartner-tec.com"
>>>>> >>>>>>>>>>> <stefan.schmitz at farmpartner-tec.com> wrote:
>>>>> >>>>>>>>>>>> Hello,
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> On 09.07.2020 at 19:10, Strahil Nikolov wrote:
>>>>> >>>>>>>>>>>>> Have you run 'fence_virtd -c' ?
>>>>> >>>>>>>>>>>> Yes, I had run that on both hosts. The current config
>>>>> >>>>>>>>>>>> looks like that and is identical on both.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> cat fence_virt.conf
>>>>> >>>>>>>>>>>> fence_virtd {
>>>>> >>>>>>>>>>>> listener = "multicast";
>>>>> >>>>>>>>>>>> backend = "libvirt";
>>>>> >>>>>>>>>>>> module_path =
>>>>> "/usr/lib64/fence-virt";
>>>>> >>>>>>>>>>>> }
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> listeners {
>>>>> >>>>>>>>>>>> multicast {
>>>>> >>>>>>>>>>>> key_file =
>>>>> >"/etc/cluster/fence_xvm.key";
>>>>> >>>>>>>>>>>> address = "225.0.0.12";
>>>>> >>>>>>>>>>>> interface = "bond0";
>>>>> >>>>>>>>>>>> family = "ipv4";
>>>>> >>>>>>>>>>>> port = "1229";
>>>>> >>>>>>>>>>>> }
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> }
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> backends {
>>>>> >>>>>>>>>>>> libvirt {
>>>>> >>>>>>>>>>>> uri = "qemu:///system";
>>>>> >>>>>>>>>>>> }
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> }
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> The situation is still that no matter on which host I
>>>>> >>>>>>>>>>>> issue the "fence_xvm -a 225.0.0.12 -o list" command,
>>>>> >>>>>>>>>>>> both guest systems receive the traffic. The local
>>>>> >>>>>>>>>>>> guest, but also the guest on the other host. I reckon
>>>>> >>>>>>>>>>>> that means the traffic is not filtered by any network
>>>>> >>>>>>>>>>>> device, like switches or firewalls. Since the guest on
>>>>> >>>>>>>>>>>> the other host receives the packets, the traffic must
>>>>> >>>>>>>>>>>> reach the physical server and network device and is
>>>>> >>>>>>>>>>>> then routed to the VM on that host.
>>>>> >>>>>>>>>>>> But still, the traffic is not shown on the host itself.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> Further, the local firewalls on both hosts are set to
>>>>> >>>>>>>>>>>> let each and every kind of traffic pass - accept for
>>>>> >>>>>>>>>>>> anything and everything. Well, at least as far as I
>>>>> >>>>>>>>>>>> can see.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> On 09.07.2020 at 22:34, Klaus Wenninger wrote:
>>>>> >>>>>>>>>>>>> makes me believe that
>>>>> >>>>>>>>>>>>> the whole setup doesn't look as I would have
>>>>> >>>>>>>>>>>>> expected (bridges on each host where the guest
>>>>> >>>>>>>>>>>>> has a connection to and where ethernet interfaces
>>>>> >>>>>>>>>>>>> that connect the 2 hosts are part of as well
>>>>> >>>>>>>>>>>> On each physical server the network cards are bonded
>>>>> >>>>>>>>>>>> to achieve failure safety (bond0). The guests are
>>>>> >>>>>>>>>>>> connected over a bridge (br0), but apparently our
>>>>> >>>>>>>>>>>> virtualization software creates its own device named
>>>>> >>>>>>>>>>>> after the guest (kvm101.0).
>>>>> >>>>>>>>>>>> There is no direct connection between the
>>>>> servers,
>>>> but
>>>>> as I
>>>>> >said
>>>>> >>>>>>>>>>>> earlier, the multicast traffic does reach the VMs
>>>> so I
>>>>> >assume
>>>>> >>>>>>> there
>>>>> >>>>>>>>> is
>>>>> >>>>>>>>>>>> no problem with that.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> On 09.07.2020 at 20:18, Vladislav Bogdanov wrote:
>>>>> >>>>>>>>>>>>> First, you need to ensure that your switch
>>>>> (or all
>>>>> >switches in
>>>>> >>>>>>> the
>>>>> >>>>>>>>>>>>> path) have igmp snooping enabled on host
>>>>> ports (and
>>>>> >probably
>>>>> >>>>>>>>>>>>> interconnects along the path between your
>>>>> hosts).
>>>>> >>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>> Second, you need an igmp querier to be enabled
>>>> somewhere
>>>>> >near
>>>>> >>>>>>>>> (better
>>>>> >>>>>>>>>>>>> to have it enabled on a switch itself). Please
>>>> verify
>>>>> that
>>>>> >you
>>>>> >>>>>>> see
>>>>> >>>>>>>>>>>> its
>>>>> >>>>>>>>>>>>> queries on hosts.
>>>>> >>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>> Next, you probably need to make your hosts use IGMPv2
>>>>> >>>>>>>>>>>>> (not 3), as many switches still cannot understand v3.
>>>>> >>>>>>>>>>>>> This is doable by sysctl; there are many articles on
>>>>> >>>>>>>>>>>>> the internet.
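>>>>> >>>>>>>>>>>>> (Just as an illustration, that would be something like
>>>>> >>>>>>>>>>>>> "sysctl -w net.ipv4.conf.all.force_igmp_version=2" on
>>>>> >>>>>>>>>>>>> each host, plus the same setting in /etc/sysctl.conf
>>>>> >>>>>>>>>>>>> to make it persistent.)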
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> I have sent a query to our data center techs, who are
>>>>> >>>>>>>>>>>> analyzing this and were already checking whether
>>>>> >>>>>>>>>>>> multicast traffic is somewhere blocked or hindered. So
>>>>> >>>>>>>>>>>> far the answer is, "multicast is explicitly allowed in
>>>>> >>>>>>>>>>>> the local network and no packets are filtered or
>>>>> >>>>>>>>>>>> dropped". I am still waiting for a final report though.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> In the meantime I have switched from IGMPv3 to IGMPv2
>>>>> >>>>>>>>>>>> on every involved server, hosts and guests, via the
>>>>> >>>>>>>>>>>> mentioned sysctl. The switching itself was successful,
>>>>> >>>>>>>>>>>> according to "cat /proc/net/igmp", but sadly did not
>>>>> >>>>>>>>>>>> improve the behavior. It actually led to no VM
>>>>> >>>>>>>>>>>> receiving the multicast traffic anymore at all.
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> kind regards
>>>>> >>>>>>>>>>>> Stefan Schmitz
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>>
>>>>> >>>>>>>>>>>> On 09.07.2020 at 22:34, Klaus Wenninger wrote:
>>>>> >>>>>>>>>>>>> On 7/9/20 5:17 PM, stefan.schmitz at farmpartner-tec.com wrote:
>>>>> >>>>>>>>>>>>>> Hello,
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>> Well, theory still holds I would say.
>>>>> >>>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> I guess that the multicast traffic from the other host
>>>>> >>>>>>>>>>>>>> or the guests doesn't get to the daemon on the host.
>>>>> >>>>>>>>>>>>>> Can't you just simply check if there are any firewall
>>>>> >>>>>>>>>>>>>> rules configured on the host kernel?
>>>>> >>>>>>>>>>>>>> I hope I did understand you correctly and you are
>>>>> >>>>>>>>>>>>>> referring to iptables?
>>>>> >>>>>>>>>>>>> I didn't say iptables because it might have been
>>>>> >>>>>>>>>>>>> nftables - but yes, that is what I was referring to.
>>>>> >>>>>>>>>>>>> Guess to understand the config the output is
>>>>> >>>>>>>>>>>>> lacking verbosity, but it makes me believe that
>>>>> >>>>>>>>>>>>> the whole setup doesn't look as I would have
>>>>> >>>>>>>>>>>>> expected (bridges on each host where the guest
>>>>> >>>>>>>>>>>>> has a connection to and where ethernet interfaces
>>>>> >>>>>>>>>>>>> that connect the 2 hosts are part of as well -
>>>>> >>>>>>>>>>>>> everything connected via layer 2 basically).
>>>>> >>>>>>>>>>>>>> Here is the output of the current rules. Besides
>>>>> >>>>>>>>>>>>>> the IP of the guest, the output is identical on both
>>>>> >>>>>>>>>>>>>> hosts:
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> # iptables -S
>>>>> >>>>>>>>>>>>>> -P INPUT ACCEPT
>>>>> >>>>>>>>>>>>>> -P FORWARD ACCEPT
>>>>> >>>>>>>>>>>>>> -P OUTPUT ACCEPT
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> # iptables -L
>>>>> >>>>>>>>>>>>>> Chain INPUT (policy ACCEPT)
>>>>> >>>>>>>>>>>>>> target prot opt source
>>>> destination
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> Chain FORWARD (policy ACCEPT)
>>>>> >>>>>>>>>>>>>> target prot opt source
>>>> destination
>>>>> >>>>>>>>>>>>>> SOLUSVM_TRAFFIC_IN all -- anywhere
>>>>> >anywhere
>>>>> >>>>>>>>>>>>>> SOLUSVM_TRAFFIC_OUT all -- anywhere
>>>>> >anywhere
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> Chain OUTPUT (policy ACCEPT)
>>>>> >>>>>>>>>>>>>> target prot opt source
>>>> destination
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> Chain SOLUSVM_TRAFFIC_IN (1 references)
>>>>> >>>>>>>>>>>>>> target prot opt source
>>>> destination
>>>>> >>>>>>>>>>>>>> all -- anywhere
>>>>> >192.168.1.14
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> Chain SOLUSVM_TRAFFIC_OUT (1 references)
>>>>> >>>>>>>>>>>>>> target prot opt source
>>>> destination
>>>>> >>>>>>>>>>>>>> all -- 192.168.1.14 anywhere
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>> kind regards
>>>>> >>>>>>>>>>>>>> Stefan Schmitz
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>>>>>>>>>>
>>>>> >>>>>
>>>>> >>>>
>>>>> >>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>>
>>>>> Reid Wahl, RHCA
>>>>> Software Maintenance Engineer, Red Hat
>>>>> CEE - Platform Support Delivery - ClusterHA
>>>>
>>>
>