[ClusterLabs] Antw: Expected recovery behavior of remote-node guest when corosync ring0 is lost in a passive mode RRP config?

Scott Greenlese swgreenl at us.ibm.com
Tue Mar 7 11:28:57 EST 2017


Ulrich,

Thank you very much for your feedback.

You wrote, "Could it be you forgot "allow-migrate=true" at the resource
level or some migration IP address at the node level?
I only have SLES11 here..."

I know for sure that the pacemaker remote node (zs95kjg110102) I mentioned
below is configured correctly for pacemaker Live Guest Migration.
I can demonstrate this using the 'pcs resource move' CLI:

I will migrate this "remote node" guest (zs95kjg110102) and its resource
"zs95kjg110102_res", which is currently running on zs93kjpcs1 (10.20.93.11),
to another cluster node (zs95kjpcs1 / 10.20.93.12), using the 'pcs1'
hostname / IP:

[root at zs95kj ~]# pcs resource show |grep zs95kjg110102_res
 zs95kjg110102_res      (ocf::heartbeat:VirtualDomain): Started zs93kjpcs1

[root at zs93kj ~]# pcs resource move zs95kjg110102_res zs95kjpcs1

[root at zs93kj ~]# pcs resource show |grep zs95kjg110102_res
 zs95kjg110102_res      (ocf::heartbeat:VirtualDomain): Started zs95kjpcs1

## On zs95kjpcs1,  you can see that the guest is actually running there...

[root at zs95kj ~]# virsh list |grep zs95kjg110102
 63    zs95kjg110102                  running

[root at zs95kj ~]# ping 10.20.110.102
PING 10.20.110.102 (10.20.110.102) 56(84) bytes of data.
64 bytes from 10.20.110.102: icmp_seq=1 ttl=63 time=0.775 ms

So, everything seems set up correctly for live guest migration of this
VirtualDomain resource.
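
As an aside: since 'pcs resource move' works by adding a location
constraint, after a test like the one above I clear that constraint again
so the resource isn't permanently pinned to zs95kjpcs1.  A quick sketch,
assuming a reasonably current pcs:

[root at zs95kj ~]# pcs resource clear zs95kjg110102_res    ## removes the cli-prefer constraint left by 'move'
[root at zs95kj ~]# pcs constraint location show            ## confirm no stray location constraints remain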

What I am really looking for is a way to ensure 100% availability of a
"live guest migratable" pacemaker remote node guest in a situation where
the ring0_addr interface (in this case vlan1293) goes down.  I thought that
configuring Redundant Ring Protocol (RRP) for corosync might provide this,
but from what I've seen so far it doesn't look that way.  If the ring0_addr
interface is lost in an RRP configuration while the remote guest is
connected to the host over that ring0_addr, the guest gets rebooted and
re-establishes the "remote-node-to-host" connection over the ring1_addr,
which is great as long as you don't care that the guest gets rebooted.
Corosync does its job of preventing the cluster node from being fenced by
failing its heartbeat messaging over to ring1; however, the remote-node
guests take a short-term hit due to the remote-node-to-host reconnect.
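
For what it's worth, a quick way to see from the host which path the
remote-node connection is actually using before and after dropping ring0
(assuming the usual iproute2 / ss tools are present on the host; the
guest-side netstat output quoted further down shows the same thing from
the other end -- 10.20.93.12 before the failure, 10.20.94.212 after):

[root at zs95kj ~]# ip route get 10.20.110.102    ## route the host resolves for the guest's remote-addr
[root at zs95kj ~]# ss -tnp | grep 3121           ## established pacemaker_remote (TCP 3121) connection and its local address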

In the event of a ring0_addr failure, I don't see any attempt by pacemaker
to migrate the remote_node to another cluster node,
but maybe this is by design, since there is no alternate path for the guest
to use for LGM (i.e. ring0 is a single point of failure).
If the guest could be migrated over an alternate route, it would prevent
the guest outage.

Maybe my question is... is there any way to facilitate an alternate Live
Guest Migration path in the event of a ring0_addr failure?  This might
apply to a single-ring configuration as well.
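
For example, I'm wondering whether the VirtualDomain agent's
migration_network_suffix parameter could be used to push the libvirt
migration traffic onto a second network that survives a ring0 failure.
A rough, hypothetical sketch -- it assumes every cluster node has a
"<nodename>-mig" hostname (e.g. zs95kjpcs1-mig) resolving to its
vlan1294-side address, which is not how my hosts are named today:

[root at zs95kj ~]# ## (hypothetical) in /etc/hosts on every node:  10.20.94.212  zs95kjpcs1-mig  ... etc.
[root at zs95kj ~]# pcs resource update zs95kjg110102_res migration_network_suffix=-mig

If I'm reading the agent right, it would then build the qemu+ssh migration
URI against "<target node>-mig" instead of the corosync node name, so the
migration stream itself would not depend on the vlan1293 path.  Whether
pacemaker would ever choose a live migration (rather than the stop/start
recovery I'm seeing) when the remote-node connection is lost is really the
question above.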

Thanks,

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie,
N.Y.
  INTERNET:  swgreenl at us.ibm.com




From:	"Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de>
To:	<users at clusterlabs.org>
Date:	03/02/2017 02:39 AM
Subject:	[ClusterLabs] Antw: Expected recovery behavior of remote-node
            guest when corosync ring0 is lost in a passive mode RRP config?



>>> "Scott Greenlese" <swgreenl at us.ibm.com> schrieb am 01.03.2017 um 22:07
in
Nachricht
<OFFC50C6DC.1138528D-ON002580D6.006F49AA-852580D6.00740858 at notes.na.collabserv.c

m>:

> Hi..
>
> I am running a few corosync "passive mode" Redundant Ring Protocol (RRP)
> failure scenarios, where my cluster has several remote-node VirtualDomain
> resources running on each node in the cluster, which have been configured
> to allow Live Guest Migration (LGM) operations.
>
> While both corosync rings are active, if I drop ring0 on a given node where
> I have remote node (guests) running, I noticed that the guest will be
> shutdown / re-started on the same host, after which the connection is
> re-established and the guest proceeds to run on that same cluster node.

Could it be you forgot "allow-migrate=true" at the resource level or some
migration IP address at the node level?
I only have SLES11 here...

>
> I am wondering why pacemaker doesn't try to "live" migrate the remote node
> (guest) to a different node, instead of rebooting the guest?  Is there some
> way to configure the remote nodes such that the recovery action is LGM
> instead of reboot when the host-to-remote_node connect is lost in an RRP
> situation?   I guess the next question is, is it even possible to LGM a
> remote node guest if the corosync ring fails over from ring0 to ring1
> (or vise-versa)?
>
> # For example, here's a remote node's VirtualDomain resource definition.
>
> [root at zs95kj]# pcs resource show  zs95kjg110102_res
>  Resource: zs95kjg110102_res (class=ocf provider=heartbeat
> type=VirtualDomain)
>   Attributes: config=/guestxml/nfs1/zs95kjg110102.xml
> hypervisor=qemu:///system migration_transport=ssh
>   Meta Attrs: allow-migrate=true remote-node=zs95kjg110102
> remote-addr=10.20.110.102
>   Operations: start interval=0s timeout=480
> (zs95kjg110102_res-start-interval-0s)
>               stop interval=0s timeout=120
> (zs95kjg110102_res-stop-interval-0s)
>               monitor interval=30s (zs95kjg110102_res-monitor-interval-30s)
>               migrate-from interval=0s timeout=1200
> (zs95kjg110102_res-migrate-from-interval-0s)
>               migrate-to interval=0s timeout=1200
> (zs95kjg110102_res-migrate-to-interval-0s)
> [root at zs95kj VD]#
>
>
>
>
> # My RRP rings are active, and configured "rrp_mode="passive"
>
> [root at zs95kj ~]# corosync-cfgtool -s
> Printing ring status.
> Local node ID 2
> RING ID 0
>         id      = 10.20.93.12
>         status  = ring 0 active with no faults
> RING ID 1
>         id      = 10.20.94.212
>         status  = ring 1 active with no faults
>
>
>
> # Here's the corosync.conf ..
>
> [root at zs95kj ~]# cat /etc/corosync/corosync.conf
> totem {
>     version: 2
>     secauth: off
>     cluster_name: test_cluster_2
>     transport: udpu
>     rrp_mode: passive
> }
>
> nodelist {
>     node {
>         ring0_addr: zs95kjpcs1
>         ring1_addr: zs95kjpcs2
>         nodeid: 2
>     }
>
>     node {
>         ring0_addr: zs95KLpcs1
>         ring1_addr: zs95KLpcs2
>         nodeid: 3
>     }
>
>     node {
>         ring0_addr: zs90kppcs1
>         ring1_addr: zs90kppcs2
>         nodeid: 4
>     }
>
>     node {
>         ring0_addr: zs93KLpcs1
>         ring1_addr: zs93KLpcs2
>         nodeid: 5
>     }
>
>     node {
>         ring0_addr: zs93kjpcs1
>         ring1_addr: zs93kjpcs2
>         nodeid: 1
>     }
> }
>
> quorum {
>     provider: corosync_votequorum
> }
>
> logging {
>     to_logfile: yes
>     logfile: /var/log/corosync/corosync.log
>     timestamp: on
>     syslog_facility: daemon
>     to_syslog: yes
>     debug: on
>
>     logger_subsys {
>         debug: off
>         subsys: QUORUM
>     }
> }
>
>
>
>
> # Here's the vlan / route situation on cluster node zs95kj:
>
> ring0 is on vlan1293
> ring1 is on vlan1294
>
> [root at zs95kj ~]# route -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 0.0.0.0         10.20.93.254    0.0.0.0         UG    400    0        0 vlan1293    << default route to guests from ring0
> 9.0.0.0         9.12.23.1       255.0.0.0       UG    400    0        0 vlan508
> 9.12.23.0       0.0.0.0         255.255.255.0   U     400    0        0 vlan508
> 10.20.92.0      0.0.0.0         255.255.255.0   U     400    0        0 vlan1292
> 10.20.93.0      0.0.0.0         255.255.255.0   U     0      0        0 vlan1293    << ring0 IPs
> 10.20.93.0      0.0.0.0         255.255.255.0   U     400    0        0 vlan1293
> 10.20.94.0      0.0.0.0         255.255.255.0   U     0      0        0 vlan1294    << ring1 IPs
> 10.20.94.0      0.0.0.0         255.255.255.0   U     400    0        0 vlan1294
> 10.20.101.0     0.0.0.0         255.255.255.0   U     400    0        0 vlan1298
> 10.20.109.0     10.20.94.254    255.255.255.0   UG    400    0        0 vlan1294    << Route to guests on 10.20.109 from ring1
> 10.20.110.0     10.20.94.254    255.255.255.0   UG    400    0        0 vlan1294    << Route to guests on 10.20.110 from ring1
> 169.254.0.0     0.0.0.0         255.255.0.0     U     1007   0        0 enccw0.0.02e0
> 169.254.0.0     0.0.0.0         255.255.0.0     U     1016   0        0 ovsbridge1
> 192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0
>
>
>
> # On remote node, you can see we have a connection back to the host.
>
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:     info:
> crm_log_init:  Changed active directory to /var/lib/heartbeat/cores/root
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: lrmd
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:   notice:
> lrmd_init_remote_tls_server:   Starting a tls listener on port 3121.
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:   notice:
> bind_and_listen:       Listening on address ::
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: cib_ro
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: cib_rw
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: cib_shm
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: attrd
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: stonith-ng
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: crmd
> Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted:     info: main:
> Starting
> Feb 28 14:30:27 [928] zs95kjg110102 pacemaker_remoted:   notice:
> lrmd_remote_listen:    LRMD client connection established. 0x9ec18b50 id:
> 93e25ef0-4ff8-45ac-a6ed-f13b64588326
>
> zs95kjg110102:~ # netstat -anp
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
> tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      946/sshd
> tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1022/master
> tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN      931/xinetd
> tcp        0      0 0.0.0.0:5801            0.0.0.0:*               LISTEN      931/xinetd
> tcp        0      0 0.0.0.0:5901            0.0.0.0:*               LISTEN      931/xinetd
> tcp        0      0 :::21                   :::*                    LISTEN      926/vsftpd
> tcp        0      0 :::22                   :::*                    LISTEN      946/sshd
> tcp        0      0 ::1:25                  :::*                    LISTEN      1022/master
> tcp        0      0 :::44931                :::*                    LISTEN      1068/xdm
> tcp        0      0 :::80                   :::*                    LISTEN      929/httpd-prefork
> tcp        0      0 :::3121                 :::*                    LISTEN      928/pacemaker_remot
> tcp        0      0 10.20.110.102:3121      10.20.93.12:46425       ESTABLISHED 928/pacemaker_remot
> udp        0      0 :::177                  :::*                                1068/xdm
>
>
>
>
> ## Drop the ring0 (vlan1293) interface on cluster node zs95kj, causing failover to ring1 (vlan1294)
>
> [root at zs95kj]# date;ifdown vlan1293
> Tue Feb 28 15:54:11 EST 2017
> Device 'vlan1293' successfully disconnected.
>
>
>
> ## Confirm that ring0 is now offline (a.k.a. "FAULTY")
>
> [root at zs95kj]# date;corosync-cfgtool -s
> Tue Feb 28 15:54:49 EST 2017
> Printing ring status.
> Local node ID 2
> RING ID 0
>         id      = 10.20.93.12
>         status  = Marking ringid 0 interface 10.20.93.12 FAULTY
> RING ID 1
>         id      = 10.20.94.212
>         status  = ring 1 active with no faults
> [root at zs95kj VD]#
>
>
>
>
> # See that the resource stayed local to cluster node zs95kj.
>
> [root at zs95kj]# date;pcs resource show |grep zs95kjg110102
> Tue Feb 28 15:55:32 EST 2017
>  zs95kjg110102_res      (ocf::heartbeat:VirtualDomain): Started zs95kjpcs1
> You have new mail in /var/spool/mail/root
>
>
>
> # On the remote node, show new entries in pacemaker.log showing connection re-established.
>
> Feb 28 15:55:17 [928] zs95kjg110102 pacemaker_remoted:   notice:
> crm_signal_dispatch:   Invoking handler for signal 15: Terminated
> Feb 28 15:55:17 [928] zs95kjg110102 pacemaker_remoted:     info:
> lrmd_shutdown: Terminating with  1 clients
> Feb 28 15:55:17 [928] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_withdraw:   withdrawing server sockets
> Feb 28 15:55:17 [928] zs95kjg110102 pacemaker_remoted:     info:
> crm_xml_cleanup:       Cleaning up memory from libxml2
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:     info:
> crm_log_init:  Changed active directory to /var/lib/heartbeat/cores/root
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: lrmd
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:   notice:
> lrmd_init_remote_tls_server:   Starting a tls listener on port 3121.
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:   notice:
> bind_and_listen:       Listening on address ::
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: cib_ro
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: cib_rw
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: cib_shm
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: attrd
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: stonith-ng
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:     info:
> qb_ipcs_us_publish:    server name: crmd
> Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted:     info: main:
> Starting
> Feb 28 15:55:38 [942] zs95kjg110102 pacemaker_remoted:   notice:
> lrmd_remote_listen:    LRMD client connection established. 0xbed1ab50 id:
> b19ed532-6f61-4d9c-9439-ffb836eea34f
> zs95kjg110102:~ #
>
>
>
> zs95kjg110102:~ # netstat -anp |less
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
> tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      961/sshd
> tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1065/master
> tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN      946/xinetd
> tcp        0      0 0.0.0.0:5801            0.0.0.0:*               LISTEN      946/xinetd
> tcp        0      0 0.0.0.0:5901            0.0.0.0:*               LISTEN      946/xinetd
> tcp        0      0 10.20.110.102:22        10.20.94.32:57749       ESTABLISHED 1134/0
> tcp        0      0 :::21                   :::*                    LISTEN      941/vsftpd
> tcp        0      0 :::22                   :::*                    LISTEN      961/sshd
> tcp        0      0 ::1:25                  :::*                    LISTEN      1065/master
> tcp        0      0 :::80                   :::*                    LISTEN      944/httpd-prefork
> tcp        0      0 :::3121                 :::*                    LISTEN      942/pacemaker_remot
> tcp        0      0 :::34836                :::*                    LISTEN      1070/xdm
> tcp        0      0 10.20.110.102:3121      10.20.94.212:49666      ESTABLISHED 942/pacemaker_remot
> udp        0      0 :::177                  :::*                                1070/xdm
>
>
>
> ## On host node zs95kj, show system messages indicating remote node (guest) shutdown / start ... (but no attempt to LGM).
>
> [root at zs95kj ~]# grep "Feb 28" /var/log/messages |grep zs95kjg110102
>
> Feb 28 15:55:07 zs95kj crmd[121380]:   error: Operation
> zs95kjg110102_monitor_30000: Timed Out (node=zs95kjpcs1, call=2,
> timeout=30000ms)
> Feb 28 15:55:07 zs95kj crmd[121380]:   error: Unexpected disconnect on
> remote-node zs95kjg110102
> Feb 28 15:55:17 zs95kj crmd[121380]:  notice: Operation
> zs95kjg110102_stop_0: ok (node=zs95kjpcs1, call=38, rc=0, cib-update=370,
> confirmed=true)
> Feb 28 15:55:17 zs95kj attrd[121378]:  notice: Removing all zs95kjg110102
> attributes for zs95kjpcs1
> Feb 28 15:55:17 zs95kj VirtualDomain(zs95kjg110102_res)[173127]: INFO:
> Issuing graceful shutdown request for domain zs95kjg110102.
> Feb 28 15:55:23 zs95kj systemd-machined: Machine qemu-38-zs95kjg110102
> terminated.
> Feb 28 15:55:23 zs95kj crmd[121380]:  notice: Operation
> zs95kjg110102_res_stop_0: ok (node=zs95kjpcs1, call=858, rc=0,
> cib-update=378, confirmed=true)
> Feb 28 15:55:24 zs95kj systemd-machined: New machine qemu-64-zs95kjg110102.
> Feb 28 15:55:24 zs95kj systemd: Started Virtual Machine
> qemu-64-zs95kjg110102.
> Feb 28 15:55:24 zs95kj systemd: Starting Virtual Machine
> qemu-64-zs95kjg110102.
> Feb 28 15:55:25 zs95kj crmd[121380]:  notice: Operation
> zs95kjg110102_res_start_0: ok (node=zs95kjpcs1, call=859, rc=0,
> cib-update=385, confirmed=true)
> Feb 28 15:55:38 zs95kj crmd[121380]:  notice: Operation
> zs95kjg110102_start_0: ok (node=zs95kjpcs1, call=44, rc=0, cib-update=387, confirmed=true)
> [root at zs95kj ~]#
>
>
> Once the remote node established re-connection, there was no further
> remote node / resource instability.
>
> Anyway, just wondering why there was no attempt to migrate this remote
> node guest as opposed to a reboot?   Is it necessary to reboot the guest
> in order to be managed by pacemaker and corosync over the ring1 interface
> if ring0 goes down?  Is live guest migration even possible if ring0 goes
> away and ring1 takes over?
>
> Thanks in advance..
>
> Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie,
> N.Y.
>   INTERNET:  swgreenl at us.ibm.com






