<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.16.3">
</HEAD>
<BODY>
Hi Andrew,<BR>
<BR>
With the configuration fumble, err, test, that brought about this "of chickens and eggs and VMs" request, the situation is that rebooting the non-host server results in a restart of the VM running on the host server.<BR>
<BR>
From the earlier [Pacemaker] thread:<BR>
<BR>
<BLOCKQUOTE TYPE=CITE>
From: Tom Fernandes <<A HREF="mailto:anyaddress@gmx.net">anyaddress@gmx.net</A>><BR>
Subject: [Pacemaker] chicken-egg-problem with libvirtd and a VM within cluster<BR>
Date: Thu, 11 Oct 2012 18:09:30 +0200 (09:09 PDT)<BR>
...<BR>
I observed that when I stop and start corosync on one of the nodes, pacemaker <BR>
(when starting corosync again) wants to check the status of the vm before <BR>
starting libvirtd. This check fails as libvirtd needs to be running for this <BR>
check. After trying for 20s libvirtd starts. The vm gets restarted after those <BR>
20s and then runs on one of the nodes. I am left with a monitoring-error to <BR>
cleanup and my vm has rebooted.<BR>
</BLOCKQUOTE>
<BR>
And the same issue raised by myself earlier:<BR>
<BR>
<BLOCKQUOTE TYPE=CITE>
From: Bob Haxo <<A HREF="mailto:bhaxo@sgi.com">bhaxo@sgi.com</A>><BR>
Subject: [Pacemaker] GFS2 with Pacemaker on RHEL6.3 restarts with reboot<BR>
Date: Wed, 8 Aug 2012 19:14:31 -0700<BR>
...<BR>
<BR>
Problem: When the non-VM-host is rebooted, then when Pacemaker<BR>
restarts, the gfs2 filesystem gets restarted on the VM host, which causes<BR>
the stop and start of the VirtualDomain. The gfs2 filesystem also gets<BR>
restarted even without the VirtualDomain resource included.<BR>
</BLOCKQUOTE>
<BR>
The cluster with the "chicken and egg and VMs" configuration is no longer available, although the output of "crm configure show" may have been saved.<BR>
<BR>
Regarding the "chicken and egg and VMs" question, I now avoid the issue ... somehow, and have moved on to new issues.<BR>
<BR>
Please see the thread: [Pacemaker] "stonith_admin -F node" results in a pair of reboots. In particular, see the Tue, 7 Jan 2014 09:21:54 +0100 response from <FONT COLOR="#000000">Fabio</FONT> Di Nitto.<BR>
<BR>
The information from Fabio was very helpful. I currently seem to have arrived at a RHEL 6.5 HA virtual server solution: no "chicken and egg and VMs" problem, no fencing of both servers when only one was explicitly fenced, and no "clvmd startup timed out" resulting in "clvmd:pid blocked for more than 120 seconds". The cluster now has a working VM, working live migration, and a correct response to a manual fence command. Tomorrow I will add the results of today's work to that thread.<BR>
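<BR>
For the archives, here is a sketch of the constraint layout that goes with this approach. This is an illustrative crmsh fragment, not my exact configuration; the clone and constraint names are made up, while p_libvirtd and virt match the primitives quoted below. Cloning libvirtd and ordering the VirtualDomain after the clone ensures libvirtd is started before the VM is started on any node. Note that ordering constraints do not apply to the initial probes that Pacemaker runs at startup, so how the VirtualDomain agent behaves when libvirtd is down still matters.<BR>
<BR>
<PRE>
# Hypothetical clone/constraint names; p_libvirtd and virt are the primitives below.
clone cl_libvirtd p_libvirtd \
        meta interleave="true"
# Start libvirtd before the VM, and keep the VM on a node with libvirtd running.
order o_libvirtd_before_virt inf: cl_libvirtd virt
colocation colo_virt_with_libvirtd inf: virt cl_libvirtd
</PRE>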
<BR>
Regards,<BR>
Bob Haxo<BR>
<BR>
On Wed, 2014-01-08 at 10:32 +1100, Andrew Beekhof wrote:
<BLOCKQUOTE TYPE=CITE>
<PRE>
<FONT COLOR="#000000">On 20 Dec 2013, at 5:30 am, Bob Haxo <<A HREF="mailto:bhaxo@sgi.com">bhaxo@sgi.com</A>> wrote:</FONT>
<FONT COLOR="#000000">> Hello,</FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> Earlier emails related to this topic:</FONT>
<FONT COLOR="#000000">> [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster</FONT>
<FONT COLOR="#000000">> [pacemaker] VirtualDomain problem after reboot of one node</FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> My configuration:</FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> RHEL6.5/CMAN/gfs2/Pacemaker/crmsh</FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> pacemaker-libs-1.1.10-14.el6_5.1.x86_64</FONT>
<FONT COLOR="#000000">> pacemaker-cli-1.1.10-14.el6_5.1.x86_64</FONT>
<FONT COLOR="#000000">> pacemaker-1.1.10-14.el6_5.1.x86_64</FONT>
<FONT COLOR="#000000">> pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64</FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> Two node HA VM cluster using real shared drive, not drbd.</FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> Resources (relevant to this discussion):</FONT>
<FONT COLOR="#000000">> primitive p_fs_images ocf:heartbeat:Filesystem \</FONT>
<FONT COLOR="#000000">> primitive p_libvirtd lsb:libvirtd \</FONT>
<FONT COLOR="#000000">> primitive virt ocf:heartbeat:VirtualDomain \</FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> services chkconfig on: cman, clvmd, pacemaker</FONT>
<FONT COLOR="#000000">> services chkconfig off: corosync, gfs2, libvirtd</FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> Observation:</FONT>
<FONT COLOR="#000000">> </FONT>
</PRE>
<FONT COLOR="#000000">> Rebooting the NON-host system results in the restart of the VM merrily running on the host system.</FONT><BR>
<BR>
<FONT COLOR="#000000">I'm still bootstrapping after the break, but I'm not following this. Can you rephrase? </FONT>
<PRE>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> Apparent cause:</FONT>
<FONT COLOR="#000000">> </FONT>
</PRE>
<FONT COLOR="#000000">> Upon startup, Pacemaker apparently checks the status of configured resources. However, the status request for the virt (ocf:heartbeat:VirtualDomain) resource fails with:</FONT>
<PRE>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> Dec 18 12:19:30 [4147] mici-admin2 lrmd: warning: child_timeout_callback: virt_monitor_0 process (PID 4158) timed out</FONT>
<FONT COLOR="#000000">> Dec 18 12:19:30 [4147] mici-admin2 lrmd: warning: operation_finished: virt_monitor_0:4158 - timed out after 200000ms</FONT>
<FONT COLOR="#000000">> Dec 18 12:19:30 [4147] mici-admin2 lrmd: notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor ]</FONT>
<FONT COLOR="#000000">> Dec 18 12:19:30 [4147] mici-admin2 lrmd: notice: operation_finished: virt_monitor_0:4158:stderr [ error: no valid connection ]</FONT>
<FONT COLOR="#000000">> Dec 18 12:19:30 [4147] mici-admin2 lrmd: notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory ]</FONT>
<FONT COLOR="#000000">Sounds like the agent should perhaps be returning OCF_NOT_RUNNING in this case.</FONT>
</PRE>
<FONT COLOR="#000000">> </FONT><BR>
<FONT COLOR="#000000">> </FONT><BR>
<FONT COLOR="#000000">> This failure then snowballs into an "orphan" situation in which the running VM is restarted.</FONT><BR>
<FONT COLOR="#000000">> </FONT><BR>
<FONT COLOR="#000000">> There was the suggestion of "chkconfig libvirtd on" (and presumably deleting the resource) so that /var/run/libvirt/libvirt-sock is created by the libvirtd service. With libvirtd started by the system, there is no un-needed reboot of the VM.</FONT><BR>
<FONT COLOR="#000000">> </FONT><BR>
<FONT COLOR="#000000">> However, it may be that removing libvirtd from Pacemaker control leaves the VM vdisk filesystem susceptible to corruption during a reboot-induced failover.</FONT><BR>
<FONT COLOR="#000000">> </FONT><BR>
<FONT COLOR="#000000">> Question:</FONT><BR>
<FONT COLOR="#000000">> </FONT><BR>
<FONT COLOR="#000000">> Is there an accepted Pacemaker configuration such that the un-needed restart of the VM does not occur with the reboot of the non-host system?</FONT>
<PRE>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> Regards,</FONT>
<FONT COLOR="#000000">> Bob Haxo</FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> _______________________________________________</FONT>
<FONT COLOR="#000000">> Pacemaker mailing list: <A HREF="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</A></FONT>
<FONT COLOR="#000000">> <A HREF="http://oss.clusterlabs.org/mailman/listinfo/pacemaker">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</A></FONT>
<FONT COLOR="#000000">> </FONT>
<FONT COLOR="#000000">> Project Home: <A HREF="http://www.clusterlabs.org">http://www.clusterlabs.org</A></FONT>
<FONT COLOR="#000000">> Getting started: <A HREF="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</A></FONT>
<FONT COLOR="#000000">> Bugs: <A HREF="http://bugs.clusterlabs.org">http://bugs.clusterlabs.org</A></FONT>
</PRE>
</BLOCKQUOTE>
</BODY>
</HTML>