<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">

<HTML>

<HEAD>

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">

  <META NAME="GENERATOR" CONTENT="GtkHTML/3.16.3">

</HEAD>

<BODY>

Hello,<BR>

<BR>

Earlier emails related to this topic:<BR>

[pacemaker] chicken-egg-problem with libvirtd and a VM within cluster<BR>

<FONT COLOR="#000000">[pacemaker] VirtualDomain problem after reboot of one node</FONT><BR>

<BR>

<BR>

My configuration:<BR>

<BR>

RHEL6.5/CMAN/gfs2/Pacemaker/crmsh<BR>

<BR>

pacemaker-libs-1.1.10-14.el6_5.1.x86_64<BR>

pacemaker-cli-1.1.10-14.el6_5.1.x86_64<BR>

pacemaker-1.1.10-14.el6_5.1.x86_64<BR>

pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64<BR>

<BR>

Two-node HA VM cluster using a real shared drive, not DRBD.<BR>

<BR>

Resources (relevant to this discussion):<BR>

primitive p_fs_images ocf:heartbeat:Filesystem \<BR>

primitive p_libvirtd lsb:libvirtd \<BR>

primitive virt ocf:heartbeat:VirtualDomain \<BR>

<BR>

services chkconfig on: cman, clvmd, pacemaker<BR>

services chkconfig off: corosync, gfs2, libvirtd<BR>

<BR>

Observation:<BR>

<BR>

Rebooting the NON-host system results in the restart of the VM merrily running on the host system.<BR>

<BR>

Apparent cause:<BR>

<BR>

On startup, Pacemaker apparently probes the status of each configured resource. On the rebooted node, however, the probe (virt_monitor_0) of the virt (ocf:heartbeat:VirtualDomain) resource times out because the libvirtd socket does not yet exist:<BR>

<BR>

<PRE>

Dec 18 12:19:30 [4147] mici-admin2       lrmd:  warning: child_timeout_callback:        virt_monitor_0 process (PID 4158) timed out

Dec 18 12:19:30 [4147] mici-admin2       lrmd:  warning: operation_finished:    virt_monitor_0:4158 - timed out after 200000ms

Dec 18 12:19:30 [4147] mici-admin2       lrmd:   notice: operation_finished:    virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor ]

Dec 18 12:19:30 [4147] mici-admin2       lrmd:   notice: operation_finished:    virt_monitor_0:4158:stderr [ error: no valid connection ]

Dec 18 12:19:30 [4147] mici-admin2       lrmd:   notice: operation_finished:    virt_monitor_0:4158:stderr [ error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory ]


</PRE>

This failure then snowballs into an "orphan" situation in which the running VM is restarted.<BR>
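<BR>

As a rough illustration of what the failed probe trips over, the check below tests only for the socket named in the lrmd log above. The helper name libvirt_sock_present is hypothetical; it is not part of Pacemaker or the VirtualDomain agent:<BR>

<PRE>
```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker or the VirtualDomain agent):
# succeeds only if the given path exists and is a UNIX socket.
libvirt_sock_present() {
    [ -S "$1" ]
}

# Socket path taken from the lrmd log above.
SOCK=/var/run/libvirt/libvirt-sock

if libvirt_sock_present "$SOCK"; then
    echo "libvirtd socket present: $SOCK"
else
    echo "libvirtd socket missing: $SOCK"
fi
```
</PRE>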

<BR>

One suggestion was to enable libvirtd at boot (chkconfig libvirtd on), and presumably to delete the p_libvirtd resource, so that service libvirtd has already created /var/run/libvirt/libvirt-sock by the time Pacemaker starts. With libvirtd started by the system, the unneeded restart of the VM does not occur.<BR>
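<BR>

For reference, a minimal sketch of that suggestion on RHEL 6, assuming the stock libvirtd init script and run as root on both nodes:<BR>

<PRE>
```shell
# Start libvirtd at boot, outside Pacemaker control
chkconfig libvirtd on
# Start it now without waiting for a reboot
service libvirtd start
# The p_libvirtd resource would presumably also be removed from the
# cluster configuration (not shown here)
```
</PRE>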

<BR>

However, removing libvirtd from Pacemaker control may leave the VM's virtual disk filesystem susceptible to corruption during a reboot-induced failover.<BR>

<BR>

Question:<BR>

<BR>

Is there an accepted Pacemaker configuration that avoids the unneeded restart of the VM when the non-host system is rebooted?<BR>

<BR>

Regards,<BR>

Bob Haxo<BR>


</BODY>

</HTML>