<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hello,<br>
<br>
Is this the right place to report this issue? (Please redirect me if
not.)<br>
<br>
While experimenting with and demonstrating our new cluster yesterday, we
stumbled on a caveat in our LibvirtQemu resource agent (derived from
VirtualDomain). Since the same caveat exists in the VirtualDomain
resource agent, I thought I had better report it. Please see the patch
below (for LibvirtQemu), whose comments should allow you to
understand where the problem lies.<br>
<br>
--- LibvirtQemu.orig 2014-08-22 09:39:21.997201000 +0200<br>
+++ LibvirtQemu 2014-08-22 09:50:32.440969000 +0200<br>
@@ -154,11 +154,10 @@<br>
local virsh_output<br>
local domain_name<br>
<br>
- # Note: passing in the domain name from outside the script is<br>
- # intended for testing and debugging purposes only. Don't do this<br>
- # in production, instead let the script figure out the domain
name<br>
- # from the config file. You have been warned.<br>
- if [ -z "${DOMAIN_NAME}" ]; then<br>
+ # NOTE: Re-defining an already defined domain is dangerous! It
shall be done only<br>
+ # if we can reasonably assume the configuration file hasn't
changed since the last<br>
+ # time the domain has been defined.<br>
+ if [ -z "${DOMAIN_NAME}" ] || [ "${OCF_RESKEY_config}" -ot
"${STATEFILE}" ]; then<br>
# Spin until we have a domain name<br>
while true; do<br>
virsh_output="$(virsh ${VIRSH_OPTIONS} define
${OCF_RESKEY_config})"<br>
@@ -170,7 +169,7 @@<br>
echo "${domain_name}" > "${STATEFILE}"<br>
ocf_log info "Domain name '${domain_name}' saved to state file
'${STATEFILE}'."<br>
else<br>
- ocf_log warn "Domain name '${DOMAIN_NAME}' already defined;
overriding configuration file '${OCF_RESKEY_config}' (this should
NOT ne done in production!)."<br>
+ ocf_log warn "Domain name '${DOMAIN_NAME}' already defined;
overriding by newer configuration file will NOT be done!"<br>
fi<br>
}<br>
<br>
@@ -205,12 +204,12 @@<br>
;;<br>
''|'no state')<br>
# Empty string may be returned when virsh does not<br>
- # receive a reply from libvirtd.<br>
+ # receive a reply from libvirtd or after the domain has<br>
+ # been undefined.<br>
# "no state" may occur when the domain is currently<br>
# being migrated (on the migration target only), or<br>
# whenever virsh can't reliably obtain the domain<br>
# state.<br>
- status='no state'<br>
if [ "${__OCF_ACTION}" == 'stop' ] && [ ${try} -ge
3 ]; then<br>
# During the stop operation, we want to bail out<br>
# quickly, so as to be able to force-stop (destroy)<br>
@@ -224,6 +223,17 @@<br>
ocf_log info "Domain '${DOMAIN_NAME}' currently has no
state; retrying."<br>
sleep 1<br>
fi<br>
+ if [ "${status}" == '' ] && [ $(( ${try} % 10 ))
-eq 0 ]; then<br>
+ # Could it be that libvirtd is running healthily but the
domain<br>
+ # has been undefined? In that case, let's attempt to
re-define it.<br>
+ # If libvirtd IS running, it can not hurt (given the
safeguards in<br>
+ # LibvirtQemu_Define). If libvirtd is NOT running, then
something is<br>
+ # definitely wrong (and the monitor operation will
time-out in<br>
+ # LibvirtQemu_Define the same way as it would here).<br>
+ ocf_log warn "Has domain '${DOMAIN_NAME}' been undefined?
attempting to re-define it."<br>
+ LibvirtQemu_Define<br>
+ fi<br>
+ status='no state'<br>
;;<br>
*)<br>
# any other output is unexpected.<br>
@@ -487,6 +497,11 @@<br>
<br>
# Define the domain on startup, and re-define whenever someone
deleted<br>
# the state file, or touched the config.<br>
+# WARNING: There is a caveat here! When the resource is stopped,
the state file<br>
+# is deleted ONLY on the node where it was running. In case the
domain is then<br>
+# undefined (from libvirtd), on all nodes, we will end-up with a
state file but no<br>
+# domain definition on those nodes that were not running the
resource. The monitor<br>
+# operation MUST handle that situation, should the resource be
restarted.<br>
if [ ! -e "${STATEFILE}" ] || [ "${OCF_RESKEY_config}" -nt
"${STATEFILE}" ]; then<br>
LibvirtQemu_Define<br>
fi<br>
<br>
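To illustrate the caveat described in the last hunk, here is a minimal,
self-contained sketch of the start-time check. It does not call virsh;
the 'needs_define' helper and the temporary file names are hypothetical
and only mirror the '-e' / '-nt' test quoted in the patch.<br>
<br>
<pre>
```shell
#!/bin/sh
# Hypothetical helper (not part of the agent): mirrors the start-time test
# quoted in the patch, without calling virsh.
# Prints "define" when the agent would (re-)define the domain, "skip" otherwise.
needs_define() {
    config="$1"
    statefile="$2"
    if [ ! -e "${statefile}" ] || [ "${config}" -nt "${statefile}" ]; then
        echo define
    else
        echo skip
    fi
}

# Simulate a node that did NOT run the resource: its state file was never
# deleted, and it is not older than the configuration file.
workdir="$(mktemp -d)"
touch "${workdir}/domain.xml"
touch "${workdir}/domain.state"
stale_node="$(needs_define "${workdir}/domain.xml" "${workdir}/domain.state")"

# Simulate the node that ran the resource: 'stop' removed the state file.
rm -f "${workdir}/domain.state"
clean_node="$(needs_define "${workdir}/domain.xml" "${workdir}/domain.state")"

echo "node with stale state file: ${stale_node}"
echo "node without state file: ${clean_node}"
rm -rf "${workdir}"
```
</pre>
On the node with the stale state file the check answers 'skip', so
nothing re-defines the domain there once it has been undefined from
libvirtd. That is the situation the monitor operation has to handle.<br>
<br>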
One could ask "why undefine a libvirt domain and then restart it?".
The answer is two-fold: 1. experience showed us that we should
undefine a decommissioned domain from libvirt to prevent a potential
UUID conflict when defining a new domain (which is likely in our
setup, since UUIDs are built from the domain IP address); 2. the
"demo effect" (or potentially legitimate reasons), where one would
"decommission" a domain and restart it right afterwards ( :-/ ).<br>
<br>
PS: we now also make sure to delete the VirtualDomain/LibvirtQemu
state file when undefining the domain. But it is best to have multiple
safeguards as far as this caveat is concerned (hence the patch above).<br>
<br>
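The state file clean-up mentioned in the PS could look roughly like the
sketch below. The function name, its arguments and the VIRSH override
are assumptions made for illustration; they are not part of either
resource agent, and the state file path has to be supplied by the
caller.<br>
<br>
<pre>
```shell
#!/bin/sh
# Hypothetical decommissioning helper; the function name, its arguments and
# the VIRSH override are assumptions, not part of either resource agent.
VIRSH="${VIRSH:-virsh}"

# Usage: undefine_and_forget DOMAIN STATEFILE
# Undefines the domain, then removes the agent's state file so that the next
# start re-defines the domain from its configuration file.
undefine_and_forget() {
    domain="$1"
    statefile="$2"
    if ! ${VIRSH} undefine "${domain}"; then
        # Keep the state file when libvirt refused to undefine the domain.
        echo "failed to undefine '${domain}'; keeping '${statefile}'" >&2
        return 1
    fi
    rm -f "${statefile}"
}
```
</pre>
Since the state file is local to each node, such a helper would have
to be run on every node where the domain was defined, not only on the
one that last ran the resource.<br>
<br>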
Hope it helps,<br>
<br>
Cédric<br>
<br>
<div class="moz-signature">-- <br>
<style type="text/css">
DIV.signature {FONT:normal 11px sans-serif;COLOR:#000000;}
DIV.signature P {MARGIN:5px 0px;FONT:bold 13px sans-serif;COLOR:#000050;}
</style>
<div class="signature">
<p>Cédric Dufour @ Idiap Research Institute</p>
</div>
</div>
</body>
</html>