<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: Times New Roman; font-size: 12pt; color: #000000'>Hi Andrew,<div><br></div><div>Thanks, that sounds good. I am using the Ubuntu HA ppa, so I will wait for a 1.1.7 package to become available.</div><div><br></div><div>Andrew<br><br><hr id="zwchr"><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From: </b>"Andrew Beekhof" &lt;andrew@beekhof.net&gt;<br><b>To: </b>"The Pacemaker cluster resource manager" &lt;pacemaker@oss.clusterlabs.org&gt;<br><b>Sent: </b>Thursday, March 29, 2012 1:08:21 AM<br><b>Subject: </b>Re: [Pacemaker] VirtualDomain Shutdown Timeout<br><br>On Sun, Mar 25, 2012 at 6:27 AM, Andrew Martin &lt;amartin@xes-inc.com&gt; wrote:<br>> Hello,<br>><br>> I have configured a KVM virtual machine primitive using Pacemaker 1.1.6 and<br>> Heartbeat 3.0.5 on Ubuntu 10.04 Server using DRBD as the storage device (so<br>> there is no shared storage, no live-migration):<br>> primitive p_vm ocf:heartbeat:VirtualDomain \<br>>         params config="/vmstore/config/vm.xml" \<br>>         meta allow-migrate="false" \<br>>         op start interval="0" timeout="180s" \<br>>         op stop interval="0" timeout="120s" \<br>>         op monitor interval="10" timeout="30"<br>><br>> I would expect the following events to happen on failover on the "from" node<br>> (the migration source) if the VM hangs while shutting down:<br>> 1. VirtualDomain issues "virsh shutdown vm" to gracefully shutdown the VM<br>> 2. pacemaker waits 120 seconds for the timeout specified in the "op stop"<br>> timeout<br>> 3. VirtualDomain waits a bit less than 120 seconds to see if it will<br>> gracefully shutdown. Once it gets to almost 120 seconds, it issues "virsh<br>> destroy vm" to hard stop the VM.<br>> 4. 
pacemaker wakes up from the 120 second timeout and sees that the VM has<br>> stopped and proceeds with the failover<br>><br>> However, I observed that VirtualDomain seems to be using the timeout from<br>> the "op start" line, 180 seconds, yet pacemaker uses the 120 second timeout.<br>> Thus, the VM is still running after the pacemaker timeout is reached and so<br>> the node is STONITHed. Here is the relevant section of code from<br>> /usr/lib/ocf/resource.d/heartbeat/VirtualDomain:<br>> VirtualDomain_Stop() {<br>>     local i<br>>     local status<br>>     local shutdown_timeout<br>>     local out ex<br>><br>>     VirtualDomain_Status<br>>     status=$?<br>><br>>     case $status in<br>>         $OCF_SUCCESS)<br>>             if ! ocf_is_true $OCF_RESKEY_force_stop; then<br>>                 # Issue a graceful shutdown request<br>>                 ocf_log info "Issuing graceful shutdown request for domain<br>> ${DOMAIN_NAME}."<br>>                 virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}<br>>                 # The "shutdown_timeout" we use here is the operation<br>>                 # timeout specified in the CIB, minus 5 seconds<br>>                 shutdown_timeout=$(( $NOW +<br>> ($OCF_RESKEY_CRM_meta_timeout/1000) -5 ))<br>>                 # Loop on status until we reach $shutdown_timeout<br>>                 while [ $NOW -lt $shutdown_timeout ]; do<br>><br>> Doesn't $OCF_RESKEY_CRM_meta_timeout correspond to the timeout value in the<br>> "op stop ..." line?<br><br>It should, however there was a bug in 1.1.6 where this wasn't the case.<br>The relevant patch is:<br>  https://github.com/beekhof/pacemaker/commit/fcfe6fe<br><br>Or you could try 1.1.7<br><br>><br>> How can I optimize my pacemaker configuration so that the VM will attempt to<br>> gracefully shutdown and then at worst case destroy the VM before the<br>> pacemaker timeout is reached? 
Moreover, is there anything I can do inside of<br>> the VM (another Ubuntu 10.04 install) to optimize/speed up the shutdown<br>> process?<br>><br>> Thanks,<br>><br>> Andrew<br>><br>><br>> _______________________________________________<br>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org<br>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker<br>><br>> Project Home: http://www.clusterlabs.org<br>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf<br>> Bugs: http://bugs.clusterlabs.org<br>><br></div><br></div></div></body></html>