[Pacemaker] VirtualDomain Shutdown Timeout

Andrew Martin amartin at xes-inc.com
Thu Mar 29 09:25:51 EDT 2012


Hi Andrew, 


Thanks, that sounds good. I am using the Ubuntu HA PPA, so I will wait for a 1.1.7 package to become available. 


Andrew 

----- Original Message -----

From: "Andrew Beekhof" <andrew at beekhof.net> 
To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org> 
Sent: Thursday, March 29, 2012 1:08:21 AM 
Subject: Re: [Pacemaker] VirtualDomain Shutdown Timeout 

On Sun, Mar 25, 2012 at 6:27 AM, Andrew Martin <amartin at xes-inc.com> wrote: 
> Hello, 
> 
> I have configured a KVM virtual machine primitive using Pacemaker 1.1.6 and 
> Heartbeat 3.0.5 on Ubuntu 10.04 Server, using DRBD as the storage device (so 
> there is no shared storage and no live migration): 
> primitive p_vm ocf:heartbeat:VirtualDomain \ 
>         params config="/vmstore/config/vm.xml" \ 
>         meta allow-migrate="false" \ 
>         op start interval="0" timeout="180s" \ 
>         op stop interval="0" timeout="120s" \ 
>         op monitor interval="10" timeout="30" 
> 
> I would expect the following events to happen on failover on the "from" node 
> (the migration source) if the VM hangs while shutting down: 
> 1. VirtualDomain issues "virsh shutdown vm" to gracefully shut down the VM. 
> 2. Pacemaker waits up to 120 seconds, the timeout specified on the "op stop" 
> line. 
> 3. VirtualDomain waits a bit less than 120 seconds to see whether the VM shuts 
> down gracefully. Once it gets close to 120 seconds, it issues "virsh 
> destroy vm" to hard-stop the VM. 
> 4. Before the 120-second timeout expires, Pacemaker sees that the VM has 
> stopped and proceeds with the failover. 
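The four expected steps can be sketched as a small shell function. This is a simplified illustration, not the VirtualDomain agent's actual code: `stop_with_deadline`, `vm_shutdown`, `vm_running` and `vm_destroy` are hypothetical names, with the three hooks standing in for "virsh shutdown", a domain status check and "virsh destroy". The deadline formula (timeout in milliseconds, divided by 1000, minus a 5-second margin) mirrors the agent excerpt quoted below.

```shell
#!/bin/sh
# Hypothetical sketch: request a graceful shutdown, poll until a deadline
# derived from the operation timeout, then force the domain off.
# vm_shutdown, vm_running and vm_destroy are caller-supplied hooks
# (assumed names, not part of any real agent).

# Usage: stop_with_deadline TIMEOUT_MS [MARGIN_S]
stop_with_deadline() {
    timeout_ms=$1
    margin=${2:-5}            # 5 s margin, as in the quoted agent code
    vm_shutdown
    # Pacemaker passes the operation timeout in milliseconds.
    deadline=$(( $(date +%s) + timeout_ms / 1000 - margin ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        if ! vm_running; then
            echo "graceful"
            return 0
        fi
        sleep 1
    done
    # Deadline reached: hard-stop before the cluster's own timeout expires.
    vm_destroy
    echo "forced"
}
```

The hooks can be plain shell functions in a test harness; wired to libvirt they would wrap `virsh shutdown`, `virsh domstate` and `virsh destroy`. The point of the margin is that the forced stop must finish strictly inside the timeout Pacemaker enforces, which only works if the agent is given the stop operation's timeout.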
> 
> However, I observed that VirtualDomain seems to be using the timeout from 
> the "op start" line (180 seconds), while pacemaker enforces the 120-second 
> stop timeout. As a result, the VM is still running when the pacemaker 
> timeout is reached, and the node is STONITHed. Here is the relevant section 
> of code from /usr/lib/ocf/resource.d/heartbeat/VirtualDomain: 
> VirtualDomain_Stop() { 
>     local i 
>     local status 
>     local shutdown_timeout 
>     local out ex 
> 
>     VirtualDomain_Status 
>     status=$? 
> 
>     case $status in 
>         $OCF_SUCCESS) 
>             if ! ocf_is_true $OCF_RESKEY_force_stop; then 
>                 # Issue a graceful shutdown request 
>                 ocf_log info "Issuing graceful shutdown request for domain ${DOMAIN_NAME}." 
>                 virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME} 
>                 # The "shutdown_timeout" we use here is the operation 
>                 # timeout specified in the CIB, minus 5 seconds 
>                 shutdown_timeout=$(( $NOW + ($OCF_RESKEY_CRM_meta_timeout/1000) -5 )) 
>                 # Loop on status until we reach $shutdown_timeout 
>                 while [ $NOW -lt $shutdown_timeout ]; do 
> 
> Doesn't $OCF_RESKEY_CRM_meta_timeout correspond to the timeout value in the 
> "op stop ..." line? 

It should; however, there was a bug in 1.1.6 where this wasn't the case. 
The relevant patch is: 
https://github.com/beekhof/pacemaker/commit/fcfe6fe 

Or you could try 1.1.7. 
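To make the timing concrete, the snippet below evaluates the deadline formula from the quoted agent code (timeout in milliseconds, divided by 1000, minus 5 seconds) for both timeouts in the configuration. The function name `destroy_after` is hypothetical; only the formula comes from the excerpt.

```shell
#!/bin/sh
# Seconds after the shutdown request that the agent's graceful-shutdown
# loop keeps waiting, per the quoted formula: timeout_ms / 1000 - 5.
destroy_after() {
    echo $(( $1 / 1000 - 5 ))
}

stop_timeout_ms=120000    # op stop timeout="120s"
start_timeout_ms=180000   # op start timeout="180s"

# What the agent should see for a stop operation.
echo "stop timeout:  wait $(destroy_after "$stop_timeout_ms")s before destroy"
# What it appeared to use on 1.1.6, per the observation in this thread.
echo "start timeout: wait $(destroy_after "$start_timeout_ms")s before destroy"
```

With the stop timeout the loop gives up after 115 seconds, inside Pacemaker's 120-second window. With the 180-second start timeout it would keep waiting for 175 seconds, so Pacemaker's stop timeout fires first and the node is fenced, which matches the behaviour reported above.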

> 
> How can I optimize my pacemaker configuration so that the VM will attempt to 
> shut down gracefully and then, in the worst case, be destroyed before the 
> pacemaker timeout is reached? Moreover, is there anything I can do inside 
> the VM (another Ubuntu 10.04 install) to optimize or speed up the shutdown 
> process? 
> 
> Thanks, 
> 
> Andrew 
> 
> 
> _______________________________________________ 
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org 
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org 
> 
