<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: Times New Roman; font-size: 12pt; color: #000000'><font size="3">Hello,</font><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; "><br></div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; ">I have configured a KVM virtual machine primitive using Pacemaker 1.1.6 and Heartbeat 3.0.5 on Ubuntu 10.04 Server, with DRBD as the storage device (so there is no shared storage and no live migration):</div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; "><div>primitive p_vm ocf:heartbeat:VirtualDomain \</div><div> params config="/vmstore/config/vm.xml" \</div><div> meta allow-migrate="false" \</div><div> op start interval="0" timeout="180s" \</div><div> op stop interval="0" timeout="120s" \</div><div> op monitor interval="10" timeout="30"</div></div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; "><br></div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; ">I would expect the following events to happen on failover on the "from" node (the migration source) if the VM hangs while shutting down:</div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; ">1. VirtualDomain issues "virsh shutdown vm" to shut down the VM gracefully.</div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; ">2. Pacemaker waits up to 120 seconds, the timeout specified in the "op stop" line.</div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; ">3. VirtualDomain waits a bit less than 120 seconds to see whether the VM shuts down gracefully. Once it gets to almost 120 seconds, it issues "virsh destroy vm" to hard-stop the VM.</div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; ">4. 
Pacemaker reaches the end of the 120-second timeout, sees that the VM has stopped, and proceeds with the failover.</div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; "><br></div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; ">However, I observed that VirtualDomain seems to use the 180-second timeout from the "op start" line, while Pacemaker uses the 120-second timeout. As a result, the VM is still running when the Pacemaker timeout is reached, so the node is STONITHed. Here is the relevant section of code from /usr/lib/ocf/resource.d/heartbeat/VirtualDomain:</div><div>VirtualDomain_Stop() {</div><div> local i</div><div> local status</div><div> local shutdown_timeout</div><div> local out ex</div><div><br></div><div> VirtualDomain_Status</div><div> status=$?</div><div><br></div><div> case $status in</div><div> $OCF_SUCCESS)</div><div> if ! ocf_is_true $OCF_RESKEY_force_stop; then</div><div> # Issue a graceful shutdown request</div><div> ocf_log info "Issuing graceful shutdown request for domain ${DOMAIN_NAME}."</div><div> virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}</div><div> # The "shutdown_timeout" we use here is the operation</div><div> # timeout specified in the CIB, minus 5 seconds</div><div> shutdown_timeout=$(( $NOW + ($OCF_RESKEY_CRM_meta_timeout/1000) -5 ))</div><div> # Loop on status until we reach $shutdown_timeout</div><div> while [ $NOW -lt $shutdown_timeout ]; do</div><div><br></div><div>Doesn't $OCF_RESKEY_CRM_meta_timeout correspond to the timeout value in the "op stop ..." line?</div><div><br></div><div>How can I configure Pacemaker so that the VM first attempts a graceful shutdown and, in the worst case, is destroyed before the Pacemaker timeout is reached? 
Moreover, is there anything I can do inside the VM (another Ubuntu 10.04 install) to speed up the shutdown process?</div><div><br></div><div>Thanks,</div><div><br></div><div>Andrew</div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; "> </div></div></body></html>