[Pacemaker] Pacemaker-remote with KVM - start timeout not working

David Vossel dvossel at redhat.com
Mon Sep 15 12:46:24 EDT 2014



----- Original Message -----
> Hi!
> 
> I guess it would be better to start a separate thread on this.
> 
> I have a VM with pacemaker-remote installed.
> 
> Stack: cman
> Current DC: wings1 - partition with quorum
> Version: 1.1.10-14.el6-368c726
> 3 Nodes configured
> 2 Resources configured
> 
> 
> Online: [ oracle-test:vm-oracle-test wings1 wings2 ]

The remote-node in this case is named 'oracle-test'. The remote-node's container resource
is 'vm-oracle-test'.  Internally pacemaker makes a connection resource named after the
remote-node. That resource represents the pacemaker_remote connection.

Kind of confusing, I know. Here's the point: the connection resource 'oracle-test' is what is
timing out here, not the VM itself. By default the connection resource has a 60 second
timeout, which is the timeout=60000ms in your log; the start timeout you set on vm-oracle-test
does not apply to it. If you want to increase that timeout, use the remote-connect-timeout
resource metadata option on the container resource. You don't have to fully understand how
all this works, just know that remote-connect-timeout needs to be greater than the time it
takes for the virtual machine to fully initialize.
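
As a rough sketch, the option would sit in the meta_attributes of the vm-oracle-test primitive,
next to the remote-node option that names the remote-node. The nvpair ids and the 300s value
below are assumptions for illustration (300s just mirrors your start timeout); pick a value
that covers your VM's real boot time.

```xml
<!-- Sketch only: goes inside the existing <primitive id="vm-oracle-test" ...> element. -->
<!-- The ids here are hypothetical; keep whatever ids your CIB already uses. -->
<meta_attributes id="vm-oracle-test-meta_attributes">
  <!-- Existing option that makes the VM a remote-node named 'oracle-test' -->
  <nvpair id="vm-oracle-test-meta_attributes-remote-node"
          name="remote-node" value="oracle-test"/>
  <!-- Assumed value: how long to wait for the pacemaker_remote connection to come up -->
  <nvpair id="vm-oracle-test-meta_attributes-remote-connect-timeout"
          name="remote-connect-timeout" value="300s"/>
</meta_attributes>
```

With pcs, something like 'pcs resource meta vm-oracle-test remote-connect-timeout=300s' should
produce the same result.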

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-options

Hope that helps!

-- Vossel

> 
> vm-oracle-test (ocf::heartbeat:VirtualDomain): Started wings2
> 
> 2 resources configured...
> 
> However,
> 
> # pcs resource show
> vm-oracle-test (ocf::heartbeat:VirtualDomain): Started
> 
> As I understand, pacemaker considered pacemaker-remote on the VM as some sort
> of 'virtual resource' (called 'oracle-test' in my case), since I have only
> one 'primitive' section (VirtualDomain) in my CIB.
> 
> Well, the problem is here:
> 
> Sep 15 12:28:13 wings1 crmd[13553]: error: process_lrm_event: LRM operation
> oracle-test_start_0 (8397) Timed Out (timeout=60000ms)
> Sep 15 12:28:13 wings1 crmd[13553]: warning: status_from_rc: Action 7
> (oracle-test_start_0) on wings1 failed (target: 0 vs. rc: 1): Error
> Sep 15 12:28:13 wings1 crmd[13553]: warning: update_failcount: Updating
> failcount for oracle-test on wings1 after failed start: rc=1
> (update=INFINITY, time=1410769693)
> 
> Timeout is 60 seconds! Even though I have:
> 
> <primitive class="ocf" id="vm-oracle-test" provider="heartbeat"
> type="VirtualDomain">
> <instance_attributes id="vm-oracle-test-instance_attributes">
> <nvpair id="vm-oracle-test-instance_attributes-hypervisor" name="hypervisor"
> value="qemu:///system"/>
> <nvpair id="vm-oracle-test-instance_attributes-config" name="config"
> value="/etc/libvirt/qemu/oracle-test.xml"/>
> </instance_attributes>
> <operations>
> <op id="vm-oracle-test-monitor-interval-60s" interval="60s" name="monitor"/>
> <op id="vm-oracle-test-start-timeout-300s-interval-0s-on-fail-restart"
> interval="0s" name="start" on-fail="restart" timeout="300s"/>
> <op id="vm-oracle-test-stop-timeout-60s-interval-0s-on-fail-block"
> interval="0s" name="stop" on-fail="block" timeout="60s"/>
> </operations>
> 
> Moreover, VirtualDomain RA has this:
> 
> <actions>
> <action name="start" timeout="90" />
> <action name="stop" timeout="90" />
> <action name="status" depth="0" timeout="30" interval="10" />
> <action name="monitor" depth="0" timeout="30" interval="10" />
> <action name="migrate_from" timeout="60" />
> <action name="migrate_to" timeout="120" />
> 
> 
> My VM is unable to start in 60 seconds. What could be done here?
> 
> --
> Best regards,
> Alexandr A. Alexandrov
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 