[Pacemaker] Cannot start VirtualDomain resource after restart

Phil Frost phil at macprofessionals.com
Wed Jun 20 10:40:29 EDT 2012


On 06/20/2012 10:11 AM, emmanuel segura wrote:
> I don't know but see the fail it's in the operation lx0_monitor_0, so 
> i ask to someone with more experience then me, if pacemaker does a 
> monitor operation before start?

I'm just learning Pacemaker myself, so I could be wrong on some points. 
I don't have any specific solutions to give, but I can share some 
troubleshooting techniques that might give some deeper insight into what 
is happening.

First, I'd try running "crm_simulate -LS -D pacemaker.dot", then 
viewing the generated pacemaker.dot with graphviz [1] (specifically 
"dot"; it may also help to pass pacemaker.dot through "tred" first to 
make the graph more readable). This asks crm_simulate to simulate what 
Pacemaker would like to do (-S), given the current live state (-L), and 
to save the resulting transition graph to a file (-D). It will probably 
tell you it would do nothing, because the cluster is already in the 
state Pacemaker wants, even if that isn't the state you want. However, I 
have seen instances in testing where Pacemaker gets stuck in a start -> 
monitor -> timeout loop that's not immediately obvious in crm_mon. The 
simulation will reveal that.
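
For convenience, the steps above could be wrapped up roughly like this. 
It's only a sketch: the function name and the output file names are my 
own invention, and it assumes crm_simulate, tred and dot are all on your 
PATH.

```shell
#!/bin/sh
# Sketch: save the transition graph for the live cluster and render it.
# render_live_graph and the output basename are hypothetical names.
render_live_graph() {
    out="${1:-pacemaker}"
    # -L: use the live cluster state, -S: simulate, -D: save graph as dot
    crm_simulate -LS -D "$out.dot" || return 1
    # tred drops edges that are implied by other edges, thinning the graph
    tred "$out.dot" > "$out.tred.dot" || return 1
    # render the reduced graph to SVG for viewing
    dot -Tsvg "$out.tred.dot" -o "$out.svg"
}
# Usage: render_live_graph pacemaker    (writes pacemaker.svg)
```

Open the resulting SVG in a browser or any image viewer.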

You can also use crm_simulate to see what Pacemaker would do if you 
rebooted everything. This can give you some insight because it removes 
the current state of all your nodes from the equation. To do this, you 
have to generate a CIB dump without a status section. You can do that by 
manually editing the output of "cibadmin -Q", but an easier way is to 
run "crm configure show xml". Since there's no status section, 
crm_simulate will assume the nodes are offline, so you also have to use 
the "-u" option to tell it to simulate the nodes coming online. Putting 
that all together, you get something like this:

crm configure show xml | crm_simulate -Sp -D pacemaker.dot -u node01 -u node02 [-u node03 ...]

Of course you will have to adjust the node names to suit your 
environment. You should see Pacemaker wanting to start all your 
resources. If not, there's probably something in your configuration that 
prevents it from doing so. Incidentally, you will also see here the 
answer to your question: yes, Pacemaker does run a monitor operation on 
a resource on every node before starting it, which is presumably the 
lx0_monitor_0 operation you are seeing. This way, it can avoid starting 
a resource that was already running without Pacemaker knowing about it.
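
If you have more than a couple of nodes, typing all the "-u" options 
gets tedious. Here is a rough sketch of a wrapper that builds them from 
a list of node names. The function name is hypothetical, and node01 and 
node02 are just the placeholder names from the command above.

```shell
#!/bin/sh
# Sketch: simulate a cold start with every listed node coming online.
# simulate_cold_start is a hypothetical helper name.
simulate_cold_start() {
    # Rewrite the argument list from "a b" into "-u a -u b".
    n=$#
    while [ "$n" -gt 0 ]; do
        node=$1
        shift
        set -- "$@" -u "$node"
        n=$((n - 1))
    done
    # -p: read the CIB from stdin, -S: simulate, -D: save graph as dot
    crm configure show xml | crm_simulate -Sp -D pacemaker.dot "$@"
}
# Usage: simulate_cold_start node01 node02
```

Afterwards, something like "grep monitor_0 pacemaker.dot" should list 
the initial monitor operations, assuming they show up in the graph 
under that name.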

If all that proves unfruitful, you can continue to run other "what-if" 
tests by dumping the current CIB with "cibadmin -Q", editing it, and 
passing it into crm_simulate. In this way you can make some guesses 
about what's wrong and test each of them.
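
A minimal sketch of that edit-and-simulate loop follows. The function 
and file names are made up for illustration, and I'm assuming you read 
the edited file back in with crm_simulate's -x option rather than piping 
it.

```shell
#!/bin/sh
# Sketch: dump the live CIB, edit it by hand, then simulate the result.
# what_if, whatif.xml and whatif.dot are hypothetical names.
what_if() {
    cib="${1:-whatif.xml}"
    # dump the complete current CIB, status section included
    cibadmin -Q > "$cib" || return 1
    # make your experimental change here
    "${EDITOR:-vi}" "$cib" || return 1
    # -x: read the CIB from a file, -S: simulate, -D: save graph as dot
    crm_simulate -S -x "$cib" -D whatif.dot
}
# Usage: what_if whatif.xml
```

Nothing here touches the running cluster; only the dumped copy is 
edited and simulated.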

[1] http://www.graphviz.org/




