[Pacemaker] Occasional error running ocf scripts

Fri Aug 13 06:31:27 EDT 2010

> > 99% of the time, the resource will stop correctly, it is just on a few
> > occasions that I see an error like this.
> > 
> > Is this a known problem, or can I generate extra logging to try help
> > debug?
> 
> Never heard of it. That sounds quite serious. Yes, extra logging
> would be helpful. How often did that happen? Which releases do
> you run?

I reported this to the list but got no reply. I then also filed a bug report:

http://developerbugs.linux-foundation.org/show_bug.cgi?id=2458

Also without a reply so far. 

I looked into the lrmd code, and it seems to know only what is passed to it as 
XML, so this is unlikely to be a cluster-glue issue. Debugging would be much 
easier if lrmd knew about all resources and their required parameters: it could 
then fail immediately without calling the RA. But that is a design problem.
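To make the design idea concrete, here is a minimal sketch, written in Python for brevity rather than in lrmd's C, of a pre-flight check that reads an RA's meta-data, finds the parameters marked required="1", and returns OCF_ERR_CONFIGURED (6) instead of executing the agent when one is missing. lrmd does not do this today; the function names (required_params, preflight, ra_environment) and the "ExampleFS" agent are hypothetical. Only the meta-data layout, the OCF return codes, and the OCF_RESKEY_<name> environment convention come from the OCF Resource Agent API.

```python
import xml.etree.ElementTree as ET

# Standard OCF return codes used here (from the OCF Resource Agent API).
OCF_SUCCESS = 0
OCF_ERR_CONFIGURED = 6  # "the resource is misconfigured"

def required_params(metadata_xml: str) -> list[str]:
    """Names of parameters flagged required="1" in an RA's meta-data XML."""
    root = ET.fromstring(metadata_xml)  # root element is <resource-agent>
    return [
        p.get("name")
        for p in root.findall("./parameters/parameter")
        if p.get("required") == "1"
    ]

def preflight(metadata_xml: str, params: dict[str, str]) -> tuple[int, list[str]]:
    """Hypothetical check run *before* executing the RA.

    Returns (OCF_ERR_CONFIGURED, missing) if any required parameter is
    absent or empty, else (OCF_SUCCESS, []).
    """
    missing = [name for name in required_params(metadata_xml) if not params.get(name)]
    return (OCF_ERR_CONFIGURED if missing else OCF_SUCCESS), missing

def ra_environment(params: dict[str, str]) -> dict[str, str]:
    """OCF passes instance parameters to the RA as OCF_RESKEY_<name> variables."""
    return {f"OCF_RESKEY_{name}": value for name, value in params.items()}

# Hypothetical meta-data for an imaginary "ExampleFS" agent.
SAMPLE_METADATA = """<resource-agent name="ExampleFS">
  <parameters>
    <parameter name="device" required="1"><content type="string"/></parameter>
    <parameter name="directory" required="1"><content type="string"/></parameter>
    <parameter name="options"><content type="string"/></parameter>
  </parameters>
</resource-agent>"""

if __name__ == "__main__":
    params = {"device": "/dev/sdb1"}  # "directory" deliberately left out
    rc, missing = preflight(SAMPLE_METADATA, params)
    if rc != OCF_SUCCESS:
        print(f"refusing to call RA: rc={rc}, missing={missing}")
    else:
        print("would exec RA with", ra_environment(params))
```

With "directory" omitted, the sketch reports rc=6 and missing=['directory'] without ever running the agent, which would turn an intermittent RA failure into an immediate, clearly attributable configuration error.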

IMHO, the issue was introduced in Pacemaker between 1.0.7 and 1.0.9, but I do 
not have the time to track it down further. For now we simply continue to use 
1.0.7. (As I reported to the list before, 1.0.8 randomly fails to start 
resources; since we typically have more than 30, 60 or even 120 resources, we 
then run into random issues all the time.)

Cheers,
Bernd

-- 
Bernd Schubert
DataDirect Networks