[Pacemaker] Question on ILO stonith resource config and restarting

Wed Nov 5 08:56:22 EST 2008

> 
> BTW, did you try to test your ilo device with the stonith
> program. Use -d to get debugging output.
> 

I did not try it with the stonith program and -d.  I was only running
the resource's Python script directly (after setting the appropriate
environment variables).

When the LAN connection is up and available, the script works well.  When
the connection is down, it also behaves as expected: a timeout is raised.

> I'd prefer to have the upper layer (stonithd) timeout. Why do
> you think that this would help?

That is fine.  I was just taking a stab at it, hoping to start a
discussion about whether the timeout should exist in the resource
script.  Since the upper layer catches it, that is good: it is the safe
place for a catch-all, and it prevents errors or bugs in the underlying
script from hanging the cluster.
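
For illustration only, here is a minimal Python sketch of the idea of
an upper layer enforcing the timeout around a script.  This is not
stonithd's actual code; run_with_timeout is a hypothetical helper and
the 0.5 second limit in the usage note is an arbitrary assumption.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s.

    Hypothetical helper: it only shows a caller-side timeout, so a hung
    or buggy script cannot block the caller indefinitely.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising,
        # so nothing is left running after the timeout.
        return None
```

Calling run_with_timeout([sys.executable, "-c", "import time;
time.sleep(5)"], 0.5) returns None, while a script that exits promptly
returns its normal exit code.  The script itself needs no timeout logic
for the caller to stay responsive.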

> 
> > I am trying to find out what the expected behavior should be for a
> > timeout on a start or monitor command.
> 
> A timeout on start is actually a timeout on monitor. Every
> stonith start includes a monitor operation. Otherwise, start
> should've been named "enable" for stonith resources.
> 

Is it OK and expected for a stonith resource that has timed out to end
up in a state where it cannot run on any node without user intervention?

Thanks,
-ab