[Pacemaker] Question on ILO stonith resource config and restarting

Andreas Mock andreas.mock at web.de
Thu Oct 30 01:39:14 UTC 2008


Aaron Bush wrote:
> I am mostly concerned that I ended up with a node that had no associated
> stonith resource available to shoot it if it was truly down since the
> resource did not restart like I thought it should once the network cable
> was reconnected.
>   
Hi Aaron,

Without knowing the details: is the stonith plugin implemented to time
out and return FALSE?
In that case the failure count for that stonith plugin resource should
go up, which changes the resource's score.
A list member once contributed a script, showscore.sh, which shows the
current score of a resource in the cluster. You should watch your
stonith resource with it in that failure case.
The score probably gets so bad that the resource can't be started
anywhere. But that is just a guess.
The best you can do, IMHO, is to ignore the failures for score
calculation but react to them externally (e.g. with Nagios monitoring).
The failure count would then rise with each try, but the score should
stay constant.
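
To make the idea a bit more concrete, here is a rough Python sketch. It is
only an illustration, not the actual ILO plugin code: device_reachable() is
a hypothetical stand-in for a status check that times out and returns False,
and nagios_status() maps a failure count to the standard Nagios plugin exit
codes. The host, port, timeout and thresholds are all assumptions.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def device_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection succeeds, False on timeout or error.

    Hypothetical stand-in for a stonith plugin's status check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts and refused connections
        return False


def nagios_status(fail_count, warn_at=1, crit_at=3):
    """Map a failure count to a Nagios exit code (thresholds are assumptions)."""
    if fail_count >= crit_at:
        return CRITICAL
    if fail_count >= warn_at:
        return WARNING
    return OK
```

An external check could call nagios_status() with the current failure count
and exit with the returned code, so the failures are reported without
influencing the score.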

But Dejan can probably shed additional light on this.  :-)

Best regards
Andreas Mock




