[Pacemaker] Question on ILO stonith resource config and restarting

Dejan Muhamedagic dejanmm at fastmail.fm
Thu Oct 30 09:21:53 EDT 2008


Hi,

On Thu, Oct 30, 2008 at 01:36:18AM +0100, Andreas Mock wrote:
> Aaron Bush schrieb:
>> I then noticed that my ILO clones were starting on the 'wrong' nodes.
>> As in the stonith resource to kill node 2 was actually running on node
>> 2, which is pointless if node 2 locks up.  So I added resource
>> constraints to force the stonith clone to stay on a node that was not
>> the one to be shot.  This seemed to work well.
>>   
> Dejan,
> is self-stonithing possible meanwhile?

No, it's not, with the exception of the null (for testing) and
suicide (for the suicidal) plugins.

> How is the problem of a one-node 
> cluster solved?

stonithd simply drops its own name from the configured host list.
The resource is then started with this reduced host list. If the
list turns out to be empty, the start fails.
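
Note that a single ILO device controls exactly one host, so its host
list usually contains just that host. A minimal sketch (node names,
ILO address and credentials are made up, and the exact parameter
names depend on your plugin version):

  primitive st-node2 stonith:external/riloe \
        params hostlist="node2" ilo_hostname="ilo-node2.example.com" \
              ilo_user="Administrator" ilo_password="secret"

If such a resource is placed on node2 itself, stonithd drops "node2"
from the host list, the list becomes empty, and the start fails.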

> Often discussed, but I don't know the current status.
>
>> The next issue I have is that when I disconnect the LAN cable on a
>> single node that connects it to the rest of the network, the clone
>> stonith monitor will fail since it can't connect to the other node's
>> ILO for status.  After some time (minutes, let's say) I reconnect the
>> LAN cable but never see the clone stonith come back to life; it just
>> stays failed.  What should I be looking at to make sure that the
>> clone stonith restarts properly?
>>   
> There is something I don't understand: you cut the network connection
> to the ILO only? If yes, then the monitor action of the stonith
> plugin gets a failure and you can/have to react to that as usual. In
> your case: ignore the failure and keep retrying. You can't do much
> better, apart from adding redundancy, IMHO.
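
In practice, once the monitor (and any subsequent restart) has
failed, the resource stays failed until the failure is cleaned up.
After the network is back you would typically clear it by hand, for
example (st-node2 and node1 are made-up names, and the exact options
depend on your crm_resource version):

  crm_resource -C -r st-node2 -H node1

If your Pacemaker version supports the failure-timeout resource meta
attribute, the failure can also be made to expire on its own after a
while.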
>
>
>> Any advice on how to more properly set up an HP ILO stonith in this
>> scenario would be greatly appreciated.  (I can see where a clone
>> stonith would be useful in a large cluster of N>2 nodes, since all
>> nodes could have a chance to shoot a failed node, and maybe this is
>> the reason for cloned stonith with ILO?  Basically, in a cluster of
>> N nodes each node would be running N-1 stonith resources, ready to
>> shoot a dead node.)
>>   
> That is exactly my knowledge/assumption, which doesn't mean anything. :-)
> In a two-node cluster you can IMHO also work with simple stonith
> primitives and appropriate location constraints. Dejan: Is that
> assumption correct?

Right.
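
Something like this would do in a two-node cluster (a sketch in crm
shell syntax, or the XML equivalent; st-node1 and st-node2 would be
riloe primitives like the one sketched above, one per ILO, with
made-up names and addresses):

  primitive st-node1 stonith:external/riloe \
        params hostlist="node1" ilo_hostname="ilo-node1.example.com" \
              ilo_user="Administrator" ilo_password="secret"
  primitive st-node2 stonith:external/riloe \
        params hostlist="node2" ilo_hostname="ilo-node2.example.com" \
              ilo_user="Administrator" ilo_password="secret"
  location l-st-node1 st-node1 -inf: node1
  location l-st-node2 st-node2 -inf: node2

The location constraints keep each stonith resource off the node it
is supposed to shoot.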

> Another question to Dejan in that context: What has to be done to
> make a stonith plugin clone-aware?

Nothing.
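
You just clone the stonith resource like any other resource, for
example (a sketch; st-node3 would be a primitive like the ones above,
in a three-node cluster):

  clone st-node3-clone st-node3 \
        meta clone-max="2" clone-node-max="1"

An instance that ends up on node3 itself would fail to start with an
empty host list (see above), so you may also want a location
constraint to keep it off node3 in the first place.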

> Is it right that one can create a chain of stonith plugins/methods?
> (More parallel weapons to shoot with?)

Yes. There are priorities now, but only in the very latest
pacemaker code (not available in 2.1.4). Timeout handling was
also improved/fixed in the same timeframe. Even that won't run
perfectly, but it should be good enough for most setups, in
particular for two-node clusters. There may be issues in >2 node
clusters if it happens that stonith resources of different
types are started on different nodes _and_ they should be
tried only in a certain order. Some typical examples are the
recently contributed kdumpcheck plugin and the meatware plugin
when used as the last-resort option.
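
For illustration, a meatware device as a second, last-resort method
could be configured next to the ILO resources like this (a sketch
with made-up names; how the devices are ordered relative to each
other depends on the priority support mentioned above):

  primitive st-meat stonith:meatware \
        params hostlist="node1 node2"
  clone st-meat-clone st-meat

Cloning it lets every node run a copy; stonithd again drops the
local node from each instance's host list.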

Thanks,

Dejan

> Best regards
> Andreas Mock
>
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at clusterlabs.org
> http://list.clusterlabs.org/mailman/listinfo/pacemaker



