[Pacemaker] Shooting and diagnosis of stonith plugins

Lars Marowsky-Bree lmb at suse.de
Mon Oct 27 07:56:14 EDT 2008


On 2008-10-16T18:00:16, Takenaka Kazuhiro <takenaka.kazuhiro at oss.ntt.co.jp> wrote:

>> This is an inherent problem of the lights-out devices such as IBM
>> RSA or HP iLO, i.e. that they share power source with the node
>> they manage. Power failure renders this kind of stonith device
>> useless. Unfortunately, there's nothing one can do about it.
> But something must be done.

Chiming in late, but: "No." These devices by definition cannot perform
the required service of providing an "independent" means of confirming
the node's status, in particular for split-brain avoidance.

These devices are not independent: their failure is strongly correlated
with the failure of the network to the failed node. They simply cannot
provide what is needed here, and thus do not provide adequate fencing
guarantees.


>   A) Check the target by another way.
>   B) Retry forever.
>   C) Return failure to caller.
>
> A is what 'ssh' does.
>   And you said 'ssh' isn't for production.
>   Does that mean other, real stonith plugins must not do A?

"ssh" is testing only. It is not useful for production.

> B is described in http://www.linux-ha.org/STONITH,
>   which says:
>
>     3. When given a RESET or OFF command it must not return
>        control to its caller until the node is no longer running.
>
>   Any plugin that follows B keeps running until stonithd kills it
>   on an error.
>
> C is what 'ibmrsa-telnet' does.
>   Any plugin that follows C returns failure immediately on an error.
>   But I don't know of any document that encourages C.
>
> Which is the right choice for real stonith plugins?

C): if the plugin can't be sure that the operation was successful, it
must return an error. The TE (transition engine) will abort the
transition, the PE (policy engine) will recompute, and possibly the next
fencing request will succeed.

(I.e., in effect C is equivalent to B: the retrying happens in the
cluster rather than inside the plugin.)
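
As a rough sketch of what C means in practice (Python, purely
illustrative; fence_off, send_off, confirm_off and retry_fencing are
hypothetical names, not the stonith plugin API): the plugin reports
success only once the power-off is confirmed, treats everything else as
failure, and leaves the retrying to its caller.

```python
import time
from typing import Callable


def fence_off(send_off: Callable[[], bool],
              confirm_off: Callable[[], bool],
              timeout_s: float = 30.0,
              poll_s: float = 1.0,
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.monotonic) -> bool:
    """Option C: return True only if the node is confirmed off.

    send_off and confirm_off are hypothetical callbacks that talk to
    the power device. Any doubt (command rejected, device unreachable,
    no confirmation before the timeout) is reported as failure.
    """
    try:
        if not send_off():
            return False              # device rejected the command
        deadline = clock() + timeout_s
        while clock() < deadline:
            if confirm_off():
                return True           # the only success path
            sleep(poll_s)
    except OSError:
        return False                  # device unreachable: never guess
    return False                      # timed out without confirmation


def retry_fencing(attempt: Callable[[], bool], max_rounds: int) -> int:
    """Caller side: issue a fresh request after each failure.

    Returns the round that succeeded, or -1 if none did. This stands in
    for the cluster recomputing and requesting fencing again.
    """
    for round_no in range(1, max_rounds + 1):
        if attempt():
            return round_no
    return -1
```

With the retry loop on the caller's side, a plugin that fails fast ends
up with the same overall behaviour as one that retries internally, but
the caller sees every failure and can decide what to do next.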


Regards,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




