[Pacemaker] Proposed new stonith topology syntax

Andrew Beekhof andrew at beekhof.net
Thu Jan 19 21:12:42 EST 2012


On Thu, Jan 19, 2012 at 8:23 AM, Digimer <linux at alteeve.com> wrote:
> On 01/18/2012 01:02 PM, Dejan Muhamedagic wrote:
>>> If I may restate;
>>>
>>> Out-of-band management devices (iLO, IPMI, w/e) have two fatal flaws
>>> which make them unreliable as sole fence devices: they share their power
>>> with the host and they (generally) have only one network link. If the
>>> node's PSU fails, or if the network link/BMC fails, fencing fails.
>>
>> I thought we were talking about computers with two PSUs. If both
>> fail, that's already two faults and (our) clusters don't protect
>> from multiple faults. As for the rest (network connection, etc)
>> it's not shared with the host and if there's a failure in any of
>> these components it should be detected by the next monitor
>> operation on the stonith resource giving enough time to repair.
>> In short, a fencing device is not a SPOF.
>
> I was talking about the needs for a fence to succeed. So a node has RPSU,
> with each cable going to a different PDU. For the fence method to
> succeed, both actions must succeed (confirmed switching off both outlets).
>
> So I was talking (in this case) about the actual fence action succeeding
> or failing.
>
>>> A PDU as a backup protects against this, but is not ideal as it can't
>>> confirm a node's power state.
>>
>> Why is that? If you ask PDU to disconnect power to the host and
>> that command succeeds how high is the probability that the CPU is
>> still running? Or am I missing something?
>
> Two cases where this fails, both pebcak, but still real.
>
> One; RPSU where only one link was configured (or 2 or 3, whatever).
> Two; An admin moves the power cable to another outlet sometime between
> original configuration/testing and the need to fence.
>
> Never under-estimate the power of stupidity or the dangers of working
> late. :)
>
>>> Red Hat clusters call these "Fence Methods", with each "method"
>>> containing one or more fence "devices". With the IPMI, there is only one
>>> device. With Redundant PSUs across two PDUs, you have two devices in the
>>> "method". All devices in a method must succeed for the fence method to
>>> succeed.
>>>
>>> It would, if nothing else, help people migrating to pacemaker from rhcs
>>> if similar names were used.
>>
>> Pacemaker is already using terminology different from RHCS. I'm
>> not at all against using similar (or same) names, but it's
>> too late for that. Introducing RHCS specific names to co-exist
>> with Pacemaker names... well, how is that going to help?
>>
>> Thanks,
>>
>> Dejan
>
> If it's set, then it is set and there is no more discussion to be had.

It's not set in stone yet, but I don't think the term "method" works in
the pacemaker context.
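
Naming aside, the behaviour described above for RHCS (methods tried in
order, and every device within a method having to succeed) can be
sketched roughly as below. This is only an illustration in Python: the
function names, the device callables and the node name are made up for
the example and are not Pacemaker or RHCS APIs. It also assumes that a
device reports success only once it has confirmed the action.

```python
from typing import Callable, Sequence

# Hypothetical type: a fence "device" is any callable that returns True
# only when it has confirmed the action (e.g. the outlet is switched off).
FenceDevice = Callable[[str], bool]


def run_method(node: str, devices: Sequence[FenceDevice]) -> bool:
    """A method succeeds only if it has devices and every one succeeds."""
    if not devices:
        return False
    # all() stops at the first failing device.
    return all(device(node) for device in devices)


def fence(node: str, methods: Sequence[Sequence[FenceDevice]]) -> bool:
    """Try each method in order; the first fully successful one wins."""
    for devices in methods:
        if run_method(node, devices):
            return True
    return False


# Illustrative devices (assumptions, not real agents).
def ipmi(node: str) -> bool:
    return False  # e.g. the BMC lost power along with the node


def pdu_a(node: str) -> bool:
    return True   # outlet on the first PDU confirmed off


def pdu_b(node: str) -> bool:
    return True   # outlet on the second PDU confirmed off


# Method 1: IPMI alone. Method 2: both PDU outlets of a redundant PSU.
methods = [[ipmi], [pdu_a, pdu_b]]
print(fence("node1", methods))  # True
```

Here the IPMI method fails, so the second method is tried, and it counts
as a successful fence only because both outlets were switched off.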

> To answer your question though;
>
> Come EL7 (or whenever Pacemaker gains full support), as rgmanager is
> phased out, all the existing rhcs clusters will need to be migrated.
> More pressing: the admins who managed those clusters will need to be
> retrained. I would argue that everything that can be done to smooth that
> migration should be done, including seemingly trivial things like naming
> conventions.
>
> Cheers
>
> --
> Digimer
> E-Mail:              digimer at alteeve.com
> Freenode handle:     digimer
> Papers and Projects: http://alteeve.com
> Node Assassin:       http://nodeassassin.org
> "omg my singularity battery is dead again.
> stupid hawking radiation." - epitron



