[Pacemaker] Proposed new stonith topology syntax

Digimer linux at alteeve.com
Wed Jan 18 16:23:50 EST 2012


On 01/18/2012 01:02 PM, Dejan Muhamedagic wrote:
>> If I may restate;
>>
>> Out of band management devices (iLO, IPMI, w/e) have two fatal flaws
>> which make them unreliable as sole fence devices; They share their power
>> with the host and they (generally) have only one network link. If the
>> node's PSU fails, or if the network link/BMC fails, fencing fails.
> 
> I thought we were talking about computers with two PSU. If both
> fail, that's already two faults and (our) clusters don't protect
> from multiple faults. As for the rest (network connection, etc)
> it's not shared with the host and if there's a failure in any of
> these components it should be detected by the next monitor
> operation on the stonith resource giving enough time to repair.
> In short, a fencing device is not a SPOF.

I was talking about what a fence needs in order to succeed. Take a node
that has RPSU, with each cable going to a different PDU. For the fence
method to succeed, both actions must succeed (confirmed switching off
both outlets).

So I was talking (in this case) about the actual fence action succeeding
or failing.
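
As a rough illustration of that rule, here is a minimal Python sketch.
It is not Pacemaker or RHCS code; FenceDevice, run_method and fence_node
are hypothetical names made up for this example, and the lambdas stand
in for real fence agents. It assumes methods are tried in order (IPMI
first, the PDU pair as backup) and that a method counts as successful
only when every device in it confirms the power-off.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FenceDevice:
    """Hypothetical fence device.

    power_off must return True only when the device confirms the
    outlet (or node) is actually off.
    """
    name: str
    power_off: Callable[[str], bool]


def run_method(node: str, devices: List[FenceDevice]) -> bool:
    """A method succeeds only if every device in it confirms success.

    all() stops at the first device that fails, so later devices in
    the method are not attempted once the method is already lost.
    """
    return all(dev.power_off(node) for dev in devices)


def fence_node(node: str, methods: List[List[FenceDevice]]) -> bool:
    """Try each method in order; the first fully successful one wins."""
    return any(run_method(node, method) for method in methods)


if __name__ == "__main__":
    # IPMI fails here, e.g. because the BMC lost power along with the node.
    ipmi = FenceDevice("ipmi", lambda node: False)
    # Redundant PSUs, one cable per PDU: both outlets must be switched off.
    pdu_a = FenceDevice("pdu-a", lambda node: True)
    pdu_b = FenceDevice("pdu-b", lambda node: True)

    print(fence_node("node1", [[ipmi], [pdu_a, pdu_b]]))
```

If only pdu_a were listed in the second method, the sketch would still
report success while the node kept running off the second PSU, which is
the misconfiguration case discussed further down.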

>> A PDU as a backup protects against this, but is not ideal as it can't
>> confirm a node's power state.
> 
> Why is that? If you ask PDU to disconnect power to the host and
> that command succeeds how high is the probability that the CPU is
> still running? Or am I missing something?

There are two cases where this fails. Both are PEBKAC, but still real.

One; RPSU where only one link was configured (or 2 or 3, whatever).
Two; An admin moves the power cable to another outlet sometime between
original configuration/testing and the need to fence.

Never under-estimate the power of stupidity or the dangers of working
late. :)

>> Red Hat clusters call these "Fence Methods", with each "method"
>> containing one or more fence "devices". With the IPMI, there is only one
>> device. With Redundant PSUs across two PDUs, you have two devices in the
>> "method". All devices in a method must succeed for the fence method to
>> succeed.
>>
>> It would, if nothing else, help people migrating to pacemaker from rhcs
>> if similar names were used.
> 
> Pacemaker is already using terminology different from RHCS. I'm
> not at all against using similar (or same) names, but it's
> too late for that. Introducing RHCS specific names to co-exist
> with Pacemaker names... well, how is that going to help?
> 
> Thanks,
> 
> Dejan

If it's set, then it is set and there is no more discussion to be had.
To answer your question though;

Come EL7 (or whenever Pacemaker gains full support), as rgmanager is
phased out, all the existing rhcs clusters will need to be migrated.
More to the point; the admins who manage those clusters will need to be
retrained. I would argue that everything that can be done to smooth that
migration should be done, including seemingly trivial things like naming
conventions.

Cheers

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron



