[Pacemaker] [Linux-HA] new doc about stonith/fencing

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Jun 2 10:24:59 EDT 2009


On Fri, May 29, 2009 at 02:31:37PM -0400, Ryan Steele wrote:
> Jan Kalcic wrote:
>> Really interesting. I would have appreciated some more examples (they are
>> always welcome) but still very interesting.
>> Thanks,
>> Jan
>> Dejan Muhamedagic wrote:
>>> Hi,
>>> Trying to make it a bit less mysterious, I wrote something about
>>> fencing and stonith quite a while ago and then forgot to share
>>> the link. Sorry about that.
>>> Here it is:
>>> http://www.clusterlabs.org/mediawiki/images/f/f2/Crm_fencing.pdf
>>> As usual, constructive criticism/suggestions/etc are welcome.
>>> I won't be able to read your impressions for the next two weeks,
>>> but will surely look forward to seeing them afterwards.
>>> Cheers,
>>> Dejan
>>> _______________________________________________
>>> Linux-HA mailing list
>>> Linux-HA at lists.linux-ha.org
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> I found this to be informative as well, Dejan - thanks for taking the time 
> to write this.  However, I agree with Jan that some examples using 
> more commonly recommended, non-testing STONITH devices would be great, since SSH, 

My idea was to introduce the fencing configuration incrementally,
starting with trivially simple fencing devices such as null and
then moving on to more complex ones. In this case, the only
complex one turned out to be ibm rsa. At any rate, I was hoping
that readers could extrapolate from these examples to the
configuration they need.

Anyway, more examples should follow. Perhaps others with other
devices would like to share their working fencing setups.

> null, and other network-based tests are apparently frowned upon in 
> production environments (based on comments by Andrew and the article here 
> which he referenced: 
> http://theclusterguy.clusterlabs.org/post/113230399/highly-available-data-corruption). 

Indeed, external/ssh is not to be used in production. However, the
lights-out devices (hp ilo, ibm rsa, drac) should not be put in
the same category. They do depend on the host's power, but if
your power distribution is good and the nodes are equipped with
dual power supplies, they should be adequate in most setups. Note
that it is all a matter of probability (how likely is it that
_only_ one node and its management device lose power?), and that
depends on your circumstances. I recently heard that some of
these devices now come with their own battery, in which case they
are perfectly good fencing devices. BTW, good document.
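
To put rough numbers on the probability argument, here is a minimal
Python sketch. It assumes that the power feeds of a node fail
independently of each other, which is a simplification, and the
failure probability used in the loop is made up for illustration.
The function names are hypothetical and not part of any cluster tool.

```python
def p_node_dark(p_feed_fail: float, feeds: int = 2) -> float:
    """Probability that all power feeds of one node are down at once.

    Assumes the feeds fail independently (illustrative assumption).
    """
    if not 0.0 <= p_feed_fail <= 1.0:
        raise ValueError("p_feed_fail must be between 0 and 1")
    if feeds < 1:
        raise ValueError("feeds must be at least 1")
    return p_feed_fail ** feeds


def p_device_unreachable(p_feed_fail: float, feeds: int = 2,
                         has_battery: bool = False) -> float:
    """Probability that the lights-out device is down due to power loss.

    A device with its own battery is modelled as never losing power,
    which is of course an idealization.
    """
    if has_battery:
        return 0.0
    return p_node_dark(p_feed_fail, feeds)


if __name__ == "__main__":
    # 0.01 is an invented per-feed failure probability, not a measurement.
    for feeds in (1, 2):
        print(feeds, "feed(s):", p_device_unreachable(0.01, feeds))
```

With independent feeds, a second power supply squares the failure
probability, which is why dual supplies on a good power distribution
make these devices acceptable in many setups.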

>  For example, I have Raritan 30A PDU's in my cabs, but I didn't see 
> anything in the output of 'stonith -L' except an APC switched rack PDU.
> Now I know that a document like this can't be expected to cover every 
> single type of STONITH device in existence, but some instructions on 
> writing custom STONITH plugins might be useful so that folks can write them 
> for their particular STONITH device (PDU or IPMI card or what have you) and 
> contribute back to the community which will in turn help others.   I've 
> looked at both the clusterlabs.org and linux-ha.org sites, but didn't see 
> any documentation on rolling your own at either site, and the Novell docs 
> on this topic were GUI-centric which unfortunately aren't as helpful to 
> those of us sticking with the CLI.

Implementing stonith plugins is a different matter altogether, so
it deserves a separate document. For the time being, if you want
to write a plugin: get some documentation for your device (see
also the Network UPS Tools project) and take a look at some of
the existing plugins such as null or external/ibmrsa. There are
so many power management solutions out there and then not that
many devices available ;-)
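
As a starting point, here is a rough Python skeleton of an external
plugin. It is a sketch under assumptions, not a reference
implementation: it assumes the usual external plugin convention in
which the command arrives as the first argument, parameters such as
hostlist arrive as environment variables, and a zero exit code means
success. Please verify those details against an existing plugin such
as external/ssh. The power_action function is a hypothetical
placeholder for the code that talks to your device, and the info
strings are invented.

```python
#!/usr/bin/env python3
"""Hypothetical skeleton of an external STONITH plugin (illustration only)."""
import os
import sys

# Invented descriptive strings; replace them with details of your device.
INFO = {
    "getinfo-devid": "example device",
    "getinfo-devname": "example STONITH device",
    "getinfo-devdescr": "Skeleton plugin that does not fence anything yet.",
    "getinfo-devurl": "http://www.example.com/",
    "getinfo-xml": (
        '<parameters><parameter name="hostlist" unique="1" required="1">'
        '<content type="string" />'
        '<shortdesc lang="en">Hostlist</shortdesc>'
        '<longdesc lang="en">Hosts this device can fence.</longdesc>'
        "</parameter></parameters>"
    ),
}


def power_action(action, host):
    """Placeholder: talk to the real device here; return True on success."""
    return False  # nothing is implemented, so always report failure


def run(argv, env):
    """Dispatch one plugin command and return (exit_code, output_text)."""
    hosts = env.get("hostlist", "").split()
    if not argv:
        return 1, ""
    cmd = argv[0]
    if cmd == "gethosts":
        return (0, "\n".join(hosts)) if hosts else (1, "")
    if cmd in ("on", "off", "reset"):
        # The target host is expected as the second argument.
        if len(argv) < 2 or argv[1] not in hosts:
            return 1, ""
        return (0 if power_action(cmd, argv[1]) else 1), ""
    if cmd == "status":
        # A real plugin would check that the device is reachable.
        return (0 if hosts else 1), ""
    if cmd == "getconfignames":
        return 0, "hostlist"
    if cmd in INFO:
        return 0, INFO[cmd]
    return 1, ""  # unknown command


def main():
    code, out = run(sys.argv[1:], os.environ)
    if out:
        print(out)
    return code


# Only act when a command was actually given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main())
```

Keeping the dispatch in run() separate from main() makes it possible
to exercise the plugin logic without touching a real device, for
example run(["gethosts"], {"hostlist": "node1 node2"}).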

> The other thing that might be helpful is to know what the goal is in terms 
> of recovering from a STONITH action.  If one has a node that STONITH powers 
> off at the PDU outlet because it's lost networking, and then networking is 
> subsequently restored, how are we to get the node back in action?

Power on?



> Thanks and Regards,
> Ryan
