[Pacemaker] chain/cascade stonith agents?

Thu Aug 16 19:44:45 EDT 2012

On Fri, Aug 17, 2012 at 2:21 AM, Bob Haxo <bhaxo at sgi.com> wrote:
>
> On Thu, 2012-08-16 at 09:37 +1000, Andrew Beekhof wrote:
>> On Thu, Aug 16, 2012 at 1:59 AM, Bob Haxo <bhaxo at sgi.com> wrote:
>> > Hi All,
>> >
>> > Is chaining/cascading of stonith agents implemented?
>>
>> Yes, but you'll want to use the current git HEAD.
>>
>> > If yes, would
>> > someone please point me to the documentation?
>>
>> Um, I'm sorry to say that it's not actually documented yet :-(
>>
>> I can provide an example though; it should be reasonably self-explanatory:
>>
>> <cib crm_feature_set="3.0.6" validate-with="pacemaker-1.2" admin_epoch="1" epoch="0" num_updates="0">
>>   <configuration>
>> ...
>>     <fencing-topology>
>>       <!-- try poison-pill and fall back to power -->
>>       <fencing-level id="f-p1.1" target="pcmk-1" index="1" devices="poison-pill"/>
>>       <fencing-level id="f-p1.2" target="pcmk-1" index="2" devices="power"/>
>>
>>       <!-- try disk and network, and fall back to power -->
>>       <fencing-level id="f-p2.1" target="pcmk-2" index="1" devices="disk,network"/>
>>       <fencing-level id="f-p2.2" target="pcmk-2" index="2" devices="power"/>
>>     </fencing-topology>
>>   </configuration>
>>   <status/>
>> </cib>
>>
>> > I'd like to implement a stonith chain in which stonith_ipmilan is the
>> > first stonith agent, and if that fails, a second stonith agent gets
>> > called (for example stonith_apc).
>> >
>> > ((In short, I find it tiresome to pull the power cable(s) for an HA
>> > failover demonstration only to have the failover, well, fail, when
>> > stonith_ipmilan goes into a failure loop when it doesn't get a response
>> > from the powered-off BMC.))
>> >
>> > Is there a way of setting stonith_ipmilan to give up and return a
>> > "stonith success"?  I was thinking that I would chain stonith_ipmilan
>> > with the ever popular stonith_null to achieve this end.
>>
>> For a demo, sure.
>> But in production, how do you tell the difference between "I can't
>> reach the BMC because it's powered off" and "I can't reach the BMC
>> because my network link to it is disrupted"?
>>
>> Note there is also 'stonith_admin --confirm $node' which will tell
>> stonith-ng and the rest of pacemaker that $node is safely down.
>
> Yes, it is a trade-off.  Certainly during development, I'm less
> concerned about a corrupted virt than I am about the hang that occurs
> when the cluster never recovers from the powered-off system's lack of
> response.  The virt can easily be re-imaged.
>
> Is there an easier way of forcing stonith_ipmilan to give up than
> chaining to stonith_null?

Creating fence_bob which calls fence_ipmilan and always returns "all
good!" is probably the simplest option.
But I reserve the right to say "told you so" when things go pear-shaped ;-)
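
For what it's worth, a rough sketch of what such a wrapper might look
like is below. It is only an illustration, not a tested agent: the
function name, the 60-second timeout, and the assumption that
fence_ipmilan is on the PATH and takes its key=value options on stdin
are all mine. It forwards the request, relays whatever the real agent
prints, and then claims success regardless of what happened.

```python
import subprocess
import sys

# Assumption: the real agent is installed under this name and is on PATH.
REAL_AGENT = "fence_ipmilan"


def call_and_report_success(argv, stdin_text, timeout=60):
    """Run the real fence agent, relay its output, and always return 0.

    Demo/development use only: a failed or unreachable agent is reported
    as a successful fencing operation, which is unsafe in production.
    """
    try:
        result = subprocess.run(
            argv,
            input=stdin_text,       # forward the options received on stdin
            capture_output=True,
            text=True,
            timeout=timeout,        # hypothetical limit so a dead BMC cannot hang us
        )
        sys.stdout.write(result.stdout)
        sys.stderr.write(result.stderr)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Agent missing, not executable, or timed out: log and carry on.
        sys.stderr.write("fence_bob: ignoring failure: %s\n" % exc)
    return 0  # always "all good!"
```

To use it as a script named fence_bob, the last line of the file would
be something like
sys.exit(call_and_report_success([REAL_AGENT] + sys.argv[1:], sys.stdin.read())).
Because the return value is unconditional, the cluster cannot tell a
fenced node from an unreachable one, which is exactly the trade-off
discussed above.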

>
> Thanks,
> Bob Haxo
>
>>
>> >
>> > Cheers,
>> > Bob Haxo
>> > bhaxo at sgi.com
>> >
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org