[Pacemaker] RFC: stonith-enabled="error-recovery"

Maros Timko timkom at gmail.com
Fri Jun 25 08:39:50 EDT 2010


> Date: Thu, 24 Jun 2010 17:46:39 +0200
> From: Lars Marowsky-Bree <lmb at novell.com>
> To: pacemaker at oss.clusterlabs.org
> Subject: [Pacemaker] RFC: stonith-enabled="error-recovery"
> Message-ID: <20100624154639.GF5234 at suse.de>
> Content-Type: text/plain; charset=iso-8859-1
>
> Hi,
>
> this is about a new setting for stonith mode.
>
> Basically, a node failure would not cause a fence - the node would be
> trusted to be truly down and have self-fenced. (Certain hardware
> infrastructures can guarantee this, and also drive the probability of
> split-communication down to be negligible; or the issue of re-syncing the
> data be considered acceptably solved (drbd).)
>
> However, fencing would still be welcome for error cleanup (say, stop
> failures).
>
> Do others think this would be a useful idea?

Yes, I would like to have such an option. The strict HA world may have
some complaints, but as long as it is just a configurable option, it's
OK.
I was trying to achieve basically the same thing by tweaking STONITH
plugins to retry a couple of times and then return success after a
specified number of failed STONITH requests. Otherwise the standby node
cannot be activated without manual intervention. My colleague calls this
"best effort STONITH". In most cases we depend on a shared IPMI
interface for STONITH, which is usually connected to the public network
via a network switch.
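
A rough sketch of the "best effort" logic, in Python for illustration
only. This is not the actual plugin code: the names best_effort_fence,
fence_once and max_failures are hypothetical, and fence_once stands in
for whatever the real plugin does to reach the IPMI device.

```python
from typing import Callable, Tuple


def best_effort_fence(
    fence_once: Callable[[str], bool],
    node: str,
    max_failures: int = 3,
) -> Tuple[bool, bool]:
    """Try to fence `node`, but report success even if every attempt fails.

    `fence_once` is a hypothetical callable that performs one fencing
    attempt and returns True if the device confirmed it.

    Returns (reported_success, really_fenced).
    """
    for _ in range(max_failures):
        if fence_once(node):
            # The device confirmed the fence: a genuine success.
            return True, True
    # Every attempt failed. Report success anyway so that the standby
    # node can be activated without manual intervention.
    return True, False
```

The second value in the result only makes the trade-off visible: when
it is False, the cluster proceeds without confirmation that the node is
really down.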

Tino

>
> An alternative route could be to implement a STONITH plugin that returns
> success if the node is missing from the membership layer, and "pass" if
> it is present (thus invoking the next STONITH plugin in the priority
> list). But I think the PE-approach would be cleaner.
>
>
> Regards,
>    Lars
>
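
The alternative plugin described above could decide roughly as in the
sketch below (Python, for illustration only). The names FenceResult and
membership_fence are hypothetical, and the set of current members is
assumed to be handed in by the caller; how to query the membership layer
is not shown.

```python
from enum import Enum
from typing import Set


class FenceResult(Enum):
    SUCCESS = "success"  # node is gone from membership: trust it is down
    PASS = "pass"        # node still present: defer to the next plugin


def membership_fence(node: str, members: Set[str]) -> FenceResult:
    """Decide the fencing result purely from cluster membership.

    `members` is assumed to be the current view of the membership layer.
    """
    if node not in members:
        return FenceResult.SUCCESS
    return FenceResult.PASS
```

With members = {"node1", "node2"}, a request for "node3" would yield
SUCCESS, while a request for "node2" would yield PASS and fall through
to the next STONITH plugin in the priority list.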



