[Pacemaker] Stonith: How to avoid deathmatch cluster partitioning

Lars Marowsky-Bree lmb at suse.com
Thu May 16 05:01:18 EDT 2013

On 2013-05-15T22:55:43, Andreas Kurz <andreas at hastexo.com> wrote:

> start-delay is an option of the monitor operation ... in fact means
> "don't trust that start was successful, wait for the initial monitor
> some more time"

It can be used on start here though to avoid exactly this situation; and
it works fine for that, effectively being equivalent to the "delay"
option on stonith (since the start always precedes the fence).
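
To illustrate why a one-sided delay breaks the tie, here is a toy Python
model. It is not Pacemaker code. The function name, the node labels and
the timing model are all made up for this sketch, and it assumes a fence
request that has already been issued still completes even if the node
that issued it gets shot in the meantime.

```python
def fence_race(delay_a, delay_b, fence_latency):
    """Toy model: return the set of nodes that survive a split-brain.

    delay_a / delay_b: seconds each node waits before issuing its fence
    request. fence_latency: seconds (> 0) from issuing a request until
    the target is actually dead.
    Assumption: an already issued request always completes.
    """
    gap = abs(delay_a - delay_b)
    if gap < fence_latency:
        # The slower node issues its own request before the faster
        # node's fence has completed, so both nodes get shot.
        return set()
    # The faster node's fence lands before the slower node acts.
    return {"a"} if delay_a < delay_b else {"b"}


print(fence_race(0, 0, fence_latency=5))   # no delay on either side
print(fence_race(0, 15, fence_latency=5))  # node b carries the penalty
```

In this model the delay only helps if it is longer than the time a fence
takes to complete; a delay shorter than that leaves both requests in
flight at once.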

> The problem is, this would only make sense for one single stonith
> resource that can fence more nodes. In case of a split-brain that would
> delay the start on that node where the stonith resource was not running
> before and gives that node a "penalty".

Sure. In a split-brain scenario, one side will receive a penalty; that's
the whole point of this exercise. In particular for the external/sbd

Or by grouping all fencing resources to always run on one node; if you
don't have access to RHT fence agents, for example.

external/sbd also has code to avoid a death-match cycle in case of
persistent split-brain scenarios now; after a reboot, the node that was
fenced will not join unless the fence is cleared first.

(The RHT world calls that "unfence", I believe.)
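
The "do not join until the fence is cleared" behaviour can be sketched
roughly as below. This is a hypothetical model, not sbd's actual code:
the FenceSlot class, the may_join function and the message strings are
assumptions chosen for illustration only.

```python
class FenceSlot:
    """Hypothetical stand-in for one node's slot on the shared device."""

    def __init__(self):
        self.message = "clear"

    def fence(self):
        # The peer writes a fence request into this node's slot.
        self.message = "reset"

    def clear(self):
        # Explicit administrative action after the split is resolved.
        self.message = "clear"


def may_join(slot):
    """A rebooted node joins only if no fence message is pending."""
    return slot.message == "clear"


slot = FenceSlot()
slot.fence()
print(may_join(slot))  # node was fenced and the slot is not yet cleared
slot.clear()
print(may_join(slot))  # slot cleared, node may rejoin
```

Because the fenced node stays out after its reboot, it cannot come back
and shoot the survivor while the split still persists, which is what
breaks the cycle.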

That should be a win for the fence_sbd that I hope to get around to
sometime in the next few months, too ;-)

> In your example with two stonith resources running all the time,
> Digimer's suggestion is a good idea: use one of the redhat fencing
> agents, most of them have some sort of "stonith-delay" parameter that
> you can use with one instance.

It'd make sense to have logic for this embedded at a higher level,
somehow; the problem is all too common.

Of course, it is most relevant in scenarios where "split brain" is
significantly more likely than "node down". That is true for most test
scenarios (admins love yanking cables), but in practice it's usually
the node that is truly down.


Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
