[ClusterLabs] Pacemaker and Stonith : passive node won't bring up resources
Ken Gaillot
kgaillot at redhat.com
Wed Jun 24 16:29:07 UTC 2015
On 06/24/2015 10:58 AM, Mathieu Valois wrote:
> Hi everybody,
> I'm working with Pacemaker and Stonith for High-Availability with a
> 2-node cluster (called here A and B). Each node has one IPMI interface
> as its fence device.
>
> The deal is :
>
> * A is currently running resources
> * B is in passive mode
>
> Then I unplug the power supply of node A, so every eth interface AND
> the IPMI on A are unavailable. Here comes the trick: B tries
> unsuccessfully to bring A down, because A's IPMI is unreachable. After
> N attempts, B gives up and puts itself into a "Block" state (called
> IDLE in the log file).
The behavior you describe is exactly what's intended. Since B can't
*confirm* that A is down, it can't run resources without risking a
split-brain situation.
> Here is my question: how can I force B to bring up the resources even
> if the Stonith of A fails?
IPMI is not sufficient as the only fence device: as your test shows, it
loses power along with the node it is supposed to fence. The preferred
solution is to create a fencing topology with the IPMI as the first
level, and a different fencing device (such as an intelligent power
strip) as the second level.
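To make the topology idea concrete, here is a minimal Python sketch of how escalation between fencing levels behaves. It is only a model, not Pacemaker's code: the function and device names (fence_with_topology, ipmi_unreachable, power_strip_off) are hypothetical, and the devices are simple stand-ins that return a fixed result.

```python
from typing import Callable, Dict, List

# A fence device is modeled as a callable: given a target node name, it
# returns True only if it confirmed that the node was powered off.
FenceDevice = Callable[[str], bool]


def fence_with_topology(target: str, levels: Dict[int, List[FenceDevice]]) -> bool:
    """Try levels in ascending order and stop at the first one that succeeds.

    A level counts as successful only if every device in it reports success.
    """
    for index in sorted(levels):
        if all(device(target) for device in levels[index]):
            return True
    return False


def ipmi_unreachable(target: str) -> bool:
    # Hypothetical stand-in: the IPMI controller lost power with the node.
    return False


def power_strip_off(target: str) -> bool:
    # Hypothetical stand-in: a separately powered strip cuts the outlet.
    return True


ipmi_only = {1: [ipmi_unreachable]}
ipmi_then_strip = {1: [ipmi_unreachable], 2: [power_strip_off]}

print(fence_with_topology("A", ipmi_only))        # False: B keeps resources stopped
print(fence_with_topology("A", ipmi_then_strip))  # True: B can recover resources
```

With only the IPMI level, a power loss on A can never be confirmed; with the second level, the power strip provides the confirmation that B needs.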
> I understand the consequences (concurrent writes, etc.), but I would
> rather accept these than have the service be completely unavailable.
>
> Thanks for the help :)
And here you get into perhaps the biggest recurring controversy in high
availability. :) Depending on your resources, a split-brain situation
might corrupt or lose some or all of your data. Silent corruption can be
worse: you might have bad data and not even know it.
The consensus of HA professionals is that your data is not "available"
if it is corrupted, so proper fencing is a necessity.
That said, some people do drive without their seat belts on :) so it is
possible to do what you describe. Dummy/null fence agents exist that
always return success. It's playing Russian roulette with your data though.
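As a rough illustration of the risk, here is a small Python sketch of what an always-successful agent implies. It is a toy model under stated assumptions, not a real fence agent: null_fence and nodes_running_resource are hypothetical names, and node A's true state is passed in as a flag.

```python
from typing import List


def null_fence(target_is_really_off: bool) -> bool:
    # Hypothetical dummy agent: ignores the real state and claims success.
    return True


def nodes_running_resource(a_is_really_off: bool) -> List[str]:
    # A keeps running the resource unless it is genuinely powered off.
    running = [] if a_is_really_off else ["A"]
    # B starts the resource as soon as fencing is reported successful.
    if null_fence(a_is_really_off):
        running.append("B")
    return running


print(nodes_running_resource(True))   # ['B']
print(nodes_running_resource(False))  # ['A', 'B']
```

The second case is the split-brain: A was only unreachable, not down, yet B started the resources anyway.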