[ClusterLabs] Pacemaker and Stonith : passive node won't bring up resources
Andrei Borzenkov
arvidjaar at gmail.com
Fri Jun 26 07:42:41 UTC 2015
On Wed, 24 Jun 2015 15:42:43 -0400
Digimer <lists at alteeve.ca> wrote:
> On 24/06/15 01:00 PM, Mathieu Valois wrote:
> >
> > On 24/06/2015 18:29, Ken Gaillot wrote:
> >> On 06/24/2015 10:58 AM, Mathieu Valois wrote:
> >>> Hi everybody,
> >>> I'm working with Pacemaker and Stonith for high availability on a
> >>> 2-node cluster (called here A and B). Each node has one IPMI
> >>> interface as its fence device.
> >>>
> >>> The deal is:
> >>>
> >>> * A is currently running resources
> >>> * B is in passive mode
> >>>
> >>> Then I unplug node A's power supply, so all of A's Ethernet
> >>> interfaces AND its IPMI are unavailable. Here comes the trick: B
> >>> tries unsuccessfully to bring A down, because A's IPMI is
> >>> unreachable. After N failed attempts, B gives up and puts itself
> >>> into a "Block" state (called IDLE in the log file).
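
For reference, a fence device for a node's IPMI is typically defined
with pcs along these lines (the device name, address and credentials
below are placeholders, not taken from this setup):

  # one fence device per node, each pointing at that node's IPMI/BMC
  pcs stonith create fence_A_ipmi fence_ipmilan \
      pcmk_host_list="A" ipaddr="10.0.0.1" \
      login="admin" passwd="secret" lanplus="1" \
      op monitor interval=60s
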
> >> The behavior you describe is exactly what's intended. Since B can't
> >> *confirm* that A is down, it can't run resources without risking a
> >> split-brain situation.
> >>
> >>> Here is my question: how can I force B to bring the resources back
> >>> even if STONITH of A fails?
> >> IPMI is not sufficient to be used as the only fence device. The
> >> preferred solution is to create a fencing topology with the IPMI as
> >> the first level and a different fencing device (such as an
> >> intelligent power strip) as the second level.
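
With pcs, such a topology is built by registering both devices and then
assigning them to levels. The device names here are illustrative and
assume a second agent (e.g. for a switched PDU) has been created too:

  # level 1: try the node's IPMI first
  pcs stonith level add 1 A fence_A_ipmi
  # level 2: fall back to the PDU outlet if IPMI fencing fails
  pcs stonith level add 2 A fence_A_pdu
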
> >>
> >>> I understand the consequences (concurrent writes, etc.), but I
> >>> would rather accept those than have the service completely
> >>> unavailable.
> >>>
> >>> Thanks for the help :)
> >> And here you get into perhaps the biggest recurring controversy in high
> >> availability. :) Depending on your resources, a split-brain situation
> >> might corrupt or lose some or all of your data. Silent corruption can be
> >> worse: you might have bad data and not even know it.
> > I can't afford to get another fencing device, so I'm forced to do it
> > this way. I've heard about quorum disks for managing split-brain
> > issues. Could one be used in such a case with only one IPMI device
> > per node? What would it involve?
>
> Quorum disk is a tool to help determine which node(s) should be quorate
> in a partition. It is not a substitute for fencing; quorum and fencing
> serve different roles.
>
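As an aside, a two-node corosync 2.x cluster usually runs without any
quorum disk; quorum is commonly handled with the two_node flag in
corosync.conf (a sketch, assuming corosync 2.x votequorum), though this
does nothing to replace fencing:

  quorum {
      provider: corosync_votequorum
      two_node: 1    # implies wait_for_all: both nodes must be seen once
  }
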
If all the data in question resides on shared disk, using SCSI-3
persistent reservations could be an alternative. I do not know whether
pacemaker natively integrates with it, though ...
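
For the record, the fence-agents package does ship a fence_scsi agent
built on SCSI-3 persistent reservations. A sketch, with the device path
and node names as placeholders:

  pcs stonith create fence-scsi-shared fence_scsi \
      devices="/dev/disk/by-id/wwn-0xEXAMPLE" \
      pcmk_host_list="A B" \
      meta provides=unfencing

The provides=unfencing meta attribute tells Pacemaker that a fenced node
must be unfenced (re-register its reservation key) before it may start
resources again.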
> If you want to be able to survive the total loss of the node, you must
> use secondary fencing. Switched PDUs, like the AP7900, can often be
> found on the used market for ~$200 each.
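
Such a switched-PDU device would be created much like the IPMI one; a
sketch using the SNMP APC agent, with address, community and outlet
number as placeholders:

  pcs stonith create fence_A_pdu fence_apc_snmp \
      pcmk_host_list="A" ipaddr="10.0.0.3" \
      community="private" port="1" \
      op monitor interval=60s
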
>
> The software cannot be configured to accept split-brain/data loss in
> such a case. To the HA stack, "critical" means "critical", not "critical
> most of the time".
>
> >> The consensus of HA professionals is that your data is not "available"
> >> if it is corrupted, so proper fencing is a necessity.
> >>
> >> That said, some people do drive without their seat belts on :) so it
> >> is possible to do what you describe. Dummy/null fence agents can
> >> always return success. It's playing Russian roulette with your data,
> >> though.
>
> Don't do this. You're short-circuiting safety systems.
>
> If the node fails, let it block. If you are certain the peer is dead,
> clear the fence manually.
>
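For completeness, manually confirming a fence once you are genuinely
certain the node is powered off looks like this (confirming a node that
is in fact still alive invites exactly the split-brain discussed above):

  pcs stonith confirm A
  # or, with the lower-level tool:
  stonith_admin --confirm A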