[ClusterLabs] Pacemaker and Stonith : passive node won't bring up resources

Mathieu Valois mvalois at teicee.com
Wed Jun 24 17:00:55 UTC 2015


On 24/06/2015 18:29, Ken Gaillot wrote:
> On 06/24/2015 10:58 AM, Mathieu Valois wrote:
>> Hi everybody,
>> I'm working with Pacemaker and Stonith for high availability on a
>> 2-node cluster (called A and B here). Each node has one IPMI interface
>> as its fence device.
>>
>> The situation is:
>>
>>  * A is currently running resources
>>  * B is in passive mode
>>
>> Then I unplug A's power supply, so every eth interface AND the IPMI
>> on A become unavailable. Here is the catch: B tries unsuccessfully
>> to bring A down, because A's IPMI is unreachable. After N attempts,
>> B gives up and puts itself in a "Block" state (called IDLE in
>> the log file).
> The behavior you describe is exactly what's intended. Since B can't
> *confirm* that A is down, it can't run resources without risking a
> split-brain situation.
>
>> Here is my question: how can I force B to bring the resources back up
>> even if Stonith on A fails?
> IPMI is not sufficient to be used as the only fence device. The
> preferred solution is to create a fencing topology with the IPMI as the
> first level, and a different fencing device (such as intelligent power
> strip) as the second level.
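
The two-level topology described above can be sketched as a small Python model. This is only an illustration of the decision logic, not Pacemaker code or configuration: the function names (ipmi_fence, power_strip_fence, may_recover_resources) are hypothetical, and the devices are stubs that return fixed answers for the "A lost power" scenario. It assumes the usual topology semantics, where levels are tried in order and a level succeeds only if every device in it succeeds.

```python
from typing import Callable, List

# A fence device is modeled as a callable that returns True only when it
# can confirm the target node is off. All names here are hypothetical.
FenceDevice = Callable[[str], bool]


def ipmi_fence(node: str) -> bool:
    # Assumption: the node's IPMI lost power together with the node,
    # so it cannot answer and the fence attempt fails.
    return False


def power_strip_fence(node: str) -> bool:
    # Assumption: an independently powered switched power strip can
    # still cut the outlet and confirm it.
    return True


def fence(node: str, levels: List[List[FenceDevice]]) -> bool:
    """Try each level in order; a level succeeds only if all its devices do."""
    for devices in levels:
        if all(device(node) for device in devices):
            return True
    return False


def may_recover_resources(node: str, levels: List[List[FenceDevice]]) -> bool:
    # The survivor may take over only after fencing has been confirmed.
    return fence(node, levels)


ipmi_only = [[ipmi_fence]]
with_power_strip = [[ipmi_fence], [power_strip_fence]]

print(may_recover_resources("A", ipmi_only))         # False: B stays blocked
print(may_recover_resources("A", with_power_strip))  # True: level 2 confirms
```

With IPMI alone, the model reproduces the blocked state from the original report; adding a second level that does not share A's power supply is what lets B proceed.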
>
>> I understand the consequences (concurrent writes, etc.), but I would
>> rather accept these than have the service completely unavailable.
>>
>> Thanks for the help :)
> And here you get into perhaps the biggest recurring controversy in high
> availability. :) Depending on your resources, a split-brain situation
> might corrupt or lose some or all of your data. Silent corruption can be
> worse: you might have bad data and not even know it.
I can't afford another fencing device, so I'm forced to do it this
way. I've heard that a quorum disk can help manage the split-brain issue.
Could it be used in this case, with only one IPMI device per node?
What does it involve?
>
> The consensus of HA professionals is that your data is not "available"
> if it is corrupted, so proper fencing is a necessity.
>
> That said, some people do drive without their seat belts on :) so it is
> possible to do what you describe. Dummy/null fence agents can always
> return success. It's playing Russian roulette with your data though.
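
To make the risk concrete, here is a minimal Python sketch contrasting an agent that always reports success with one that really powers the peer off. It is a toy model, not a real fence agent: Node, null_fence, real_fence and take_over are hypothetical names, and the scenario assumes A is still alive but unreachable (for example, a network failure rather than a power loss).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    powered: bool = True
    runs_resources: bool = False


def null_fence(target: Node) -> bool:
    """Hypothetical dummy agent: reports success without touching the node."""
    return True


def real_fence(target: Node) -> bool:
    """Hypothetical working agent: powers the node off, then reports success."""
    target.powered = False
    target.runs_resources = False
    return True


def take_over(survivor: Node, peer: Node, fence_agent) -> bool:
    """Start resources on the survivor if fencing 'succeeds'.

    Returns True when the result is a split brain, i.e. both nodes end up
    powered on and running the same resources.
    """
    if fence_agent(peer):
        survivor.runs_resources = True
    return survivor.runs_resources and peer.powered and peer.runs_resources


# A is alive and running resources, but B cannot reach it.
a1, b1 = Node("A", runs_resources=True), Node("B")
print(take_over(b1, a1, null_fence))  # True: both nodes now write the data

a2, b2 = Node("A", runs_resources=True), Node("B")
print(take_over(b2, a2, real_fence))  # False: A was really stopped first
```

The null agent gives the desired takeover when A has truly lost power, but it gives exactly the same answer when A is still running, which is the case the sketch flags as a split brain.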
>
Once again, thanks a lot for your quick and detailed answer :)

----
Mathieu Valois



