[ClusterLabs] Two Node NFS doesn't failover with hardware glitches
Erich Prinz
erich at live2sync.com
Tue Apr 7 19:31:50 UTC 2015
> On Apr 7, 2015, at 13:01, Kristoffer Grönlund <kgronlund at suse.com> wrote:
>
> Erich Prinz <erich at live2sync.com> writes:
>
>> Still, this doesn't solve the problem of a resource hanging on the primary node. Everything I'm reading indicates fencing is required, yet the boilerplate configuration from Linbit ships with stonith disabled.
>>
>> These units are running CentOS 6.5
>> corosync 1.4.1
>> pacemaker 1.1.10
>> drbd
>>
>> Two questions then:
>>
>> 1. how do we handle cranky hardware issues to ensure a smooth failover?
>> 2. what additional steps are needed to ensure the NFS mounts don't go stale on the clients?
>>
>>
>
> As you might have guessed, you have answered your question already -
> what you need to solve this situation is stonith. When a node refuses to
> die gracefully, you really do need stonith to force it into a known
> state.
>
> These days most documentation tries to emphasize this more than in the
> past. I can recommend Tim's cartoon explanation of how and why stonith
> works:
>
> http://ourobengr.com/stonith-story/
>
> --
> // Kristoffer Grönlund
> // kgronlund at suse.com
Thanks, Kristoffer.
I certainly understand the death match; the cartoon is a funny way to drive the point home.
The underlying question, then, is how to implement a non-power fence that forces the node to release its resources. Is that even possible?
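For what it's worth, here is a sketch of one non-power direction, assuming a managed switch reachable over SNMP: re-enable fencing and let fence_ifmib shut the failed node's switch port down, isolating it from the network instead of cutting power. The switch address, SNMP community string, and port mapping below are all placeholders for our environment.

  # Fencing is disabled in the Linbit boilerplate; turn it back on.
  crm configure property stonith-enabled=true

  # Network fencing: fence_ifmib downs the node's switch port via SNMP,
  # so a hung node can no longer serve stale NFS to the clients.
  # 10.0.0.1 is the switch, "private" the SNMP community, and the host
  # map ties each cluster node name to its switch port number.
  crm configure primitive fence-switch stonith:fence_ifmib \
      params ipaddr="10.0.0.1" community="private" \
      pcmk_host_map="node1:10;node2:11" \
      op monitor interval=60s

The appeal of port-level isolation, as I understand it, is that the hung node stops answering clients immediately, so the NFS virtual IP can move without two servers ever answering at once.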