[Pacemaker] resource stickiness and preventing stonith on failback

Andrew Beekhof andrew at beekhof.net
Tue Sep 20 00:04:43 EDT 2011

On Tue, Sep 20, 2011 at 1:58 PM, Brian J. Murrell <brian at interlinx.bc.ca> wrote:
> On 11-09-19 11:02 PM, Andrew Beekhof wrote:
>> On Wed, Aug 24, 2011 at 6:56 AM, Brian J. Murrell <brian-SquOHqY54CVWr29BmMi2cA at public.gmane.org> wrote:
>>> 2. preventing the active node from being STONITHed when the resource
>>>   is moved back to its failed-and-restored node after a failover.
>>>   IOW: BAR1 is available on foo1, which fails and the resource is moved
>>>   to foo2.  foo1 returns and the resource is failed back to foo1, but
>>>   in doing that foo2 is STONITHed.
>>> As for #2, the issue with STONITHing foo2 when failing back to foo1 is
>>> that foo1 and foo2 are an active/active pair of servers.  STONITHing
>>> foo2 just to restore foo1's services puts foo2's services out of service.
>>> I do want a node that is believed to be dead to be STONITHed before its
>>> resource(s) are failed over, though.
>> That's a great way to ensure your data gets trashed.
> What's that?
>> If the "node that is believed to be dead" isn't /actually/ dead,
>> you'll have two nodes running the same resources and writing to the
>> same files.
> Where did I say I wanted a node that was believed to be dead not to be
> STONITHed before another node takes over the resource?

Urgh.  My brain put a negation in there somewhere. Sorry.  Too many emails.

> I actually said
> (I left it in the quoted portion above if you want to go back and read
> it) "I do want a node that is believed to be dead to be STONITHed before
> its resource(s) are failed over, though."
> The node I don't want STONITHed is the failover node that is alive and
> well and can be told to release the resource cleanly and can confirm its
> release.  This is the node in the active/active pair (i.e. a pair in
> which each node serves half of the resources) that is currently running
> all of the resources due to its partner having failed.  Of course I don't
> want this node's resources to have to be interrupted just because the
> failed node has come back.
> And it all does seem to work that way, FWIW.  I'm not sure why my
> earlier experiments didn't bear that out.
> Cheers,
> b.
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
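
For context, whether a failback (and any fencing it might trigger) happens at all comes down to comparing the resource's location preference against its stickiness: Pacemaker only moves a resource back when the preference score for the restored node exceeds the stickiness of staying put. A minimal sketch in crm shell syntax (the device, mount point, and score values are hypothetical, not from this thread):

```
# Hypothetical crm configure fragment.
# BAR1 prefers foo1 with score 50, but the default stickiness of 100
# outweighs that, so BAR1 stays on foo2 after a failover and no
# automatic failback is attempted when foo1 returns.
primitive BAR1 ocf:heartbeat:Filesystem \
    params device="/dev/sdb1" directory="/mnt/bar1" fstype="ext3" \
    op monitor interval="30s"
location BAR1-prefers-foo1 BAR1 50: foo1
rsc_defaults resource-stickiness="100"
property stonith-enabled="true"
```

Setting resource-stickiness below the location score would re-enable automatic failback; a healthy foo2 then releases BAR1 cleanly via a normal stop operation, and fencing should only come into play if that stop fails.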
