[Pacemaker] resource moving unnecessarily due to ping race condition

On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:

>>> On 09/08/11 20:59, Brad Johnson wrote:
>>>> We have a 2 node cluster with a single resource. The resource must run
>>>> on only a single node at one time. Using the ocf:pacemaker:ping RA we
>>>> are pinging a WAN gateway and a LAN host on each node so the resource
>>>> runs on the node with the greatest connectivity. The problem is when a
>>>> ping host goes down (so both nodes lose connectivity to it), the
>>>> resource moves to the other node due to timing differences in how fast
>>>> they update the score attribute. The dampening value has no effect,
>>>> since it delays both nodes by the same amount. These unnecessary
>>>> fail-overs aren't acceptable since they are disruptive to the network
>>>> for no reason.
>>>> Is there a way to dampen the ping update by different amounts on the
>>>> active and passive nodes? Or some other way to configure the cluster to
>>>> try to keep the resource where it is during these tie score scenarios?
>
> location pingd-constraint group_1 \
>  rule $id="pingd-constraint-rule" pingd: defined pingd
>
> May I suggest that you simply change this constraint to
>
> location pingd-constraint group_1 \
>  rule $id="pingd-constraint-rule" \
>    -inf: not_defined pingd or pingd lte 0
>
> That way, only a host that definitely has _no_ connectivity carries a
> -INF score for that resource group. And I believe that is what you
> really want, rather than take the actual ping score as a placement
> weight (your "best connectivity" approach).
>
> Just my 2 cents, though.

Even though this approach has been recommended many times, there is a problem with it.
What if, for some reason, none of the nodes are able to ping?
This rule would then bring the resource down completely, whereas with the "best connectivity" approach it would stay up on the node where it was running before the network failed.
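
To make the difference concrete, here is a rough toy model in Python. It is only a sketch and not Pacemaker's actual policy engine: the function names are made up for illustration, and the tie-break in favour of the current node is a simplifying assumption of the model. It compares the two rules when every node reports pingd = 0.

```python
# Toy model of the two location rules discussed above.
# NOT Pacemaker's real placement logic; all names here are hypothetical.
NEG_INF = float("-inf")


def score_neg_inf_rule(pingd):
    """Models: -inf: not_defined pingd or pingd lte 0"""
    if pingd is None or pingd <= 0:
        return NEG_INF
    return 0


def score_weight_rule(pingd):
    """Models: pingd: defined pingd (the attribute value is the score)."""
    return pingd if pingd is not None else 0


def place(scores, current):
    """Pick a node for the resource.

    Nodes scored -inf are excluded. Among the rest the highest score wins,
    and (assumption of this model) the current node is kept on a tie.
    Returns None when no node is allowed to run the resource.
    """
    allowed = {node: s for node, s in scores.items() if s != NEG_INF}
    if not allowed:
        return None
    best = max(allowed.values())
    winners = [node for node, s in allowed.items() if s == best]
    return current if current in winners else winners[0]


# Every ping target is unreachable from both nodes.
pingd = {"node1": 0, "node2": 0}

neg_inf_result = place(
    {n: score_neg_inf_rule(v) for n, v in pingd.items()}, current="node1"
)
weight_result = place(
    {n: score_weight_rule(v) for n, v in pingd.items()}, current="node1"
)

print("-inf rule:        ", neg_inf_result)  # None, resource stopped everywhere
print("score-weight rule:", weight_result)   # node1, resource stays put
```

In this model the -inf rule leaves no eligible node, so the resource is stopped, while the score-weight rule leaves both nodes tied at 0 and the resource stays where it was.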


