[Pacemaker] resource moving unnecessarily due to ping race condition

Andrew Beekhof andrew at beekhof.net
Fri Sep 23 02:10:59 UTC 2011


On Tue, Sep 20, 2011 at 10:34 PM, Brad Johnson <bjohnson at ecessa.com> wrote:
> It is not necessarily the case that the outside world can't reach the
> cluster. Ours is a multi-homed device connecting to multiple WANs and LANs.
> We want the device with the best connectivity to be the active device. To
> get around the problem of failovers occurring when, for example, a ping node
> reboots, I have written an fping OCF RA that uses different dampening delays
> depending on whether it is running on the active or the idle device. I have
> also patched pacemaker's attrd.c so that it doesn't send an immediate update
> when it receives a flush message from the other node. That immediate update
> was causing it to ignore any running delay timer.

That's the point of the flush message, though: it makes all nodes write
their current value at the same time.
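
To make the trade-off concrete, here is a toy model of two nodes whose ping
score drops from 2 to 1 at the same moment. It is only a sketch: the struct
and function names are made up for illustration and are not attrd's real
data structures or API. It assumes that "flush" simply means every node
stops its dampening timer and writes its pending value.

```c
/* Toy model of one cluster node's ping attribute; not attrd's real types. */
struct node {
    int written;       /* value the rest of the cluster currently sees */
    int pending;       /* freshly measured value, held back by dampening */
    int timer_running; /* nonzero while the dampening delay is active */
};

/* Only the node whose timer expired writes its value (no flush). */
void write_local_only(struct node *n)
{
    n->timer_running = 0;
    n->written = n->pending;
}

/* The expiring node flushes: every node stops its timer and writes now. */
void write_with_flush(struct node *nodes, int count)
{
    for (int i = 0; i < count; i++) {
        write_local_only(&nodes[i]);
    }
}

/* Nonzero if the idle node momentarily looks better connected. */
int idle_looks_better(const struct node *active, const struct node *idle)
{
    return idle->written > active->written;
}
```

In this model, calling write_local_only() on the active node alone leaves
the idle node advertising the stale score of 2, so idle_looks_better()
returns nonzero until the idle node's own timer fires. With
write_with_flush() both nodes publish 1 together and the scores tie.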

> Here is that patch:
>
> --- tools/attrd.orig.c    2011-09-13 08:29:46.946820348 -0500
> +++ tools/attrd.c    2011-09-14 13:33:59.606894754 -0500
> @@ -348,10 +348,14 @@
>         attrd_local_callback(xml);
>
>     } else if(ignore == NULL || safe_str_neq(from, attrd_uname)) {
> +        const char *attr  = crm_element_value(xml, F_ATTRD_ATTRIBUTE);
> +        /* Don't send update for score if msg is from other node */
> +        if(safe_str_eq(from, attrd_uname) || safe_str_neq(attr, "pingd")) {
>         crm_info("%s message from %s", op, from);
>         hash_entry = find_hash_entry(xml);
>         stop_attrd_timer(hash_entry);
>         attrd_perform_update(hash_entry);
> +        }
>     }
>     free_xml(xml);
>  }
>
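
For anyone reading the diff, the added condition can be isolated as a small
standalone predicate. This is a sketch under assumptions: str_eq() is a
simplified stand-in written for this example (plain strcmp, NULL never
matches) and is not Pacemaker's safe_str_eq(), and patched_should_update()
is a hypothetical name, not a function in attrd.c.

```c
#include <string.h>

/* Simplified stand-in for a NULL-safe string comparison (assumption). */
int str_eq(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/*
 * Mirrors the condition added by the patch: perform the update if the
 * message came from this node, or if the attribute is not "pingd".
 */
int patched_should_update(const char *from, const char *self,
                          const char *attr)
{
    return str_eq(from, self) || !str_eq(attr, "pingd");
}
```

Read this way, the patch suppresses the flush-triggered write only for an
attribute literally named "pingd" arriving from another node. Any other
attribute name still gets the immediate update.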
>
> On 09/19/2011 10:51 PM, Andrew Beekhof wrote:
>>
>> On Sun, Sep 11, 2011 at 2:30 AM, Vadym Chepkov<vchepkov at gmail.com>  wrote:
>>>
>>> On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:
>>>
>>>>>> On 09/08/11 20:59, Brad Johnson wrote:
>>>>>>>
>>>>>>> We have a 2 node cluster with a single resource. The resource must
>>>>>>> run on only a single node at one time. Using the ocf:pacemaker:ping
>>>>>>> RA we are pinging a WAN gateway and a LAN host on each node so the
>>>>>>> resource runs on the node with the greatest connectivity. The problem
>>>>>>> is when a ping host goes down (so both nodes lose connectivity to
>>>>>>> it), the resource moves to the other node due to timing differences
>>>>>>> in how fast they update the score attribute. The dampening value has
>>>>>>> no effect, since it delays both nodes by the same amount. These
>>>>>>> unnecessary fail-overs aren't acceptable since they are disruptive to
>>>>>>> the network for no reason.
>>>>>>> Is there a way to dampen the ping update by different amounts on the
>>>>>>> active and passive nodes? Or some other way to configure the cluster
>>>>>>> to try to keep the resource where it is during these tie score
>>>>>>> scenarios?
>>>>
>>>> location pingd-constraint group_1 \
>>>>  rule $id="pingd-constraint-rule" pingd: defined pingd
>>>>
>>>> May I suggest that you simply change this constraint to
>>>>
>>>> location pingd-constraint group_1 \
>>>>  rule $id="pingd-constraint-rule" \
>>>>    -inf: not_defined pingd or pingd lte 0
>>>>
>>>> That way, only a host that definitely has _no_ connectivity carries a
>>>> -INF score for that resource group. And I believe that is what you
>>>> really want, rather than take the actual ping score as a placement
>>>> weight (your "best connectivity" approach).
>>>>
>>>> Just my 2 cents, though.
>>>>
>>> Even though this approach has been recommended many times, there is a
>>> problem with it.
>>> What if all nodes, for some reason, are unable to ping?
>>> This rule would cause the resource to be brought down completely, whereas
>>> with the "best connectivity" approach it would stay up where it was before
>>> the network failed.
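
The difference between the two location rules quoted above can be sketched
as a pair of scoring functions. This is a simplified illustration, not how
the policy engine is implemented: SCORE_NEG_INF and PINGD_UNDEFINED are
placeholder constants invented for the example, and stickiness and all
other constraints are ignored.

```c
#include <limits.h>

#define SCORE_NEG_INF   INT_MIN /* stand-in for -INF in this sketch */
#define PINGD_UNDEFINED (-1)    /* hypothetical "attribute not set" marker */

/* "pingd: defined pingd": the attribute value itself is the score. */
int score_best_connectivity(int pingd)
{
    return pingd == PINGD_UNDEFINED ? 0 : pingd;
}

/* "-inf: not_defined pingd or pingd lte 0": ban nodes with no connectivity. */
int score_no_connectivity_ban(int pingd)
{
    return (pingd == PINGD_UNDEFINED || pingd <= 0) ? SCORE_NEG_INF : 0;
}

/* Nonzero if at least one node is still allowed to run the resource. */
int can_run_somewhere(const int *scores, int count)
{
    for (int i = 0; i < count; i++) {
        if (scores[i] != SCORE_NEG_INF) {
            return 1;
        }
    }
    return 0;
}
```

With pingd at 0 on both nodes, the first rule yields two equal scores and
the resource can stay put, while the second rule bans both nodes and the
resource stops. If one node still has pingd at 1, the second rule leaves
that node eligible.
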
>>
>> If the outside[1] world can't reach the cluster, is there much benefit
>> in having it running?
>>
>> [1] Substitute "outside" for wherever your users are, hopefully you
>> picked a ping node from the same area.
>>
>>> Vadym
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs:
>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>



