[Pacemaker] resource moving unnecessarily due to ping race condition

Mon Sep 26 08:57:27 EDT 2011

I agree that the patch assumes "pingd" as the attribute name, and there
may be a better way of coding that. However, I don't see how setting
dampen=0 fixes our problem. The problem occurs when a ping node becomes
inaccessible to all nodes in the cluster (because it is rebooted, for
example). Unless the currently active node is given some timing advantage,
it is just a race between the nodes to see which one notices the outage
first and updates the attribute fastest. The result is that we see a
fail-over when the ping node goes down and a fail-back when it comes back
up. Dampening alone does not solve this, since it delays every node by the
same amount, which is why we use a resource agent that applies different
dampening delays depending on where the resource is running.
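The race described above can be illustrated with a minimal C sketch under a deliberately simplified, hypothetical timing model (not attrd's actual code): each node notices the ping node's state change at its own time, waits its own dampen delay, and then writes its new score. Flush messages and resource stickiness are ignored, and the names `node_timing`, `moves_on_down` and `moves_on_up` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the race: each node writes its new
 * ping score at (detection time + its own dampen delay). attrd flush
 * messages and resource stickiness are deliberately ignored.
 * All times are in milliseconds. */
typedef struct {
    int detect_ms;  /* when this node notices the ping node's state change */
    int dampen_ms;  /* how long it waits before writing the new score */
} node_timing;

static int write_time(node_timing n)
{
    assert(n.detect_ms >= 0 && n.dampen_ms >= 0);
    return n.detect_ms + n.dampen_ms;
}

/* Ping node goes DOWN: both scores fall by the same amount. The resource
 * moves if the active node's lower score is written strictly first,
 * because for that window the idle node looks better connected. */
static bool moves_on_down(node_timing active, node_timing idle)
{
    return write_time(active) < write_time(idle);
}

/* Ping node comes back UP: both scores rise by the same amount. The
 * resource moves if the idle node's higher score is written strictly first. */
static bool moves_on_up(node_timing active, node_timing idle)
{
    return write_time(idle) < write_time(active);
}
```

With equal delays the outcome depends only on which node detects the change first: `moves_on_down((node_timing){100, 5000}, (node_timing){300, 5000})` returns true, and dampen=0 is just the case where both delays are zero. Giving the active node a longer delay for falling scores, e.g. `moves_on_down((node_timing){100, 10000}, (node_timing){300, 2000})`, returns false as long as the detection times differ by less than the gap between the delays; the mirror image (the idle node slower on rising scores) does the same for `moves_on_up`.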

On 09/25/2011 08:58 PM, Andrew Beekhof wrote:
> On Fri, Sep 23, 2011 at 9:53 PM, Brad Johnson <bjohnson at ecessa.com> wrote:
>> Yes, but the patch only affects the pingd attribute.
> Use of the name 'pingd' isn't mandatory, though.
>
>> And we do not want the
>> other node to be able to challenge us to an immediate score comparison. That
>> is the whole idea behind the fping OCF resource agent we are using, to give
>> the timing advantage to the node currently running the resource by delaying
>> rising scores on the idle, and falling scores on the active node.
> Why not just set dampen=0?
>
>> On 09/22/2011 09:10 PM, Andrew Beekhof wrote:
>>> On Tue, Sep 20, 2011 at 10:34 PM, Brad Johnson <bjohnson at ecessa.com> wrote:
>>>> It is not necessarily the case that the outside world can't reach the
>>>> cluster. Ours is a multi-homed device connecting to multiple WANs and
>>>> LANs. We want the device with the best connectivity to be the active
>>>> device. To get around the problem of fail-overs occurring when, for
>>>> example, a ping node reboots, I have written an fping OCF RA that uses
>>>> different dampening delays depending on whether it is running on the
>>>> active or the idle device. I have also patched Pacemaker's attrd.c so
>>>> that it doesn't send an immediate update when it receives a flush
>>>> message from the other node, because that was causing it to ignore any
>>>> running delay timer.
>>> That's the point of the flush message, though: it makes all nodes write
>>> their current value at the same time.
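The flush behaviour Andrew describes can be added to the same kind of simplified, hypothetical timing model to show why it undoes selective dampening. In this sketch (again not attrd's real code; `with_flush`, `node_timing` and `write_times` are illustrative names), the first dampen timer to expire triggers a flush, and every node that has already seen the change writes its current value at that instant, cutting its own timer short.

```c
#include <assert.h>

/* Hypothetical timing model; all times in milliseconds. */
typedef struct {
    int detect_ms;  /* when this node notices the ping node's state change */
    int dampen_ms;  /* this node's own dampen delay */
} node_timing;

typedef struct {
    int active_write_ms;  /* when the active node's new value is written */
    int idle_write_ms;    /* when the idle node's new value is written */
} write_times;

/* The first timer to expire triggers a flush; any node that has already
 * seen the change then writes its current value at the flush time,
 * abandoning whatever remains of its own dampen delay. A node that has
 * not yet seen the change keeps its own (later) write time. */
static write_times with_flush(node_timing active, node_timing idle)
{
    int a_ms, i_ms, flush_ms;

    assert(active.dampen_ms >= 0 && idle.dampen_ms >= 0);
    a_ms = active.detect_ms + active.dampen_ms;
    i_ms = idle.detect_ms + idle.dampen_ms;
    flush_ms = a_ms < i_ms ? a_ms : i_ms;

    if (active.detect_ms <= flush_ms) {
        a_ms = flush_ms;
    }
    if (idle.detect_ms <= flush_ms) {
        i_ms = flush_ms;
    }
    return (write_times){ a_ms, i_ms };
}
```

With illustrative selective delays (active node: detects at 100 ms, waits 10000 ms on a falling score; idle node: detects at 300 ms, waits 2000 ms), `with_flush` returns 2300 ms for both nodes, so which update reaches the CIB first is once again a race.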
>>>
>>>> Here is that patch:
>>>>
>>>> --- tools/attrd.orig.c    2011-09-13 08:29:46.946820348 -0500
>>>> +++ tools/attrd.c    2011-09-14 13:33:59.606894754 -0500
>>>> @@ -348,10 +348,14 @@
>>>>          attrd_local_callback(xml);
>>>>
>>>>      } else if(ignore == NULL || safe_str_neq(from, attrd_uname)) {
>>>> +        const char *attr  = crm_element_value(xml, F_ATTRD_ATTRIBUTE);
>>>> +        /* Don't send update for score if msg is from other node */
>>>> +        if(safe_str_eq(from, attrd_uname) || safe_str_neq(attr, "pingd")) {
>>>>          crm_info("%s message from %s", op, from);
>>>>          hash_entry = find_hash_entry(xml);
>>>>          stop_attrd_timer(hash_entry);
>>>>          attrd_perform_update(hash_entry);
>>>> +        }
>>>>      }
>>>>      free_xml(xml);
>>>>   }
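To review the patch in isolation, the decision it makes can be pulled out into a standalone, hypothetical C helper. It models only the `else if` branch shown in the hunk; `str_eq()` is a NULL-safe, case-sensitive stand-in for Pacemaker's `safe_str_eq()`/`safe_str_neq()` (whose exact semantics may differ), `ignore` stands for the hunk's `ignore` variable, whose origin lies outside the quoted context, and the hard-coded "pingd" becomes a `damped_attr` parameter, since that attribute name is configurable.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* NULL-safe, case-sensitive string equality (a stand-in for
 * Pacemaker's safe_str_eq(); the real macro may differ in detail). */
static bool str_eq(const char *a, const char *b)
{
    if (a == NULL || b == NULL) {
        return a == b;
    }
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: returns true if, on receiving a flush, attrd
 * should stop the attribute's dampen timer and write it immediately.
 *   from        node that sent the flush
 *   self        local node name (attrd_uname in the patch)
 *   ignore      the hunk's `ignore` value (NULL or not)
 *   attr        attribute named in the message
 *   damped_attr attribute whose local timer a peer's flush must not cut
 *               short ("pingd" is hard-coded in the patch) */
static bool flush_updates_now(const char *from, const char *self,
                              const char *ignore, const char *attr,
                              const char *damped_attr)
{
    bool from_self;

    assert(self != NULL);
    from_self = str_eq(from, self);

    /* Unchanged condition: ignore == NULL || safe_str_neq(from, attrd_uname) */
    if (ignore != NULL && from_self) {
        return false;
    }
    /* Added by the patch: write now only for our own flush, or for any
     * attribute other than the damped one. */
    return from_self || !str_eq(attr, damped_attr);
}
```

Passing the attribute name in, rather than comparing against a literal, would let a deployment that uses a non-default name keep the same behaviour.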
>>>>
>>>>
>>>> On 09/19/2011 10:51 PM, Andrew Beekhof wrote:
>>>>> On Sun, Sep 11, 2011 at 2:30 AM, Vadym Chepkov <vchepkov at gmail.com> wrote:
>>>>>> On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:
>>>>>>
>>>>>>>>> On 09/08/11 20:59, Brad Johnson wrote:
>>>>>>>>>> We have a 2-node cluster with a single resource. The resource must
>>>>>>>>>> run on only one node at a time. Using the ocf:pacemaker:ping RA, we
>>>>>>>>>> are pinging a WAN gateway and a LAN host on each node so that the
>>>>>>>>>> resource runs on the node with the greatest connectivity. The
>>>>>>>>>> problem is that when a ping host goes down (so both nodes lose
>>>>>>>>>> connectivity to it), the resource moves to the other node due to
>>>>>>>>>> timing differences in how fast the nodes update the score
>>>>>>>>>> attribute. The dampening value has no effect, since it delays both
>>>>>>>>>> nodes by the same amount. These unnecessary fail-overs aren't
>>>>>>>>>> acceptable, since they disrupt the network for no reason.
>>>>>>>>>> Is there a way to dampen the ping update by different amounts on
>>>>>>>>>> the active and passive nodes? Or some other way to configure the
>>>>>>>>>> cluster to try to keep the resource where it is during these
>>>>>>>>>> tie-score scenarios?
>>>>>>> location pingd-constraint group_1 \
>>>>>>>   rule $id="pingd-constraint-rule" pingd: defined pingd
>>>>>>>
>>>>>>> May I suggest that you simply change this constraint to
>>>>>>>
>>>>>>> location pingd-constraint group_1 \
>>>>>>>   rule $id="pingd-constraint-rule" \
>>>>>>>     -inf: not_defined pingd or pingd lte 0
>>>>>>>
>>>>>>> That way, only a host that definitely has _no_ connectivity carries a
>>>>>>> -INF score for that resource group. And I believe that is what you
>>>>>>> really want, rather than take the actual ping score as a placement
>>>>>>> weight (your "best connectivity" approach).
>>>>>>>
>>>>>>> Just my 2 cents, though.
>>>>>>>
>>>>>> Even though this approach has been recommended many times, there is
>>>>>> a problem with it. What if, for some reason, none of the nodes are able
>>>>>> to ping? This rule would cause the resource to be brought down
>>>>>> completely, whereas with the "best connectivity" approach it would
>>>>>> stay up where it was before the network failed.
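The trade-off between Florian's constraint and the "best connectivity" rule can be seen by computing what each rule contributes for a single node. This is a hypothetical C sketch that looks at the location rule in isolation (no stickiness or other constraints); it assumes Pacemaker's INFINITY of 1000000 and that the ping RA's attribute value is the multiplier times the number of reachable ping hosts.

```c
#include <assert.h>
#include <stdbool.h>

/* Pacemaker treats 1000000 as INFINITY; -INFINITY bans a node. */
#define MINUS_INFINITY (-1000000)

/* Hypothetical per-node view of the ping attribute. */
typedef struct {
    bool defined;  /* is the pingd attribute set on this node? */
    int  pingd;    /* assumed: multiplier * number of reachable ping hosts */
} ping_attr;

/* "Best connectivity":  rule pingd: defined pingd
 * The attribute value itself becomes the score; if the rule does not
 * match (attribute undefined), it contributes nothing. */
static int best_connectivity_score(ping_attr a)
{
    return a.defined ? a.pingd : 0;
}

/* Florian's suggestion:  rule -inf: not_defined pingd or pingd lte 0
 * Any connectivity at all scores the same; none at all is a ban. */
static int connectivity_veto_score(ping_attr a)
{
    return (!a.defined || a.pingd <= 0) ? MINUS_INFINITY : 0;
}
```

When both nodes report `pingd` = 0, the first rule scores them equally, so the resource can stay where it is, while the second gives both -INFINITY, which is Vadym's objection. Conversely, the second rule scores one and two reachable hosts identically, so a ping host going down no longer changes the relative scores: that removes the race, but also the "best connectivity" preference.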
>>>>> If the outside[1] world can't reach the cluster, is there much benefit
>>>>> in having it running?
>>>>>
>>>>> [1] Substitute "outside" for wherever your users are; hopefully you
>>>>> picked a ping node from the same area.
>>>>>
>>>>>> Vadym
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker