[ClusterLabs] Antw: Re: Antw: Re: Constant stop/start of resource in spite of interval=0

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue May 21 04:03:48 EDT 2019


Hi!

So maybe the original defective RA would be valuable for debugging the issue.
I guess the RA was invalid in some way that wasn't detected or handled
properly...

Regards,
Ulrich

>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 21.05.2019 at 09:13 in
message <bd253405-e98c-251e-e908-1431d6d65bde at gmail.com>:
> 21.05.2019 0:46, Ken Gaillot wrote:
>>>  
>>>> From what's described here, the op-restart-digest is changing every
>>>> time, which means something is going wrong in the hash comparison
>>>> (since the definition is not really changing).
>>>>
>>>> The log that stands out to me is:
>>>>
>>>> trace   May 18 23:02:49 calculate_xml_digest_v1(83):0:
>>>> digest:source   <parameters id="0"/>
>>>>
>>>> The id is the resource name, which isn't "0". That leads me to:
>>>>
>>>> trace   May 18 23:02:49 svc_read_output(87):0: Got 499 chars:
>>>> <parameter name="id" unique="1" required="1">
>>>>
>>>> which is the likely source of the problem. "id" is a pacemaker
>>>> property, not an OCF resource parameter. It shouldn't be in the
>>>> resource agent meta-data. Remove that, and I bet it will be OK.
>>>
>>> I renamed the parameter to "tunnel_id", redefined the resources and 
>>> started them again.
>>>  
>>>> BTW the "every 15 minutes" would be the cluster-recheck-interval
>>>> cluster property.
>>>
>>> I have waited more than half an hour and there is no more
>>> stopping/starting of the resources. :-) I didn't realize that "id"
>>> is reserved as a parameter name.
>> 
>> It isn't, by the OCF standard. :) This could be considered a pacemaker
>> bug; pacemaker should be able to distinguish its own "id" from an OCF
>> parameter "id", but it currently can't.
>> 
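
A minimal sketch of the collision Ken describes, assuming (as the trace
lines suggest) that instance parameters are flattened into attributes of a
single <parameters> element. The helper digest_source() is hypothetical and
is not pacemaker's code; it only shows that a parameter named "id" lands in
the same attribute slot an XML element would use for its own id, so the two
cannot be told apart afterwards.

```python
import xml.etree.ElementTree as ET


def digest_source(resource_params):
    """Hypothetical helper: serialize instance parameters as attributes
    of one <parameters> element (an assumption based on the traces)."""
    elem = ET.Element("parameters")
    # Sort so the attribute order does not depend on dict ordering.
    for name, value in sorted(resource_params.items()):
        elem.set(name, value)
    return ET.tostring(elem, encoding="unicode")


# A parameter literally called "id" occupies the element's id attribute.
print(digest_source({"id": "0", "name": "eduroam IPv4 tunnel"}))

# After renaming it to "tunnel_id" there is no id attribute left to confuse.
print(digest_source({"tunnel_id": "0", "name": "eduroam IPv4 tunnel"}))
```

Whether pacemaker really mishandles that attribute is exactly what is
questioned below; the sketch only shows why the name is ambiguous.
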
> 
> 
> I'm really baffled by this explanation. I tried to create a resource with
> an "id" unique instance property and I do not observe this problem. No
> restarts.
> 
> As none of the traces provided captures the moment of the restart-digest
> mismatch, I am also not sure where to look. I do not see "id" being
> treated specially in any way in the code.
> 
> Somewhat interesting is that restart digest source in two traces is
> different:
> 
> bor at bor-Latitude-E5450:~$ grep -w 'restart digest' /tmp/trace.log*
> /tmp/trace.log:trace   May 18 23:02:49 append_restart_list(694):0:
> restart digest source   <parameters id="0"/>
> /tmp/trace.log:trace   May 18 23:02:50 append_restart_list(694):0:
> restart digest source   <parameters id="1"/>
> /tmp/trace.log.2:trace   May 20 13:56:16 append_restart_list(694):0:
> restart digest source   <parameters name="eduroam IPv4 tunnel" id="0"/>
> /tmp/trace.log.2:trace   May 20 13:56:17 append_restart_list(694):0:
> restart digest source   <parameters name="eduroam IPv4 tunnel" id="0"/>
> /tmp/trace.log.2:trace   May 20 13:56:18 append_restart_list(694):0:
> restart digest source   <parameters name="Wigner guest IPv4 tunnel" id="1"/>
> bor at bor-Latitude-E5450:~$
> 
> In one case it does not include the "name" parameter. Whether the
> configuration was changed in between is unknown; we have never seen the
> full RA metadata in each case, nor the full resource definition, so ...
> 
> My hunch is that "id" is a red herring and something else changed when
> the resource definition was edited. If I'm wrong, I would appreciate a
> pointer to the code where "id" is mishandled.