[ClusterLabs] Antw: Re: Antw: Re: Constant stop/start of resource in spite of interval=0

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue May 21 04:03:48 EDT 2019


So maybe the original defective RA would be valuable for debugging the issue.
I guess the RA was invalid in some way that wasn't detected or handled.
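
One way to inspect a suspect RA is to list the parameter names its meta-data declares. The following is only a minimal sketch: the sample XML is hypothetical apart from the `<parameter name="id" unique="1" required="1">` element quoted from the trace below, and the helper names (`list_parameters`, `suspicious_parameters`) are made up for illustration. Whether "id" is really the culprit is disputed further down in this thread, so treat a hit as a hint, not a diagnosis.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-data fragment. Only the <parameter name="id" ...> element
# matches what the trace in this thread shows; everything else is assumed.
SAMPLE_METADATA = """<resource-agent name="example-tunnel">
  <parameters>
    <parameter name="id" unique="1" required="1">
      <content type="string"/>
    </parameter>
    <parameter name="name" unique="0" required="0">
      <content type="string"/>
    </parameter>
  </parameters>
</resource-agent>
"""

# Names suspected (not proven) to collide with the cluster manager's own
# XML attributes.
SUSPECT_NAMES = {"id"}


def list_parameters(metadata_xml):
    """Return (name, unique, required) for every <parameter> element."""
    root = ET.fromstring(metadata_xml)
    return [
        (p.get("name"), p.get("unique"), p.get("required"))
        for p in root.iter("parameter")
    ]


def suspicious_parameters(metadata_xml):
    """Return the declared parameter names that are in SUSPECT_NAMES."""
    return [
        name
        for name, _unique, _required in list_parameters(metadata_xml)
        if name in SUSPECT_NAMES
    ]


if __name__ == "__main__":
    for name, unique, required in list_parameters(SAMPLE_METADATA):
        print(f"{name}: unique={unique} required={required}")
    print("suspect:", suspicious_parameters(SAMPLE_METADATA))
```

For a real agent, the meta-data printed by the RA's `meta-data` action would be passed in instead of `SAMPLE_METADATA`.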


>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 21.05.2019 at 09:13 in
message <bd253405-e98c-251e-e908-1431d6d65bde at gmail.com>:
> 21.05.2019 0:46, Ken Gaillot wrote:
>>>> From what's described here, the op-restart-digest is changing every
>>>> time, which means something is going wrong in the hash comparison
>>>> (since the definition is not really changing).
>>>> The log that stands out to me is:
>>>> trace   May 18 23:02:49 calculate_xml_digest_v1(83):0:
>>>> digest:source   <parameters id="0"/>
>>>> The id is the resource name, which isn't "0". That leads me to:
>>>> trace   May 18 23:02:49 svc_read_output(87):0: Got 499 chars:
>>>> <parameter name="id" unique="1" required="1">
>>>> which is the likely source of the problem. "id" is a pacemaker
>>>> property, not an OCF resource parameter. It shouldn't be in the
>>>> resource agent meta-data. Remove that, and I bet it will be OK.
>>> I renamed the parameter to "tunnel_id", redefined the resources and 
>>> started them again.
>>>> BTW the "every 15 minutes" would be the cluster-recheck-interval
>>>> cluster property.
>>> I have waited more than half an hour and the resources are no longer
>>> being stopped and started. :-) I didn't realize that "id" is
>>> reserved as a parameter name.
>> It isn't, by the OCF standard. :) This could be considered a pacemaker
>> bug; pacemaker should be able to distinguish its own "id" from an OCF
>> parameter "id", but it currently can't.
> I'm really baffled by this explanation. I tried creating a resource with
> a unique "id" instance parameter and I do not observe this problem. No
> restarts.
> As none of the traces provided captures the moment of the restart-digest
> mismatch, I am also not sure where to look. I do not see "id" being
> treated specially anywhere in the code.
> Somewhat interesting is that the restart digest source differs between
> the two traces:
> bor at bor-Latitude-E5450:~$ grep -w 'restart digest' /tmp/trace.log*
> /tmp/trace.log:trace   May 18 23:02:49 append_restart_list(694):0:
> restart digest source   <parameters id="0"/>
> /tmp/trace.log:trace   May 18 23:02:50 append_restart_list(694):0:
> restart digest source   <parameters id="1"/>
> /tmp/trace.log.2:trace   May 20 13:56:16 append_restart_list(694):0:
> restart digest source   <parameters name="eduroam IPv4 tunnel" id="0"/>
> /tmp/trace.log.2:trace   May 20 13:56:17 append_restart_list(694):0:
> restart digest source   <parameters name="eduroam IPv4 tunnel" id="0"/>
> /tmp/trace.log.2:trace   May 20 13:56:18 append_restart_list(694):0:
> restart digest source   <parameters name="Wigner guest IPv4 tunnel"
> bor at bor-Latitude-E5450:~$
> In one case it does not include the "name" parameter. Whether the
> configuration was changed in between is unknown; we have never seen the
> full RA metadata for each case, nor the full resource definition, so ...
> My hunch is that "id" is a red herring and that something else changed
> when the resource definition was edited. If I'm wrong, I would appreciate
> a pointer to the code where "id" is mishandled.