[ClusterLabs] Antw: Re: Constant stop/start of resource in spite of interval=0

Andrei Borzenkov arvidjaar at gmail.com
Tue May 21 03:13:09 EDT 2019


21.05.2019 0:46, Ken Gaillot wrote:
>>  
>>> From what's described here, the op-restart-digest is changing every
>>> time, which means something is going wrong in the hash comparison
>>> (since the definition is not really changing).
>>>
>>> The log that stands out to me is:
>>>
>>> trace   May 18 23:02:49 calculate_xml_digest_v1(83):0:
>>> digest:source   <parameters id="0"/>
>>>
>>> The id is the resource name, which isn't "0". That leads me to:
>>>
>>> trace   May 18 23:02:49 svc_read_output(87):0: Got 499 chars:
>>> <parameter name="id" unique="1" required="1">
>>>
>>> which is the likely source of the problem. "id" is a pacemaker
>>> property, not an OCF resource parameter. It shouldn't be in the
>>> resource agent meta-data. Remove that, and I bet it will be OK.
>>
>> I renamed the parameter to "tunnel_id", redefined the resources and 
>> started them again.
>>  
>>> BTW the "every 15 minutes" would be the cluster-recheck-interval
>>> cluster property.
>>
>> I have waited more than half an hour and there are no more
>> stopping/starting of the resources. :-) I haven't thought that "id"
>> is reserved as parameter name.
> 
> It isn't, by the OCF standard. :) This could be considered a pacemaker
> bug; pacemaker should be able to distinguish its own "id" from an OCF
> parameter "id", but it currently can't.
> 


I'm really baffled by this explanation. I tried creating a resource with
a unique instance parameter named "id" and I do not observe this problem.
No restarts.

As none of the traces provided captures the moment of the restart-digest
mismatch, I am also not sure where to look. I do not see "id" being
treated specially anywhere in the code.

Somewhat interestingly, the restart digest source differs between the
two traces:

bor@bor-Latitude-E5450:~$ grep -w 'restart digest' /tmp/trace.log*
/tmp/trace.log:trace   May 18 23:02:49 append_restart_list(694):0: restart digest source   <parameters id="0"/>
/tmp/trace.log:trace   May 18 23:02:50 append_restart_list(694):0: restart digest source   <parameters id="1"/>
/tmp/trace.log.2:trace   May 20 13:56:16 append_restart_list(694):0: restart digest source   <parameters name="eduroam IPv4 tunnel" id="0"/>
/tmp/trace.log.2:trace   May 20 13:56:17 append_restart_list(694):0: restart digest source   <parameters name="eduroam IPv4 tunnel" id="0"/>
/tmp/trace.log.2:trace   May 20 13:56:18 append_restart_list(694):0: restart digest source   <parameters name="Wigner guest IPv4 tunnel" id="1"/>
bor@bor-Latitude-E5450:~$

In one case it does not include the "name" parameter. Whether the
configuration was changed in between is unknown; we have never seen the
full RA metadata for each case, nor the full resource definition, so ...
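
A quick way to make that difference visible is to list the attributes of
each digest source per trace and hash the raw XML. This is a sketch only,
not Pacemaker code: the digest sources are copied from the grep output
above, the helper names are made up here, and SHA-256 merely stands in
for whatever hash Pacemaker actually computes.

```python
import hashlib
import xml.etree.ElementTree as ET

# (trace file, restart digest source) pairs copied from the grep output.
SOURCES = [
    ("trace.log", '<parameters id="0"/>'),
    ("trace.log", '<parameters id="1"/>'),
    ("trace.log.2", '<parameters name="eduroam IPv4 tunnel" id="0"/>'),
    ("trace.log.2", '<parameters name="eduroam IPv4 tunnel" id="0"/>'),
    ("trace.log.2", '<parameters name="Wigner guest IPv4 tunnel" id="1"/>'),
]


def attribute_names(xml_text):
    """Return the sorted attribute names of a digest source element."""
    return sorted(ET.fromstring(xml_text).attrib)


def digest(xml_text):
    """Hash the raw XML text (SHA-256 is a stand-in, not Pacemaker's hash)."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def summarize(sources):
    """Map each trace file to the set of attribute-name tuples seen in it."""
    seen = {}
    for trace_file, xml_text in sources:
        seen.setdefault(trace_file, set()).add(tuple(attribute_names(xml_text)))
    return seen


for trace_file, xml_text in SOURCES:
    print(trace_file, attribute_names(xml_text), digest(xml_text)[:12])
```

Any change in the source, whether an attribute value or the set of
attributes itself, yields a different hash, so the missing "name" in the
first trace means the two runs were not hashing the same parameter set.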

My hunch is that "id" is a red herring and that something else changed
when the resource definition was edited. If I'm wrong, I would appreciate
a pointer to the code where "id" is mishandled.

