[ClusterLabs] Antw: Re: Constant stop/start of resource in spite of interval=0

Mon May 20 17:15:11 EDT 2019

Hi,

On Mon, 20 May 2019, Ken Gaillot wrote:

> On Mon, 2019-05-20 at 15:29 +0200, Ulrich Windl wrote:
> > What worries me is "Rejecting name for unique".
> 
> Trace messages are often not user-friendly. The rejecting/accepting is 
> nothing to be concerned about; it just refers to which parameters are 
> being used to calculate that particular hash.
>
> Pacemaker calculates up to three hashes.
> 
> The first is a hash of all the resource parameters, to detect if
> anything changed; this is stored as "op-digest" in the CIB status
> entries.
> 
> If the resource is reloadable, another hash is calculated with just the
> parameters marked as unique=1 (which means they can't be reloaded). Any
> change in these parameters requires a full restart. This one is
> "op-restart-digest".
> 
> Finally, if the resource has sensitive parameters like passwords, a
> hash of everything but those parameters is stored as
> "op-secure-digest". This one is only used when simulating CIBs grabbed
> from cluster reports, which have sensitive info scrubbed.

Thanks for the explanation! It looked very cryptic in the trace messages 
that different hashes were calculated with different parameter lists.
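
To check that I understood the three digests described above, here is a 
rough Python sketch. It is not Pacemaker's code: the function names, the 
"key=value" serialization and the use of MD5 are assumptions made only 
for illustration (Pacemaker hashes an XML representation of the 
parameters).

```python
import hashlib


def digest(params):
    """Hash a parameter dict in a stable (sorted) order.

    Assumption: a sorted key=value text form stands in for the XML
    that Pacemaker actually hashes.
    """
    canonical = "\n".join(f"{k}={v}" for k, v in sorted(params.items()))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def operation_digests(params, reloadable=False, unique=(), secret=()):
    """Return up to three digests, following the description in the thread."""
    result = {"op-digest": digest(params)}  # all parameters
    if reloadable:
        # only the parameters marked unique=1
        restart = {k: v for k, v in params.items() if k in unique}
        result["op-restart-digest"] = digest(restart)
    if secret:
        # everything except the sensitive parameters
        public = {k: v for k, v in params.items() if k not in secret}
        result["op-secure-digest"] = digest(public)
    return result


# Hypothetical parameters for a tunnel resource.
before = {"ip": "10.0.0.1", "tunnel_id": "7", "password": "s3cret"}
after = dict(before, ip="10.0.0.2")  # change a non-unique parameter

d1 = operation_digests(before, reloadable=True,
                       unique={"tunnel_id"}, secret={"password"})
d2 = operation_digests(after, reloadable=True,
                       unique={"tunnel_id"}, secret={"password"})
print(d1["op-digest"] != d2["op-digest"])                  # full hash changed
print(d1["op-restart-digest"] == d2["op-restart-digest"])  # no restart needed
```

In this model a restart is only required when "op-restart-digest" 
differs, which is why a restart digest that changes on every check leads 
to constant stop/start.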

> From what's described here, the op-restart-digest is changing every
> time, which means something is going wrong in the hash comparison
> (since the definition is not really changing).
> 
> The log that stands out to me is:
> 
> trace   May 18 23:02:49 calculate_xml_digest_v1(83):0: digest:source   <parameters id="0"/>
> 
> The id is the resource name, which isn't "0". That leads me to:
> 
> trace   May 18 23:02:49 svc_read_output(87):0: Got 499 chars: <parameter name="id" unique="1" required="1">
> 
> which is the likely source of the problem. "id" is a pacemaker property, 
> not an OCF resource parameter. It shouldn't be in the resource agent 
> meta-data. Remove that, and I bet it will be OK.

I renamed the parameter to "tunnel_id", redefined the resources and 
started them again.

> BTW the "every 15 minutes" would be the cluster-recheck-interval
> cluster property.

I have waited more than half an hour and the resources are no longer 
being stopped and started. :-) I had not realized that "id" is 
reserved as a parameter name.
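
For anyone hitting the same problem, a small Python sketch like the one 
below could flag such a clash in an agent's meta-data before the 
resource is defined. The sample meta-data and the agent name "tunnel" 
are hypothetical, and the reserved list contains only "id" because that 
is the only name confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Only "id" is known from this thread to clash with a Pacemaker property.
RESERVED_NAMES = {"id"}

# Hypothetical, trimmed-down resource agent meta-data.
SAMPLE_METADATA = """
<resource-agent name="tunnel">
  <parameters>
    <parameter name="id" unique="1" required="1">
      <shortdesc lang="en">Tunnel identifier</shortdesc>
      <content type="string"/>
    </parameter>
    <parameter name="ip" unique="0" required="1">
      <shortdesc lang="en">Local address</shortdesc>
      <content type="string"/>
    </parameter>
  </parameters>
</resource-agent>
"""


def reserved_parameter_names(metadata_xml):
    """Return the declared parameter names that are in RESERVED_NAMES."""
    root = ET.fromstring(metadata_xml)
    return [p.get("name") for p in root.iter("parameter")
            if p.get("name") in RESERVED_NAMES]


clashes = reserved_parameter_names(SAMPLE_METADATA)
if clashes:
    print("rename these parameters:", ", ".join(clashes))
```

Feeding it the real output of the agent's meta-data action instead of 
the sample string would check an installed agent.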

Thank you!

Best regards,
Jozsef
--
E-mail : kadlecsik.jozsef at wigner.mta.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
         H-1525 Budapest 114, POB. 49, Hungary