[ClusterLabs] Antw: Re: Constant stop/start of resource in spite of interval=0

Mon May 20 17:46:46 EDT 2019

On Mon, 2019-05-20 at 23:15 +0200, Kadlecsik József wrote:
> Hi,
> 
> On Mon, 20 May 2019, Ken Gaillot wrote:
> 
> > On Mon, 2019-05-20 at 15:29 +0200, Ulrich Windl wrote:
> > > What worries me is "Rejecting name for unique".
> > 
> > > Trace messages are often not user-friendly. The rejecting/accepting is
> > > nothing to be concerned about; it just refers to which parameters are
> > > being used to calculate that particular hash.
> > 
> > Pacemaker calculates up to three hashes.
> > 
> > The first is a hash of all the resource parameters, to detect if
> > anything changed; this is stored as "op-digest" in the CIB status
> > entries.
> > 
> > > If the resource is reloadable, another hash is calculated with just the
> > > parameters marked as unique=1 (which means they can't be reloaded). Any
> > > change in these parameters requires a full restart. This one is
> > > "op-restart-digest".
> > 
> > > Finally, if the resource has sensitive parameters like passwords, a
> > > hash of everything but those parameters is stored as
> > > "op-secure-digest". This one is only used when simulating CIBs grabbed
> > > from cluster reports, which have sensitive info scrubbed.
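
As a rough illustration of the three digests described above, here is a
minimal Python sketch. It is not pacemaker's code: pacemaker hashes an XML
representation of the parameters, while this sketch hashes a sorted
"name=value" string, and MD5 is used only for illustration. The parameter
names (tunnel_id, mtu, password) and the helpers digest() and op_digests()
are made up for the example.

```python
import hashlib


def digest(params):
    """Hash a parameter dict in a stable (sorted) order."""
    canonical = "&".join(f"{name}={value}" for name, value in sorted(params.items()))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def op_digests(params, unique_names, private_names):
    """Return the three digests for one resource.

    unique_names:  parameters marked unique=1 in the agent meta-data
    private_names: sensitive parameters (e.g. passwords)
    """
    restart_params = {k: v for k, v in params.items() if k in unique_names}
    secure_params = {k: v for k, v in params.items() if k not in private_names}
    return {
        "op-digest": digest(params),                  # all parameters
        "op-restart-digest": digest(restart_params),  # unique=1 parameters only
        "op-secure-digest": digest(secure_params),    # everything but secrets
    }


if __name__ == "__main__":
    # Hypothetical resource parameters, for illustration only.
    params = {"tunnel_id": "0", "mtu": "1500", "password": "s3cret"}
    for name, value in op_digests(params, {"tunnel_id"}, {"password"}).items():
        print(name, value)
```

With this split, changing only mtu alters "op-digest" but leaves
"op-restart-digest" alone, so a reload would be enough; changing tunnel_id
alters both, which calls for a full restart.
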
> 
> Thanks for the explanation! It seemed very cryptic in the trace messages
> that different hashes were calculated with different parameter lists.
>  
> > From what's described here, the op-restart-digest is changing every
> > time, which means something is going wrong in the hash comparison
> > (since the definition is not really changing).
> > 
> > The log that stands out to me is:
> > 
> > trace   May 18 23:02:49 calculate_xml_digest_v1(83):0:
> > digest:source   <parameters id="0"/>
> > 
> > The id is the resource name, which isn't "0". That leads me to:
> > 
> > trace   May 18 23:02:49 svc_read_output(87):0: Got 499 chars:
> > <parameter name="id" unique="1" required="1">
> > 
> > > which is the likely source of the problem. "id" is a pacemaker property,
> > > not an OCF resource parameter. It shouldn't be in the resource agent
> > > meta-data. Remove that, and I bet it will be OK.
> 
> I renamed the parameter to "tunnel_id", redefined the resources and 
> started them again.
>  
> > BTW the "every 15 minutes" would be the cluster-recheck-interval
> > cluster property.
> 
> I have waited more than half an hour and there is no more
> stopping/starting of the resources. :-) I hadn't realized that "id" is
> reserved as a parameter name.

It isn't, by the OCF standard. :) This could be considered a pacemaker
bug; pacemaker should be able to distinguish its own "id" from an OCF
parameter "id", but it currently can't.
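
To make the name clash concrete, here is a small Python sketch of the
general idea. It is an assumption-laden illustration, not pacemaker's
implementation: build_parameters() and xml_digest() are hypothetical
helpers, "tunnel-a" is a made-up resource name, and MD5 is used only as an
example hash. The point is just that if the resource name and the resource
parameters share one XML element, a parameter called "id" lands in the same
attribute as the resource name, which matches the <parameters id="0"/> seen
in the trace.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_parameters(resource_id, params):
    """Build a <parameters> element holding the resource name as "id"
    plus every resource parameter as a further attribute."""
    elem = ET.Element("parameters", {"id": resource_id})
    for name, value in params.items():
        # A parameter that is itself named "id" overwrites the resource name.
        elem.set(name, value)
    return elem


def xml_digest(elem):
    """Hash the serialized element (illustrative only)."""
    return hashlib.md5(ET.tostring(elem)).hexdigest()


if __name__ == "__main__":
    clashing = build_parameters("tunnel-a", {"id": "0"})
    renamed = build_parameters("tunnel-a", {"tunnel_id": "0"})
    print(ET.tostring(clashing).decode())  # resource name has been replaced
    print(ET.tostring(renamed).decode())   # both values survive
    print(xml_digest(clashing) != xml_digest(renamed))
```

Renaming the parameter, as was done here with "tunnel_id", keeps the two
values in separate attributes, so neither can shadow the other.
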

> 
> Thank you!
> 
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.jozsef at wigner.mta.hu
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
>          H-1525 Budapest 114, POB. 49, Hungary
-- 
Ken Gaillot <kgaillot at redhat.com>