[ClusterLabs] Antw: Re: Constant stop/start of resource in spite of interval=0

Ken Gaillot kgaillot at redhat.com
Mon May 20 10:31:20 EDT 2019


On Mon, 2019-05-20 at 15:29 +0200, Ulrich Windl wrote:
> What worries me is "Rejecting name for unique".

Trace messages are often not user-friendly. The rejecting/accepting is
nothing to be concerned about; it just refers to which parameters are
being used to calculate that particular hash.

Pacemaker calculates up to three hashes.

The first is a hash of all the resource parameters, to detect if
anything changed; this is stored as "op-digest" in the CIB status
entries.

If the resource is reloadable, another hash is calculated with just the
parameters marked as unique=1 (which means they can't be reloaded). Any
change in these parameters requires a full restart. This one is
"op-restart-digest".

Finally, if the resource has sensitive parameters like passwords, a
hash of everything but those parameters is stored as
"op-secure-digest". This one is only used when simulating CIBs grabbed
from cluster reports, which have sensitive info scrubbed.
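
To make the three digests concrete, here is a rough Python sketch. It is
not Pacemaker's actual code: the canonical string format, the function
names, the "device" parameter and all values are assumptions made up for
illustration. Only the digest names and the choice of which parameters
go into which hash follow the description above.

```python
import hashlib

def digest(params):
    """MD5 over a name-sorted rendering of the parameters (format is assumed)."""
    attrs = " ".join(f'{name}="{value}"' for name, value in sorted(params.items()))
    return hashlib.md5(f"<parameters {attrs}/>".encode("utf-8")).hexdigest()

def op_digests(params, unique=frozenset(), private=frozenset(), reloadable=False):
    """Return the digests that would be recorded for one operation."""
    # Hash of all parameters: detects any change at all.
    digests = {"op-digest": digest(params)}
    if reloadable:
        # Only the unique=1 parameters: a change here needs a full restart.
        restart = {n: v for n, v in params.items() if n in unique}
        digests["op-restart-digest"] = digest(restart)
    if private:
        # Everything except the sensitive parameters.
        public = {n: v for n, v in params.items() if n not in private}
        digests["op-secure-digest"] = digest(public)
    return digests

# Hypothetical parameters; "device" stands in for a unique=1 parameter.
params = {"device": "tun0", "src_ip": "192.0.2.1", "dst_ip": "192.0.2.2"}
before = op_digests(params, unique={"device"}, reloadable=True)
after = op_digests({**params, "dst_ip": "192.0.2.3"},
                   unique={"device"}, reloadable=True)
```

In this sketch, changing dst_ip (not unique) changes "op-digest" but
leaves "op-restart-digest" alone, which is the case where a reload would
be enough.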

From what's described here, the op-restart-digest is changing every
time, which means something is going wrong in the hash comparison
(since the definition is not really changing).

The log that stands out to me is:

trace   May 18 23:02:49 calculate_xml_digest_v1(83):0: digest:source   <parameters id="0"/>


The id is the resource name, which isn't "0". That leads me to:

trace   May 18 23:02:49 svc_read_output(87):0: Got 499 chars: <parameter name="id" unique="1" required="1">


which is the likely source of the problem. "id" is a pacemaker
property, not an OCF resource parameter. It shouldn't be in the
resource agent meta-data. Remove that, and I bet it will be OK.
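
A quick way to check an agent for this is to scan its meta-data output
for parameter names that collide with Pacemaker's own properties. The
sketch below is only an illustration: the helper name, the sample
meta-data and the one-entry reserved list are assumptions, and only the
"id" parameter line is taken from the trace above.

```python
import xml.etree.ElementTree as ET

def reserved_parameters(metadata_xml, reserved=("id",)):
    """Return names of declared parameters that clash with reserved names."""
    root = ET.fromstring(metadata_xml)
    return [p.get("name") for p in root.iter("parameter")
            if p.get("name") in reserved]

# Hypothetical, cut-down meta-data; a real agent prints much more.
SAMPLE = """<resource-agent name="tunnel">
  <parameters>
    <parameter name="id" unique="1" required="1"/>
    <parameter name="src_ip" unique="0"/>
    <parameter name="dst_ip" unique="0"/>
  </parameters>
</resource-agent>"""

clashes = reserved_parameters(SAMPLE)
print(clashes)
```

In practice the XML would come from running the agent's meta-data action
rather than from a string literal.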

BTW the "every 15 minutes" would be the cluster-recheck-interval
cluster property.

> Kadlecsik József <kadlecsik.jozsef at wigner.mta.hu> wrote on 20.05.2019
> at 14:37 in message
> <alpine.DEB.2.20.1905201428050.24467 at blackhole.kfki.hu>:
> > On Sun, 19 May 2019, Kadlecsik József wrote:
> > 
> > > On Sat, 18 May 2019, Kadlecsik József wrote:
> > > 
> > > > On Sat, 18 May 2019, Kadlecsik József wrote:
> > > > 
> > > > > On Sat, 18 May 2019, Andrei Borzenkov wrote:
> > > > > 
> > > > > > 18.05.2019 18:34, Kadlecsik József пишет:
> > > > > > > We have a resource agent which creates IP tunnels. In
> > > > > > > spite of the
> > > > > > > configuration setting
> > > > > > > 
> > > > > > > primitive tunnel-eduroam ocf:local:tunnel \
> > > > > > >         params ....
> > > > > > >         op start timeout=120s interval=0 \
> > > > > > >         op stop timeout=300s interval=0 \
> > > > > > >         op monitor timeout=30s interval=30s depth=0 \
> > > > > > >         meta target-role=Started
> > > > > > > order bifur-eduroam-ipv4-before-tunnel-eduroam \
> > > > > > > 	Mandatory: bifur-eduroam-ipv4 tunnel-eduroam
> > > > > > > > colocation tunnel-eduroam-on-bifur-eduroam-ipv4 inf: tunnel-eduroam \
> > > > > > > > 	bifur-eduroam-ipv4:Started
> > > > > > > 
> > > > > > > > the resource is restarted again and again. According to
> > > > > > > > the debug logs:
> > > > > > > > 
> > > > > > > >     Parameters to tunnel-eduroam_start_0 on bifur1 changed:
> > > > > > > >     was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
> > > > > > > >     (restart:3.0.11) 0:0;48:3:0:73562fd6-1fe2-4930-8c6e-5953b82ebb32
> > > > > > 
> > > > > > > This means that the instance attributes changed; in this
> > > > > > > case pacemaker restarts the resource to apply the new
> > > > > > > values. Turning on trace level will hopefully show what
> > > > > > > exactly is being changed. You can also dump the CIB before
> > > > > > > and after the restart to compare the current information.
> > > > > 
> > > > > > The strange thing is that the new value never seems to be
> > > > > > stored. Just the "was-now" part from the log lines:
> > > > > > 
> > > > > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
> > > > > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
> > > > > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
> > > > > ...
> > > > > 
> > > > > However, after issuing "cibadmin --query --local", the whole
> > > > > flipping 
> > > > > stopped! :-) Thanks!
> > > > 
> > > > > No, I was wrong - it still repeats every ~15mins. The diff
> > > > > between two cib xml dumps doesn't say much to me - I'm going
> > > > > to enable tracing.
> > > 
> > > I have attached the trace file created according to 
> > > http://blog.clusterlabs.org/blog/2013/pacemaker-logging.
> > > 
> > > What looks strange to me is that build_parameter_list() first
> > > rejects
> > > attributes, then accepts them:
> > > 
> > > > trace   May 18 23:02:49 build_operation_update(787):0: Including additional digests for ocf::local:tunnel
> > > > trace   May 18 23:02:49 build_parameter_list(621):0: Rejecting name for unique
> > > > trace   May 18 23:02:49 build_parameter_list(614):0: Attr id is unique
> > > > trace   May 18 23:02:49 build_parameter_list(632):0: Adding attr id=0 to the xml result
> > > > trace   May 18 23:02:49 build_parameter_list(621):0: Rejecting src_ip for unique
> > > > trace   May 18 23:02:49 build_parameter_list(621):0: Rejecting dst_ip for unique
> > > > ...
> > > > trace   May 18 23:02:49 calculate_xml_digest_v1(71):0: Sorting xml...
> > > > trace   May 18 23:02:49 calculate_xml_digest_v1(73):0: Done
> > > > trace   May 18 23:02:49 crm_md5sum(2102):0: Beginning digest of 22 bytes
> > > > trace   May 18 23:02:49 crm_md5sum(2110):0: Digest 94afff0ff7cfc62f7cb1d5bf5b4d83aa.
> > > 
> > > and then:
> > > 
> > > > trace   May 18 23:02:49 calculate_xml_digest_v1(83):0: digest:source   <parameters id="0"/>
> > > > trace   May 18 23:02:49 append_restart_list(693):0: tunnel-eduroam: 94afff0ff7cfc62f7cb1d5bf5b4d83aa,  id
> > > > trace   May 18 23:02:49 append_restart_list(694):0: restart digest source  <parameters id="0"/>
> > > > trace   May 18 23:02:49 build_parameter_list(621):0: Rejecting name for private
> > > > trace   May 18 23:02:49 build_parameter_list(625):0: Inverting name match for private xml
> > > > trace   May 18 23:02:49 build_parameter_list(632):0: Adding attr name=eduroam IPv4 tunnel to the xml result
> > > > trace   May 18 23:02:49 build_parameter_list(621):0: Rejecting id for private
> > > > trace   May 18 23:02:49 build_parameter_list(625):0: Inverting id match for private xml
> > > > trace   May 18 23:02:49 build_parameter_list(632):0: Adding attr id=0 to the xml result
> > > ....
> > > 
> > > By the way, it's debian stretch with pacemaker 1.1.16-1.
> > 
> > I have double and triple checked the agent and it seems just a
> > normal, 
> > working agent.
> > 
> > > The agent accepts the reload operation, it is advertised in the
> > > actions section of its metadata, and there are parameters with
> > > unique set to 0, yet stop/start is still called instead of reload.
> > > (I could even live with a reload instead of a start/stop every
> > > 15 minutes.)
> > > 
> > > As a desperate attempt, I deleted the resource and re-added it,
> > > and of course that did not help.
> > > 
> > > I also created the attached trace file while creating the
> > > resource, in the hope that it could help find the reason for the
> > > constant stop/start.
> > 
> > Best regards,
> > Jozsef
> > --
> > E-mail : kadlecsik.jozsef at wigner.mta.hu 
> > PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt 
> > Address: Wigner Research Centre for Physics, Hungarian Academy of
> > Sciences
> >          H-1525 Budapest 114, POB. 49, Hungary
> 
> 
> 
-- 
Ken Gaillot <kgaillot at redhat.com>


