[ClusterLabs] Antw: Re: Constant stop/start of resource in spite of interval=0
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon May 20 09:29:21 EDT 2019
What worries me is "Rejecting name for unique".
>>> Kadlecsik József <kadlecsik.jozsef at wigner.mta.hu> wrote on 20.05.2019 at
14:37 in message <alpine.DEB.2.20.1905201428050.24467 at blackhole.kfki.hu>:
> On Sun, 19 May 2019, Kadlecsik József wrote:
>
>> On Sat, 18 May 2019, Kadlecsik József wrote:
>>
>> > On Sat, 18 May 2019, Kadlecsik József wrote:
>> >
>> > > On Sat, 18 May 2019, Andrei Borzenkov wrote:
>> > >
>> > > > 18.05.2019 18:34, Kadlecsik József wrote:
>> > >
>> > > > > We have a resource agent which creates IP tunnels. In spite of the
>> > > > > configuration setting
>> > > > >
>> > > > > primitive tunnel-eduroam ocf:local:tunnel \
>> > > > > params ....
>> > > > > op start timeout=120s interval=0 \
>> > > > > op stop timeout=300s interval=0 \
>> > > > > op monitor timeout=30s interval=30s depth=0 \
>> > > > > meta target-role=Started
>> > > > > order bifur-eduroam-ipv4-before-tunnel-eduroam \
>> > > > > Mandatory: bifur-eduroam-ipv4 tunnel-eduroam
>> > > > > colocation tunnel-eduroam-on-bifur-eduroam-ipv4 inf: tunnel-eduroam \
>> > > > > bifur-eduroam-ipv4:Started
>> > > > >
>> > > > > the resource is restarted again and again. According to the debug logs:
>> > > > >
>> > > > > Parameters to tunnel-eduroam_start_0 on bifur1 changed: was
>> > > > > 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
>> > > > > (restart:3.0.11) 0:0;48:3:0:73562fd6-1fe2-4930-8c6e-5953b82ebb32
>> > > >
>> > > > This means that the instance attributes changed; in that case Pacemaker
>> > > > restarts the resource to apply the new values. Turning on trace-level
>> > > > logging will hopefully show exactly what is being changed. You can also
>> > > > dump the CIB before and after a restart and compare the two dumps.
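A minimal sketch of the before/after comparison suggested above, assuming the
two dumps were saved from "cibadmin --query" and that the resource is a plain
primitive with nvpair instance attributes. The sample XML, the nvpair ids and
the src_ip value are hypothetical placeholders, not the real configuration:

```python
import xml.etree.ElementTree as ET


def instance_attributes(cib_xml, resource_id):
    """Return {name: value} for every nvpair in the primitive's instance_attributes."""
    root = ET.fromstring(cib_xml)
    attrs = {}
    for prim in root.iter("primitive"):
        if prim.get("id") != resource_id:
            continue
        for block in prim.findall("instance_attributes"):
            for nv in block.findall("nvpair"):
                attrs[nv.get("name")] = nv.get("value")
    return attrs


def changed_attributes(before_xml, after_xml, resource_id):
    """Return {name: (old, new)} for attributes that differ between two dumps."""
    old = instance_attributes(before_xml, resource_id)
    new = instance_attributes(after_xml, resource_id)
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# Hypothetical CIB fragment; a real dump contains far more than this.
BEFORE = """<cib><configuration><resources>
  <primitive id="tunnel-eduroam" class="ocf" provider="local" type="tunnel">
    <instance_attributes id="tunnel-eduroam-instance_attributes">
      <nvpair id="ia-id" name="id" value="0"/>
      <nvpair id="ia-src" name="src_ip" value="192.0.2.1"/>
    </instance_attributes>
  </primitive>
</resources></configuration></cib>"""
AFTER = BEFORE.replace("192.0.2.1", "192.0.2.2")

print(changed_attributes(BEFORE, AFTER, "tunnel-eduroam"))
```

If this reports no differences for the real dumps, the configured values
themselves are stable and the digest mismatch has to come from somewhere else.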
>> > >
>> > > The strange thing is that the new value never seems to be stored. Here is
>> > > just the "was-now" part of the log lines:
>> > >
>> > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
>> > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
>> > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
>> > > ...
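One way to confirm that it is always the same pair of digests is to count the
distinct "was ... vs. now ..." pairs in the log. This is only a sketch; the
regular expression is my own guess based on the lines quoted above, and the
sample text is copied from them:

```python
import re
from collections import Counter

# Matches the "was <md5> vs. now <md5>" part of the log message.
DIGEST_CHANGE = re.compile(r"was\s+([0-9a-f]{32})\s+vs\.\s+now\s+([0-9a-f]{32})")


def digest_pairs(log_text):
    """Count how often each (was, now) digest pair occurs in the log text."""
    return Counter(DIGEST_CHANGE.findall(log_text))


SAMPLE = """\
was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
"""

pairs = digest_pairs(SAMPLE)
for (was, now), count in pairs.items():
    print(f"{count}x  was {was}  now {now}")
```

A single pair repeated on every restart would fit the observation that the
recorded digest is never replaced by the newly computed one.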
>> > >
>> > > However, after issuing "cibadmin --query --local", the whole flipping
>> > > stopped! :-) Thanks!
>> >
>> > No, I was wrong - it still repeats every ~15 minutes. The diff between two
>> > CIB XML dumps doesn't tell me much - I'm going to enable tracing.
>>
>> I have attached the trace file created according to
>> http://blog.clusterlabs.org/blog/2013/pacemaker-logging.
>>
>> What looks strange to me is that build_parameter_list() first rejects
>> attributes, then accepts them:
>>
>> trace May 18 23:02:49 build_operation_update(787):0: Including additional digests for ocf::local:tunnel
>> trace May 18 23:02:49 build_parameter_list(621):0: Rejecting name for unique
>> trace May 18 23:02:49 build_parameter_list(614):0: Attr id is unique
>> trace May 18 23:02:49 build_parameter_list(632):0: Adding attr id=0 to the xml result
>> trace May 18 23:02:49 build_parameter_list(621):0: Rejecting src_ip for unique
>> trace May 18 23:02:49 build_parameter_list(621):0: Rejecting dst_ip for unique
>> ...
>> trace May 18 23:02:49 calculate_xml_digest_v1(71):0: Sorting xml...
>> trace May 18 23:02:49 calculate_xml_digest_v1(73):0: Done
>> trace May 18 23:02:49 crm_md5sum(2102):0: Beginning digest of 22 bytes
>> trace May 18 23:02:49 crm_md5sum(2110):0: Digest 94afff0ff7cfc62f7cb1d5bf5b4d83aa.
>>
>> and then:
>>
>> trace May 18 23:02:49 calculate_xml_digest_v1(83):0: digest:source <parameters id="0"/>
>> trace May 18 23:02:49 append_restart_list(693):0: tunnel-eduroam: 94afff0ff7cfc62f7cb1d5bf5b4d83aa, id
>> trace May 18 23:02:49 append_restart_list(694):0: restart digest source <parameters id="0"/>
>> trace May 18 23:02:49 build_parameter_list(621):0: Rejecting name for private
>> trace May 18 23:02:49 build_parameter_list(625):0: Inverting name match for private xml
>> trace May 18 23:02:49 build_parameter_list(632):0: Adding attr name=eduroam IPv4 tunnel to the xml result
>> trace May 18 23:02:49 build_parameter_list(621):0: Rejecting id for private
>> trace May 18 23:02:49 build_parameter_list(625):0: Inverting id match for private xml
>> trace May 18 23:02:49 build_parameter_list(632):0: Adding attr id=0 to the xml result
>> ....
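My reading of the two passes in this trace, as a rough sketch and not the
actual Pacemaker code: the first pass keeps only the attributes flagged
"unique" (here just id) for the restart digest, and the second pass keeps
everything that is NOT flagged "private". The function names only mirror the
trace for readability. The src_ip and dst_ip values are placeholders, and the
empty private set is an assumption, since the trace is cut off before those
attributes. The bytes Pacemaker really hashes evidently differ from the
printed element (the trace reports 22 bytes, while <parameters id="0"/> is 20
characters), so this is not expected to reproduce 94afff0f...:

```python
import hashlib
from xml.sax.saxutils import quoteattr


def build_parameter_list(values, flagged, invert=False):
    """Keep attributes whose name is in `flagged`; with invert=True keep the others."""
    selected = {}
    for name, value in values.items():
        accept = name in flagged
        if invert:
            accept = not accept
        if accept:
            selected[name] = value
    return selected


def parameters_xml(selected):
    """Serialize the selection as one <parameters/> element with sorted attributes."""
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in sorted(selected.items()))
    return f"<parameters{attrs}/>"


def digest(xml_text):
    """MD5 hex digest of the serialized text (the exact input bytes are an assumption)."""
    return hashlib.md5(xml_text.encode("utf-8")).hexdigest()


VALUES = {
    "id": "0",
    "name": "eduroam IPv4 tunnel",
    "src_ip": "192.0.2.1",  # placeholder, real value not shown in the thread
    "dst_ip": "192.0.2.2",  # placeholder, real value not shown in the thread
}
UNIQUE = {"id"}   # per "Attr id is unique" in the trace
PRIVATE = set()   # assumption: no parameter is marked private

restart_xml = parameters_xml(build_parameter_list(VALUES, UNIQUE))
full_xml = parameters_xml(build_parameter_list(VALUES, PRIVATE, invert=True))

print(restart_xml)
print(digest(restart_xml) != digest(full_xml))
```

Seen this way, "Rejecting name for unique" followed later by "Adding attr
name=..." need not be a contradiction: the two messages would belong to two
different digests built over two different attribute sets.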
>>
>> By the way, it's debian stretch with pacemaker 1.1.16-1.
>
> I have double and triple checked the agent and it seems just a normal,
> working agent.
>
> The agent accepts the reload operation and advertises it in the actions
> section of its metadata, and there are parameters with unique set to 0, yet
> stop/start is still called instead of reload. (I could even live with a
> reload instead of a stop/start every 15 minutes.)
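The metadata check described here could be scripted roughly as follows. This
is a sketch only: the metadata below is a made-up stand-in for the real
agent's output (I have not seen it), and treating a missing unique attribute
as 0 is my assumption:

```python
import xml.etree.ElementTree as ET


def reload_summary(metadata_xml):
    """Report whether reload is advertised and how parameters split by `unique`."""
    root = ET.fromstring(metadata_xml)
    actions = {a.get("name") for a in root.iter("action")}
    params = list(root.iter("parameter"))
    return {
        "advertises_reload": "reload" in actions,
        # Assumption: a missing unique attribute counts as unique="0".
        "unique_0": sorted(p.get("name") for p in params if p.get("unique", "0") == "0"),
        "unique_1": sorted(p.get("name") for p in params if p.get("unique", "0") == "1"),
    }


# Hypothetical metadata, shaped like OCF resource agent metadata.
METADATA = """<resource-agent name="tunnel">
  <parameters>
    <parameter name="id" unique="1" required="1"><content type="string"/></parameter>
    <parameter name="name" unique="0"><content type="string"/></parameter>
    <parameter name="src_ip" unique="0"><content type="string"/></parameter>
    <parameter name="dst_ip" unique="0"><content type="string"/></parameter>
  </parameters>
  <actions>
    <action name="start" timeout="120s"/>
    <action name="stop" timeout="300s"/>
    <action name="reload" timeout="60s"/>
    <action name="monitor" timeout="30s" interval="30s" depth="0"/>
  </actions>
</resource-agent>"""

summary = reload_summary(METADATA)
print(summary)
```

Running the same function on the agent's real meta-data output would show
whether the cluster sees the same reload action and unique flags that the
agent author intended.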
>
> As a desperate attempt, I deleted the resource and re-added it, which of
> course did not help.
>
> I also created the attached trace file while creating the resource, in the
> hope that it helps find the reason for the permanent stop/start.
>
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.jozsef at wigner.mta.hu
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
> H-1525 Budapest 114, POB. 49, Hungary