[ClusterLabs] Antw: Re: Constant stop/start of resource in spite of interval=0

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon May 20 09:29:21 EDT 2019


What worries me is "Rejecting name for unique".

>>> Kadlecsik József <kadlecsik.jozsef at wigner.mta.hu> wrote on 20.05.2019 at
14:37 in message <alpine.DEB.2.20.1905201428050.24467 at blackhole.kfki.hu>:
> On Sun, 19 May 2019, Kadlecsik József wrote:
> 
>> On Sat, 18 May 2019, Kadlecsik József wrote:
>> 
>> > On Sat, 18 May 2019, Kadlecsik József wrote:
>> > 
>> > > On Sat, 18 May 2019, Andrei Borzenkov wrote:
>> > > 
>> > > > 18.05.2019 18:34, Kadlecsik József wrote:
>> > > 
>> > > > > We have a resource agent which creates IP tunnels. In spite of the
>> > > > > configuration setting
>> > > > > 
>> > > > > primitive tunnel-eduroam ocf:local:tunnel \
>> > > > >         params ....
>> > > > >         op start timeout=120s interval=0 \
>> > > > >         op stop timeout=300s interval=0 \
>> > > > >         op monitor timeout=30s interval=30s depth=0 \
>> > > > >         meta target-role=Started
>> > > > > order bifur-eduroam-ipv4-before-tunnel-eduroam \
>> > > > > 	Mandatory: bifur-eduroam-ipv4 tunnel-eduroam
>> > > > > colocation tunnel-eduroam-on-bifur-eduroam-ipv4 inf: tunnel-eduroam \
>> > > > > 	bifur-eduroam-ipv4:Started
>> > > > > 
>> > > > > the resource is restarted again and again. According to the debug logs:
>> > > > > 
>> > > > >     Parameters to tunnel-eduroam_start_0 on bifur1 changed: was 
>> > > > > 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8 
>> > > > > (restart:3.0.11) 0:0;48:3:0:73562fd6-1fe2-4930-8c6e-5953b82ebb32
>> > > > 
>> > > > This means that instance attributes changed; in this case pacemaker
>> > > > restarts the resource to apply the new values. Turning on trace level
>> > > > logging will hopefully show what exactly is being changed. You can also
>> > > > dump the CIB before and after a restart to compare the current
>> > > > information.
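A minimal sketch of the "dump the CIB before and after and compare" suggestion, assuming both dumps were saved to files with "cibadmin --query" and that the digests are recorded as op-digest, op-restart-digest and op-force-restart attributes on the lrm_rsc_op elements of the status section. The function names are made up for illustration; this is not part of pacemaker.

```python
import xml.etree.ElementTree as ET

# Attributes on <lrm_rsc_op> that are assumed to carry the parameter digests.
DIGEST_ATTRS = ("op-digest", "op-restart-digest", "op-force-restart")


def op_digests(cib_xml):
    """Map each recorded operation id to its digest-related attributes."""
    root = ET.fromstring(cib_xml)
    return {
        op.get("id"): {attr: op.get(attr) for attr in DIGEST_ATTRS}
        for op in root.iter("lrm_rsc_op")
    }


def changed_digests(before_xml, after_xml):
    """Return {op_id: (before, after)} for operations whose digests differ.

    An operation present in only one dump shows up with None on the other side.
    """
    before = op_digests(before_xml)
    after = op_digests(after_xml)
    return {
        op_id: (before.get(op_id), after.get(op_id))
        for op_id in sorted(set(before) | set(after))
        if before.get(op_id) != after.get(op_id)
    }
```

Reading the two saved dumps and passing their contents to changed_digests() narrows the comparison to the recorded digests instead of a full XML diff, which may be easier to read than the raw output of diff.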
>> > > 
>> > > The strange thing is that the new value never seems to be stored. Just
>> > > the "was-now" part from the log lines:
>> > > 
>> > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
>> > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
>> > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now f2317cad3d54cec5d7d7aa7d0bf35cf8
>> > > ...
>> > > 
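To check that it really is the same "was"/"now" pair every time, the log can be scanned mechanically. A small sketch, assuming each "Parameters to ... changed" message sits on one line in the log file (the mail wrapped it); the regular expression is derived only from the message quoted in this thread, and the function name is illustrative.

```python
import re
from collections import Counter

# Pattern derived from the single log message quoted in this thread.
CHANGE_RE = re.compile(
    r"Parameters to (?P<op>\S+) on (?P<node>\S+) changed: "
    r"was (?P<was>[0-9a-f]{32}) vs\. now (?P<now>[0-9a-f]{32})"
)


def count_digest_changes(lines):
    """Count how often each (operation, node, was, now) tuple is logged."""
    counts = Counter()
    for line in lines:
        match = CHANGE_RE.search(line)
        if match:
            counts[match.group("op", "node", "was", "now")] += 1
    return counts
```

If the counter ends up with a single key and a growing count, the stored digest is never updated to the "now" value, which is what the excerpt above suggests.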
>> > > However, after issuing "cibadmin --query --local", the whole flipping 
>> > > stopped! :-) Thanks!
>> > 
>> > No, I was wrong - it still repeats every ~15mins. The diff between two
>> > cib xml dumps doesn't say much to me - I'm going to enable tracing.
>> 
>> I have attached the trace file created according to 
>> http://blog.clusterlabs.org/blog/2013/pacemaker-logging.
>> 
>> What looks strange to me is that build_parameter_list() first rejects
>> attributes, then accepts them:
>> 
>> trace   May 18 23:02:49 build_operation_update(787):0: Including additional digests for ocf::local:tunnel
>> trace   May 18 23:02:49 build_parameter_list(621):0: Rejecting name for unique
>> trace   May 18 23:02:49 build_parameter_list(614):0: Attr id is unique
>> trace   May 18 23:02:49 build_parameter_list(632):0: Adding attr id=0 to the xml result
>> trace   May 18 23:02:49 build_parameter_list(621):0: Rejecting src_ip for unique
>> trace   May 18 23:02:49 build_parameter_list(621):0: Rejecting dst_ip for unique
>> ...
>> trace   May 18 23:02:49 calculate_xml_digest_v1(71):0: Sorting xml...
>> trace   May 18 23:02:49 calculate_xml_digest_v1(73):0: Done
>> trace   May 18 23:02:49 crm_md5sum(2102):0: Beginning digest of 22 bytes
>> trace   May 18 23:02:49 crm_md5sum(2110):0: Digest 94afff0ff7cfc62f7cb1d5bf5b4d83aa.
>> 
>> and then:
>> 
>> trace   May 18 23:02:49 calculate_xml_digest_v1(83):0: digest:source   <parameters id="0"/>
>> trace   May 18 23:02:49 append_restart_list(693):0: tunnel-eduroam: 94afff0ff7cfc62f7cb1d5bf5b4d83aa,  id 
>> trace   May 18 23:02:49 append_restart_list(694):0: restart digest source  <parameters id="0"/>
>> trace   May 18 23:02:49 build_parameter_list(621):0: Rejecting name for private
>> trace   May 18 23:02:49 build_parameter_list(625):0: Inverting name match for private xml
>> trace   May 18 23:02:49 build_parameter_list(632):0: Adding attr name=eduroam IPv4 tunnel to the xml result
>> trace   May 18 23:02:49 build_parameter_list(621):0: Rejecting id for private
>> trace   May 18 23:02:49 build_parameter_list(625):0: Inverting id match for private xml
>> trace   May 18 23:02:49 build_parameter_list(632):0: Adding attr id=0 to the xml result
>> ....
>> 
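A toy model of what the trace above appears to show, not pacemaker's actual code: for the "unique" pass only the parameters flagged unique are kept (here just id), while for the "private" pass the match is inverted, so everything not flagged private is kept. The function bodies, the src_ip/dst_ip values and the XML serialization are assumptions for illustration, so the MD5 values are not expected to reproduce pacemaker's digests.

```python
import hashlib


def build_parameter_list(params, flagged, invert=False):
    """Keep parameters whose name is in `flagged`, or not in it if `invert`."""
    return {k: v for k, v in params.items() if (k in flagged) != invert}


def digest(selected):
    """MD5 over a simplified, sorted <parameters .../> element (toy format)."""
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(selected.items()))
    return hashlib.md5(f"<parameters{attrs}/>".encode()).hexdigest()


# name and id are taken from the trace; the two addresses are placeholders.
params = {
    "name": "eduroam IPv4 tunnel",
    "id": "0",
    "src_ip": "192.0.2.1",
    "dst_ip": "192.0.2.2",
}

# "unique" pass: name, src_ip and dst_ip are rejected, only id is added.
restart_params = build_parameter_list(params, flagged={"id"})

# "private" pass: nothing is flagged private here, so the inverted match
# keeps every parameter.
secure_params = build_parameter_list(params, flagged=set(), invert=True)
```

In this model, changing src_ip or dst_ip leaves digest(restart_params) untouched, because only id feeds into it. If pacemaker behaves the same way, the "Rejecting ... for unique" lines would be ordinary filtering output rather than an error, but that is an assumption and not something the trace confirms.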
>> By the way, it's debian stretch with pacemaker 1.1.16-1.
> 
> I have double and triple checked the agent and it seems just a normal, 
> working agent.
> 
> The agent accepts the reload operation, it is advertised in the actions 
> section of its metadata, and there are parameters with unique set to 0, yet 
> stop/start is still called instead of reload. (I could even live with 
> a reload instead of start/stop every 15 mins.)
> 
> As a desperate attempt, I deleted the resource and re-added it, and of 
> course it did not help.
> 
> I also created the attached trace file while creating the resource, in the 
> hope that it could help find the reason for the permanent stop/start.
> 
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.jozsef at wigner.mta.hu 
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt 
> Address: Wigner Research Centre for Physics, Hungarian Academy of Sciences
>          H-1525 Budapest 114, POB. 49, Hungary




