[ClusterLabs] failure-timeout not working in corosync 2.0.1

Andrei Borzenkov arvidjaar at gmail.com
Thu Apr 1 02:11:40 EDT 2021


On 01.04.2021 08:20, Andrei Borzenkov wrote:
> On 01.04.2021 00:21, Antony Stone wrote:
>> On Wednesday 31 March 2021 at 23:11:50, Reid Wahl wrote:
>>
>>> Maybe Pacemaker-1 was looser in its handling of resource meta attributes vs
>>> operation meta attributes. Good question.
>>
>> Returning to my suspicion that it's more likely me that simply did something 
>> wrong, what command can I use to find out what pacemaker thinks my cluster.cib 
>> file really means, so I can be sure it's been interpreted by pacemaker the same 
>> way as I do?
>>
> 
> What you show is not CIB. What you show is input to some high level tool
> that translates it into CIB.
> 
> By the look of it, it is crmsh. So maybe something changed in the way
> crmsh parses its input. Or you used different input when it worked.
> 
> Showing actual CIB XML generated in each case would certainly be helpful.
> 

And with a reasonably up-to-date crmsh we get:


      <primitive id="xxx" class="ocf" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="xxx-instance_attributes">
          <nvpair name="ip" value="10.0.0.5" id="xxx-instance_attributes-ip"/>
          <nvpair name="cidr_netmask" value="24" id="xxx-instance_attributes-cidr_netmask"/>
        </instance_attributes>
        <meta_attributes id="xxx-meta_attributes">
          <nvpair name="migration-threshold" value="3" id="xxx-meta_attributes-migration-threshold"/>
        </meta_attributes>
        <operations>
          <op name="monitor" interval="10s" timeout="20s" on-fail="restart" id="xxx-monitor-10s">
            <instance_attributes id="xxx-monitor-10s-instance_attributes">
              <nvpair name="failure-timeout" value="180s" id="xxx-monitor-10s-instance_attributes-failure-timeout"/>
            </instance_attributes>
          </op>
        </operations>
      </primitive>


So crmsh puts any unrecognized operation attribute into the operation's
instance_attributes list, which is also not validated (I am not sure
whether it can be). And if you look at the generated high-level output,
it is also *not* what was specified:

	params ip=10.0.0.5 cidr_netmask=24 \
	meta migration-threshold=3 \
	op monitor interval=10s timeout=20s on-fail=restart \
	op_params failure-timeout=180s
        ^^^^^^^^^

There is no documentation on how this list is used, if it is used at
all. Pacemaker Explained briefly mentions that operation attributes may
be specified either directly (in the op element) or as a
*meta_attributes* nvpair list, and that values in the op element take
precedence. This implies that the nvpair list may only contain the same
attributes and so should be validated using the same rules.

And Pacemaker Explained does not mention instance_attributes in relation
to operations at all.

I would say this is really a crmsh bug: it is excessively liberal in
what it accepts and silently generates a configuration that makes no
sense.
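
For comparison, here is a sketch of the same primitive with
failure-timeout placed next to migration-threshold in the resource's
meta_attributes. This assumes that what was intended is the
resource-level failure-timeout meta attribute described in Pacemaker
Explained. The nvpair id below is made up to follow the naming pattern
above, and it is not output I have generated with crmsh:

      <primitive id="xxx" class="ocf" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="xxx-instance_attributes">
          <nvpair name="ip" value="10.0.0.5" id="xxx-instance_attributes-ip"/>
          <nvpair name="cidr_netmask" value="24" id="xxx-instance_attributes-cidr_netmask"/>
        </instance_attributes>
        <meta_attributes id="xxx-meta_attributes">
          <nvpair name="migration-threshold" value="3" id="xxx-meta_attributes-migration-threshold"/>
          <!-- assumed placement: failure-timeout as a resource meta attribute; the id is hypothetical -->
          <nvpair name="failure-timeout" value="180s" id="xxx-meta_attributes-failure-timeout"/>
        </meta_attributes>
        <operations>
          <!-- the op keeps only attributes that are valid for an operation -->
          <op name="monitor" interval="10s" timeout="20s" on-fail="restart" id="xxx-monitor-10s"/>
        </operations>
      </primitive>

In crmsh input this would presumably mean putting failure-timeout=180s
on the meta line rather than after the op definition.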

