[ClusterLabs] Pacemaker resource parameter reload confusion

Ken Gaillot kgaillot at redhat.com
Tue Oct 31 16:35:58 EDT 2017


On Tue, 2017-10-31 at 18:44 +0100, Ferenc Wágner wrote:
> Ken Gaillot <kgaillot at redhat.com> writes:
> 
> > The pe-input is indeed entirely sufficient.
> > 
> > I forgot to check why the reload was not possible in this case. It
> > turns out it is this:
> > 
> >    trace: check_action_definition:      Resource vm-alder doesn't know how to reload
> > 
> > Does the resource agent implement the "reload" action and advertise it
> > in the <actions> section of its metadata?
> 
> Absolutely, I use this operation routinely.
> 
> $ /usr/sbin/crm_resource --show-metadata=ocf:niif:TransientDomain
> [...]
> <actions>
> <action name="start"        timeout="10" />
> <action name="stop"         timeout="60" />
> <action name="monitor"      timeout="10" interval="30" />
> <action name="migrate_to"   timeout="120" />
> <action name="migrate_from" timeout="5" />
> <action name="meta-data"    timeout="5" />
> <action name="validate-all" timeout="5" />
> <action name="reload"       timeout="5" />
> </actions>
> </resource-agent>
> 
> And the implementation is just a no-op.
> 
> vm-alder is based on a template, just like all other VMs:
> 
> <primitive id="vm-alder" class="ocf" provider="niif"
> type="TransientDomain">
>   <instance_attributes id="vm-template-instance_attributes">
>     <nvpair id="vm-template-instance_attributes-migr_timeout"
> name="migr_timeout" value="120"/>
>     [...]
>   </instance_attributes>
>   [...]
>   <instance_attributes id="vm-alder-instance_attributes">
>     <nvpair id="vm-alder-instance_attributes-migr_timeout"
> name="migr_timeout" value="10"/>
>     [...]
>     <nvpair id="vm-alder-instance_attributes-admins" name="admins"
> value="kissg wferi"/>
>   </instance_attributes>
>   <operations>
>     <op id="vm-alder-migrate_to-0" interval="0" name="migrate_to"
> timeout="1500" record-pending="true"/>
>     <op id="vm-alder-stop-0" interval="0" name="stop" timeout="120"
> record-pending="true"/>
>     <op id="vm-template-migrate_from-0" interval="0"
> name="migrate_from" timeout="20"/>
>     <op id="vm-template-monitor-60" interval="60" name="monitor"
> timeout="20"/>
>     <op id="vm-template-start-0" interval="0" name="start"
> timeout="120" record-pending="true"/>
>   </operations>
>   [...]
> </primitive>
> 
> I wonder why it wouldn't know how to reload.  How is that visible in the
> pe-input file?  I'd check the other resources...

When an operation completes, a history entry (<lrm_rsc_op>) is added to
the pe-input file. If the agent supports reload, the entry will include
op-force-restart and op-restart-digest fields. Now I see those are
present in the vm-alder_last_0 entry, so agent support isn't the issue.
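
As a rough sketch of that check (not something taken from the actual
pe-input): assuming the file has been decompressed to plain XML, one
could list each history entry for a resource and see whether it carries
both fields. The sample fragment, its attribute values, and the function
name below are made up for illustration; only the element and attribute
names come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed status fragment. A real pe-input entry
# has many more attributes, and the values here are placeholders.
SAMPLE = """
<lrm_resource id="vm-alder" class="ocf" provider="niif" type="TransientDomain">
  <lrm_rsc_op id="vm-alder_last_0" operation="monitor"
              op-force-restart="placeholder" op-restart-digest="placeholder"/>
  <lrm_rsc_op id="vm-alder_last_failure_0" operation="monitor"/>
</lrm_resource>
"""

# The two fields that are recorded when the agent supports reload.
RELOAD_FIELDS = ("op-force-restart", "op-restart-digest")


def reload_fields_by_entry(xml_text, resource_id):
    """Map each lrm_rsc_op id of a resource to True if both fields are present."""
    root = ET.fromstring(xml_text)
    result = {}
    # iter() also visits the root element itself, so this works for a
    # bare <lrm_resource> fragment as well as for a full CIB document.
    for rsc in root.iter("lrm_resource"):
        if rsc.get("id") != resource_id:
            continue
        for op in rsc.iter("lrm_rsc_op"):
            result[op.get("id")] = all(f in op.attrib for f in RELOAD_FIELDS)
    return result


if __name__ == "__main__":
    for entry, has_fields in reload_fields_by_entry(SAMPLE, "vm-alder").items():
        print(entry, "has reload fields" if has_fields else "lacks reload fields")
```

An entry that lacks the fields is the one worth a closer look.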

However, the operation is recorded as a *failed* probe (i.e. the
resource was found running where it wasn't expected). This gets recorded
as a separate vm-alder_last_failure_0 entry, which does not get the
op-force-restart and op-restart-digest fields. It looks to me like this
failure entry is what forces the restart. That would be the right call
for an actual failure: if we find a resource unexpectedly running, we
don't know how it was started, so a full restart makes sense.

In this case, though, I'm guessing it may not have been a real error,
but a resource cleanup. A cleanup clears the history so the resource is
re-probed, and I suspect that re-probe is what got recorded here as a
failure. Does that match what actually happened?
-- 
Ken Gaillot <kgaillot at redhat.com>



