[Pacemaker] "probe" operations always use cluster default operation timeout

Thu Nov 18 07:54:33 EST 2010

Hi Tim,

On 11/17/2010 9:33 PM, Tim Serong wrote:
> Hi Ron,
>
> On 11/18/2010 at 11:26 AM, Ron Kerry <rkerry at sgi.com> wrote:
>  > I have noted a problem that exists in both SLE11-HAE and SLE11-HAE-SP1
>  > distributions with the "probe" operation that takes place when openais
>  > is first started on a node to determine whether a resource is actively
>  > running or not.
>  >
>  > Nov 17 17:47:07 gto2 lrmd: [13475]: debug: on_msg_perform_op: add an operation operation monitor[2] on ocf::cxfs::CXFS for client 13478, its parameters: crm_feature_set=[3.0.2] volnames=[dmfhome,dmfjrnls,dmfspool,dmftmp,diskmsp,data] CRM_meta_timeout=[20000] to the operation list.
>  > Nov 17 17:47:07 gto2 corosync[13452]: [TOTEM ] mcasted message added to pending queue
>  > Nov 17 17:47:07 gto2 crmd: [13478]: info: te_rsc_command: Initiating action 12: monitor CXFS_monitor_0 on gto3
>  > Nov 17 17:47:07 gto2 lrmd: [13475]: info: rsc:CXFS:2: probe
>  >
>  > Note that the timeout for this operation is 20s (20000ms). Note also that
>  > it is the monitor operation for the resource that is actually called. The
>  > monitor operation timeout for this resource is set to 60s. Even manually
>  > defining a "probe" operation for the resource with a longer timeout is not
>  > effective. The timeout that is being used for this operation is the
>  > cluster default operation timeout.
>
> A probe is a special case of the monitor op, with an interval of 0.
> Try configuring it like this:
>
> primitive CXFS ocf:sgi:cxfs \
> op monitor interval="60s" timeout="60s" \
> op start timeout="600s" \
> op stop timeout="600s" \
> op monitor interval="0" timeout="600s"
>
> The timeout of 600s on the monitor op with the interval of zero should
> thus be used when doing the probe. The timeout of 60s should be used
> on the recurring monitor op with the 60s interval.
>

This works like a charm!

Nov 18 06:27:36 prod lrmd: [4565]: debug: on_msg_perform_op: add an operation
operation monitor[2] on ocf::cxfs::CXFS for client 4568, its parameters:
CRM_meta_op_target_rc=[7] CRM_meta_start_delay=[0]
volnames=[lun3s0,lun3s1,lun3s2,lun3s3,lun3s4,lun0s0,lun0s1,lun2s0]
CRM_meta_timeout=[600000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor]  to
the operation list.

The probe operation timeout is now 600s (CRM_meta_timeout=[600000] in the log entry above), even though my cluster default operation timeout is still set to 20s.
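
For anyone who wants to check the same thing on their own nodes, here is a rough Python sketch that pulls the CRM_meta_timeout value out of an lrmd debug entry and converts it from milliseconds to seconds. The function name probe_timeout_seconds is just something I made up for illustration, and it assumes the log entry has been joined back onto a single line and uses the same CRM_meta_timeout=[...] format shown in the entries in this thread.

```python
import re

# Matches the CRM_meta_timeout=[<milliseconds>] parameter as it appears in
# the lrmd "on_msg_perform_op" debug entries quoted in this thread.
TIMEOUT_RE = re.compile(r"CRM_meta_timeout=\[(\d+)\]")


def probe_timeout_seconds(log_entry):
    """Return the operation timeout in seconds, or None if it is absent.

    Hypothetical helper: log_entry is assumed to be one complete log entry
    on a single line.
    """
    match = TIMEOUT_RE.search(log_entry)
    if match is None:
        return None
    # lrmd reports the timeout in milliseconds.
    return int(match.group(1)) / 1000.0


# Parameter fragments taken from the two entries in this thread.
before = "crm_feature_set=[3.0.2] CRM_meta_timeout=[20000] to the operation list."
after = "CRM_meta_timeout=[600000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor]"

print(probe_timeout_seconds(before))
print(probe_timeout_seconds(after))
```

With the two fragments above, the first value corresponds to the 20s cluster default and the second to the 600s timeout from the interval="0" monitor op.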

Thanks again!   - Ron

-- 

Ron Kerry         rkerry at sgi.com