[Pacemaker] Resources not failing over, ERROR: RecurringOp: Invalid recurring action ... wth name: 'start'

Andrew Beekhof andrew at beekhof.net
Wed Jul 2 03:57:36 EDT 2014


1.1.6 is really too old.
In any case, rc=5 "not installed" means we can't find an init script of that name in /etc/init.d.
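A quick way to check is to run something like this on each node (the agent name is taken from the status output below; substitute your own):

```shell
# Pacemaker resolves an lsb:<name> resource to /etc/init.d/<name>,
# so "rc=5: not installed" usually means that file is missing or not
# executable on the node that ran the probe.
check_lsb_agent() {
    if [ -x "/etc/init.d/$1" ]; then
        echo "installed"
    else
        echo "not installed"
    fi
}

# Agent name from the crm status output; run this on EVERY node.
check_lsb_agent "f5-lbaas-agent-10.6.143.121"
```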

On 2 Jul 2014, at 2:07 pm, Vijay B <os.vbvs at gmail.com> wrote:

> Hi,
> 
> I'm puppetizing resource deployment for Pacemaker and Corosync, and as part of it I'm creating a resource on one of the three nodes of a cluster. The problem is that I'm seeing RecurringOp errors during resource creation, which are probably preventing a resource from failing over. The resource creation itself seems to go through fine, but these RecurringOp errors always appear afterwards (I'm pasting the output of two different commands below):
> 
> 
> ***************************
> vagrant at precise64b:/vagrant/puppet-environments/modules/f5_lbaas/tests$ sudo crm status
> ============
> Last updated: Wed Jul  2 03:52:30 2014
> Last change: Wed Jul  2 03:38:20 2014 via cibadmin on precise64b
> Stack: cman
> Current DC: precise64b - partition with quorum
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> 3 Nodes configured, unknown expected votes
> 3 Resources configured.
> ============
> 
> Online: [ precise64b precise64c precise64a ]
> 
>  f5-lbaas-agent-10.6.143.121_resource	(lsb:f5-lbaas-agent-10.6.143.121):	Started precise64c
>  f5-lbaas-agent-10.6.143.122_resource	(lsb:f5-lbaas-agent-10.6.143.122):	Started precise64b
>  f5-lbaas-agent-10.6.143.123_resource	(lsb:f5-lbaas-agent-10.6.143.123):	Started precise64b
> 
> Failed actions:
>     f5-lbaas-agent-10.6.143.120_resource_monitor_0 (node=precise64b, call=2, rc=5, status=complete): not installed
>     f5-lbaas-agent-10.6.143.121_resource_monitor_0 (node=precise64b, call=3, rc=5, status=complete): not installed
>     f5-lbaas-agent-10.6.143.122_resource_monitor_0 (node=precise64c, call=7, rc=5, status=complete): not installed
>     f5-lbaas-agent-10.6.143.123_resource_monitor_0 (node=precise64c, call=8, rc=5, status=complete): not installed
>     f5-lbaas-agent-10.6.143.120_resource_monitor_0 (node=precise64a, call=2, rc=5, status=complete): not installed
>     f5-lbaas-agent-10.6.143.121_resource_monitor_0 (node=precise64a, call=3, rc=5, status=complete): not installed
>     f5-lbaas-agent-10.6.143.122_resource_monitor_0 (node=precise64a, call=4, rc=5, status=complete): not installed
>     f5-lbaas-agent-10.6.143.123_resource_monitor_0 (node=precise64a, call=5, rc=5, status=complete): not installed
> vagrant at precise64b:/vagrant/puppet-environments/modules/f5_lbaas/tests$ 
> 
> 
> ***************************
> 
> vagrant at precise64b:/vagrant/puppet-environments/modules/f5_lbaas/tests$ sudo crm_verify -L -V
> crm_verify[15183]: 2014/07/02_03:39:13 ERROR: RecurringOp: Invalid recurring action f5-lbaas-agent-10.6.143.121_resource-start-10 wth name: 'start'
> crm_verify[15183]: 2014/07/02_03:39:13 ERROR: RecurringOp: Invalid recurring action f5-lbaas-agent-10.6.143.121_resource-stop-10 wth name: 'stop'
> crm_verify[15183]: 2014/07/02_03:39:13 ERROR: RecurringOp: Invalid recurring action f5-lbaas-agent-10.6.143.122_resource-start-10 wth name: 'start'
> crm_verify[15183]: 2014/07/02_03:39:13 ERROR: RecurringOp: Invalid recurring action f5-lbaas-agent-10.6.143.122_resource-stop-10 wth name: 'stop'
> Errors found during check: config not valid
> vagrant at precise64b:/vagrant/puppet-environments/modules/f5_lbaas/tests$
> ***************************
> 
> 
> What do these errors signify? I found one email exchange on a Pacemaker ML suggesting that we shouldn't use start intervals and timeouts (and likewise for stop), since that would mean Pacemaker would attempt to restart the resource every x seconds, time out after y seconds, and repeat. (Link: http://lists.linbit.com/pipermail/drbd-user/2011-September/016938.html)
> 
> My understanding was that the start interval would apply to restart attempts after a resource is detected as down. Nevertheless, I removed these parameters and created a third resource (the first two I had created with these parameters), and I still see the same monitor-related errors for the third resource (f5-lbaas-agent-10.6.143.123_resource_monitor_0) in the sudo crm status output. I don't, however, understand why this resource doesn't show up in the crm_verify -L -V output.
> 
> Here are the two CLIs I use to create the resources:
> 
> sudo crm configure primitive $pmk_res_name $pmk_cont_type:$service_name op monitor interval="$mon_interval" timeout="$mon_timeout" op start interval="$start_interval" timeout="$start_timeout" op stop interval="$stop_interval" timeout="$stop_timeout"
> 
> 
> sudo crm configure primitive $pmk_res_name $pmk_cont_type:$service_name op monitor interval="$mon_interval" timeout="$mon_timeout"
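> 
> If the RecurringOp errors mean that only monitor operations are allowed to recur, then I suppose the corrected form of the first command would look something like this (same variables as above; this is just my guess at the fix):
> 
> ```shell
> # Sketch: only "monitor" gets a nonzero interval; start and stop use
> # interval="0" (or omit interval entirely) and carry only a timeout.
> sudo crm configure primitive "$pmk_res_name" "$pmk_cont_type:$service_name" \
>     op monitor interval="$mon_interval" timeout="$mon_timeout" \
>     op start interval="0" timeout="$start_timeout" \
>     op stop interval="0" timeout="$stop_timeout"
> ```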
> 
> 
> The bottom line is that if I halt the VM running any of these resources, the resource doesn't fail over to another VM. I'm not sure what the exact cause is - any help would be greatly appreciated!
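> 
> My working theory is that the failed monitor_0 probes (rc=5) recorded on the other nodes are what's blocking placement there. If so, then once the init scripts exist on every node, clearing the recorded failures should let Pacemaker re-probe and fail the resource over (resource name taken from the crm status output above):
> 
> ```shell
> # Clear the failure history so the cluster re-probes the resource.
> sudo crm resource cleanup f5-lbaas-agent-10.6.143.121_resource
> 
> # Equivalent lower-level form:
> sudo crm_resource --cleanup -r f5-lbaas-agent-10.6.143.121_resource
> ```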
> 
> 
> Thanks,
> Regards,
> Vijay
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
