[Pacemaker] on-fail is not effective

Tue Apr 10 08:24:07 EDT 2012

On Sat, Apr 7, 2012 at 7:19 AM, David Vossel <dvossel at redhat.com> wrote:
> ----- Original Message -----
>> From: "Kazunori INOUE" <inouekazu at intellilink.co.jp>
>> To: "pacemaker at oss" <pacemaker at oss.clusterlabs.org>
>> Cc: koichi at intellilink.co.jp
>> Sent: Thursday, April 5, 2012 10:08:44 PM
>> Subject: [Pacemaker]  on-fail is not effective
>>
>> Hi,
>>
>> I am using Pacemaker-1.1 (devel:
>> 7172b7323bb72c51999ce11c6fa5d3ff0a0a4b4f).
>> The "on-fail" setting does not take effect.
>> For example, the default action ("restart") is performed even when
>> "stop" is specified.
>
> The resource is stopping, but if there is nothing to prevent the resource from starting again

The failed op itself and the on-fail preference are supposed to be sufficient.

> it will start after the stop action has completed. This is probably why 'restart' and 'stop' appear to have the same behavior.

That sounds like a bug and not the intended behaviour.
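
To make the expected difference concrete, here is a minimal sketch in Python. It is an illustrative model, not Pacemaker code: the table and the function name expected_role_after_failure are made up for this example, and the comments paraphrase the documented meaning of the two on-fail values discussed here.

```python
# Illustrative model only; not Pacemaker source code.
# The names below are hypothetical and exist only for this example.
ON_FAIL_KEEPS_RESOURCE_RUNNING = {
    "restart": True,   # stop the resource, then start it again (possibly on another node)
    "stop": False,     # stop the resource and do not start it elsewhere
}


def expected_role_after_failure(on_fail):
    """Return the role the resource should have once recovery has finished."""
    return "Started" if ON_FAIL_KEEPS_RESOURCE_RUNNING[on_fail] else "Stopped"


# From the report: the monitor op has on-fail="stop", yet crm_mon shows
# prmDummy1 as "Started vm2" after the monitor failure.
configured = "stop"
observed = "Started"
print(expected_role_after_failure(configured), "expected;", observed, "observed")
```

With on-fail="stop" the model expects the resource to stay stopped, which is why the "Started vm2" in the report looks like a bug rather than a configuration problem.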

> -- Vossel
>
>> [root@vm1 ~]# crm configure show | grep -A3 "primitive prmDummy1"
>> primitive prmDummy1 ocf:pacemaker:Dummy \
>>         op start interval="0" timeout="60s" on-fail="restart" \
>>         op monitor interval="10s" timeout="60s" on-fail="stop" \
>>         op stop interval="0" timeout="60s" on-fail="block"
>> [root@vm1 ~]#
>> [root@vm1 ~]# crm_mon -f1
>> ============
>> Last updated: Fri Apr  6 10:13:14 2012
>> Last change: Fri Apr  6 10:12:42 2012 via cibadmin on vm1
>> Stack: Heartbeat
>> Current DC: vm1 (87e0eef1-0d86-4e8a-adfe-51f444a4054f) - partition with quorum
>> Version: 1.1.7-7172b73
>> 2 Nodes configured, unknown expected votes
>> 1 Resources configured.
>> ============
>>
>> Online: [ vm1 vm2 ]
>>
>>  prmDummy1      (ocf::pacemaker:Dummy): Started vm1
>>
>> Migration summary:
>> * Node vm1:
>> * Node vm2:
>> [root@vm1 ~]#
>> [root@vm1 ~]# rm -f /var/run/Dummy-prmDummy1.state
>> [root@vm1 ~]# crm_mon -f1
>> ============
>> Last updated: Fri Apr  6 10:13:33 2012
>> Last change: Fri Apr  6 10:12:42 2012 via cibadmin on vm1
>> Stack: Heartbeat
>> Current DC: vm1 (87e0eef1-0d86-4e8a-adfe-51f444a4054f) - partition with quorum
>> Version: 1.1.7-7172b73
>> 2 Nodes configured, unknown expected votes
>> 1 Resources configured.
>> ============
>>
>> Online: [ vm1 vm2 ]
>>
>>  prmDummy1      (ocf::pacemaker:Dummy): Started vm2
>>
>> Migration summary:
>> * Node vm1:
>>    prmDummy1: migration-threshold=1 fail-count=1
>> * Node vm2:
>>
>> Failed actions:
>>     prmDummy1_monitor_10000 (node=vm1, call=4, rc=7, status=complete): not running
>> [root@vm1 ~]#
>>
>> The attached gdb_pengine.log is a gdb log captured at the time of the
>> monitor failure.
>> Is this because the 2nd argument (variable 'key') of the
>> find_rsc_op_entry() function is "prmDummy1_last_failure_0"?
>> With that key, it seems that "on-fail" cannot be looked up. (L117~L205)
>>
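
The suspicion above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: op_key, configured_ops and lookup_on_fail are hypothetical stand-ins, not the real find_rsc_op_entry() code, and the only facts taken from the report are the configured operations, the two keys, and "restart" being the default.

```python
# Hypothetical stand-ins; the real lookup is find_rsc_op_entry() in the
# policy engine, whose internals are not shown in this thread.
def op_key(rsc_id, task, interval_ms):
    """Build a key of the form <resource>_<task>_<interval in ms>."""
    return "%s_%s_%d" % (rsc_id, task, interval_ms)


# Operations configured for prmDummy1 in the report, indexed by op key.
configured_ops = {
    op_key("prmDummy1", "start", 0): {"on-fail": "restart"},
    op_key("prmDummy1", "monitor", 10000): {"on-fail": "stop"},
    op_key("prmDummy1", "stop", 0): {"on-fail": "block"},
}


def lookup_on_fail(key, default="restart"):
    """Return the configured on-fail for key, or the default if no op matches."""
    entry = configured_ops.get(key)
    return entry["on-fail"] if entry else default


print(lookup_on_fail("prmDummy1_monitor_10000"))   # key of the failed op: stop
print(lookup_on_fail("prmDummy1_last_failure_0"))  # key seen in gdb: restart
```

If the lookup really is done with the last_failure key, no configured operation matches it and the default applies, which would be consistent with the behaviour shown by crm_mon.
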
>> Best Regards,
>> Kazunori INOUE
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>