[Pacemaker] master/slave resource does not stop (tries start repeatedly)

Andrew Beekhof andrew at beekhof.net
Tue Sep 11 11:17:09 UTC 2012


On Tue, Sep 11, 2012 at 9:13 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> Yikes!
>
> Fixed in:
>    https://github.com/beekhof/pacemaker/commit/7d098ce

That link should have been:

    https://github.com/beekhof/pacemaker/commit/c1f409baaaf388d03f6124ec0d9da440445c4a23

>
> On Fri, Sep 7, 2012 at 7:49 PM, Kazunori INOUE
> <inouekazu at intellilink.co.jp> wrote:
>> Hi,
>>
>> I am using Pacemaker-1.1.
>> - ClusterLabs/pacemaker : 872a2f1af1 (Sep 07)
>>
>> Even though the monitor of the master resource has failed and there is no
>> node left on which the master/slave resource can run, the master/slave
>> resource does not stop.
>>
>> [test case]
>> 1. Use the Stateful RA with on-fail="restart" set on the monitor operation
>>    and migration-threshold set to 1.
>>
>>    # crm_mon
>>
>>    Online: [ vm5 vm6 ]
>>
>>     Master/Slave Set: msAP [prmAP]
>>         Masters: [ vm5 ]
>>         Slaves: [ vm6 ]
>>
>> 2. Let the master resource on vm5 fail, so that it moves to vm6.
>>
>>    Online: [ vm5 vm6 ]
>>
>>     Master/Slave Set: msAP [prmAP]
>>         Masters: [ vm6 ]
>>         Stopped: [ prmAP:1 ]
>>
>>    Failed actions:
>>        prmAP_monitor_10000 (node=vm5, call=14, rc=1, status=complete): unknown error
>>
>> 3. Let the master resource on vm6 fail as well. The master/slave resource
>>    then tries to start repeatedly, alternating between states (a) and (b)
>>    below.
>>
>>   (a)
>>    Online: [ vm5 vm6 ]
>>
>>
>>    Failed actions:
>>        prmAP_monitor_10000 (node=vm5, call=14, rc=1, status=complete): unknown error
>>        prmAP_monitor_10000 (node=vm6, call=20, rc=1, status=complete): unknown error
>>
>>   (b)
>>    Online: [ vm5 vm6 ]
>>
>>     Master/Slave Set: msAP [prmAP]
>>         Slaves: [ vm5 vm6 ]
>>
>>    Failed actions:
>>        prmAP_monitor_10000 (node=vm5, call=14, rc=1, status=complete): unknown error
>>        prmAP_monitor_10000 (node=vm6, call=20, rc=1, status=complete): unknown error
>>
>> # grep -e run_graph: -e common_apply_stickiness: -e LogActions: ha-log
>>
>>>> after the master resource on vm5 failed
>> Sep  7 16:06:03 vm5 pengine[23199]:   notice: LogActions: Recover prmAP:0       (Master vm5)
>> Sep  7 16:06:03 vm5 crmd[23200]:   notice: run_graph: Transition 4 (Complete=3, Pending=0, Fired=0, Skipped=8, Incomplete=3, Source=/var/lib/pacemaker/pengine/pe-input-4.bz2): Stopped
>> Sep  7 16:06:03 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm5 after 1 failures (max=1)
>> Sep  7 16:06:03 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm5 after 1 failures (max=1)
>> Sep  7 16:06:03 vm5 pengine[23199]:   notice: LogActions: Stop    prmAP:0       (vm5)
>> Sep  7 16:06:03 vm5 pengine[23199]:   notice: LogActions: Promote prmAP:1       (Slave -> Master vm6)
>> Sep  7 16:06:03 vm5 crmd[23200]:   notice: run_graph: Transition 5 (Complete=4, Pending=0, Fired=0, Skipped=4, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-5.bz2): Stopped
>> Sep  7 16:06:03 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm5 after 1 failures (max=1)
>> Sep  7 16:06:03 vm5 pengine[23199]:   notice: LogActions: Promote prmAP:0       (Slave -> Master vm6)
>> Sep  7 16:06:03 vm5 crmd[23200]:   notice: run_graph: Transition 6 (Complete=3, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-6.bz2): Stopped
>> Sep  7 16:06:03 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm5 after 1 failures (max=1)
>> Sep  7 16:06:03 vm5 crmd[23200]:   notice: run_graph: Transition 7 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-7.bz2): Complete
>>
>>>> after the master resource on vm6 failed
>> Sep  7 16:06:33 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm5 after 1 failures (max=1)
>> Sep  7 16:06:33 vm5 pengine[23199]:   notice: LogActions: Recover prmAP:0       (Master vm6)
>> Sep  7 16:06:34 vm5 crmd[23200]:   notice: run_graph: Transition 8 (Complete=3, Pending=0, Fired=0, Skipped=8, Incomplete=3, Source=/var/lib/pacemaker/pengine/pe-input-8.bz2): Stopped
>> Sep  7 16:06:34 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm5 after 1 failures (max=1)
>> Sep  7 16:06:34 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm6 after 1 failures (max=1)
>> Sep  7 16:06:34 vm5 pengine[23199]:   notice: LogActions: Stop    prmAP:0       (vm6)
>> Sep  7 16:06:34 vm5 crmd[23200]:   notice: run_graph: Transition 9 (Complete=3, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-9.bz2): Stopped
>> Sep  7 16:06:34 vm5 pengine[23199]:   notice: LogActions: Start   prmAP:0       (vm5)
>> Sep  7 16:06:34 vm5 pengine[23199]:   notice: LogActions: Promote prmAP:0       (Stopped -> Master vm5)
>> Sep  7 16:06:34 vm5 pengine[23199]:   notice: LogActions: Start   prmAP:1       (vm6)
>> Sep  7 16:06:35 vm5 crmd[23200]:   notice: run_graph: Transition 10 (Complete=4, Pending=0, Fired=0, Skipped=4, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-10.bz2): Stopped
>> Sep  7 16:06:35 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm5 after 1 failures (max=1)
>> Sep  7 16:06:35 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm5 after 1 failures (max=1)
>> Sep  7 16:06:35 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm6 after 1 failures (max=1)
>> Sep  7 16:06:35 vm5 pengine[23199]:  warning: common_apply_stickiness: Forcing msAP away from vm6 after 1 failures (max=1)
>> Sep  7 16:06:35 vm5 pengine[23199]:   notice: LogActions: Stop    prmAP:0       (vm5)
>> Sep  7 16:06:35 vm5 pengine[23199]:   notice: LogActions: Stop    prmAP:1       (vm6)
>> Sep  7 16:06:35 vm5 crmd[23200]:   notice: run_graph: Transition 11 (Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-11.bz2): Stopped
>> Sep  7 16:06:35 vm5 pengine[23199]:   notice: LogActions: Start   prmAP:0       (vm5)
>> Sep  7 16:06:35 vm5 pengine[23199]:   notice: LogActions: Promote prmAP:0       (Stopped -> Master vm5)
>> Sep  7 16:06:35 vm5 pengine[23199]:   notice: LogActions: Start   prmAP:1       (vm6)
>> Sep  7 16:06:35 vm5 crmd[23200]:   notice: run_graph: Transition 12 (Complete=4, Pending=0, Fired=0, Skipped=4, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-12.bz2): Stopped
>>  :
>>
>> Is this a known issue?
>>
>> Best Regards,
>> Kazunori INOUE
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
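
For anyone triaging a similar log, here is a minimal sketch of how the
repeated Start/Stop pattern in the grep output above could be spotted
automatically. It is not part of Pacemaker. The function names
(count_actions, looks_like_restart_loop) and the threshold parameter are
hypothetical, and the regular expression only assumes the LogActions line
layout shown in the quoted log.

```python
import re
from collections import Counter

# Assumption: LogActions lines look like the ones quoted in this thread, e.g.
#   "... pengine[23199]:   notice: LogActions: Start   prmAP:0       (vm5)"
LOG_ACTION = re.compile(r"LogActions:\s+(\w+)\s+(\S+)\s+\((.*)\)")

# A few LogActions lines taken from the quoted log (timestamps shortened).
SAMPLE = [
    "16:06:34 pengine: notice: LogActions: Stop    prmAP:0       (vm6)",
    "16:06:34 pengine: notice: LogActions: Start   prmAP:0       (vm5)",
    "16:06:34 pengine: notice: LogActions: Promote prmAP:0       (Stopped -> Master vm5)",
    "16:06:34 pengine: notice: LogActions: Start   prmAP:1       (vm6)",
    "16:06:35 pengine: notice: LogActions: Stop    prmAP:0       (vm5)",
    "16:06:35 pengine: notice: LogActions: Stop    prmAP:1       (vm6)",
    "16:06:35 pengine: notice: LogActions: Start   prmAP:0       (vm5)",
    "16:06:35 pengine: notice: LogActions: Promote prmAP:0       (Stopped -> Master vm5)",
    "16:06:35 pengine: notice: LogActions: Start   prmAP:1       (vm6)",
]


def count_actions(lines):
    """Count (action, resource instance) pairs found in LogActions lines."""
    counts = Counter()
    for line in lines:
        match = LOG_ACTION.search(line)
        if match:
            action, instance, _detail = match.groups()
            counts[(action, instance)] += 1
    return counts


def looks_like_restart_loop(lines, threshold=2):
    """Heuristic only: True if some instance was both started and stopped
    at least `threshold` times in the given lines."""
    counts = count_actions(lines)
    instances = {instance for (_action, instance) in counts}
    return any(
        counts[("Start", inst)] >= threshold and counts[("Stop", inst)] >= threshold
        for inst in instances
    )


if __name__ == "__main__":
    for (action, instance), n in sorted(count_actions(SAMPLE).items()):
        print(f"{action:8s} {instance:10s} {n}")
    print("possible restart loop:", looks_like_restart_loop(SAMPLE))
```

In practice the input would be the lines of ha-log rather than SAMPLE, and
the threshold would need tuning, since a healthy cluster also logs
occasional Start and Stop actions.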



