[Pacemaker] The strange behavior of Master/Slave when it failed to demote

Andrew Beekhof andrew at beekhof.net
Mon Feb 4 20:56:37 EST 2013


On Wed, Jan 23, 2013 at 4:11 PM,  <renayama19661014 at ybb.ne.jp> wrote:
> Hi All,
>
> I filed this problem in Bugzilla on behalf of Ms. Ikeda.
>  * http://bugs.clusterlabs.org/show_bug.cgi?id=5133

Great, I'll follow up there.

>
> Best Regards,
> Hideo Yamauchi.
>
>
> --- On Thu, 2013/1/10, Junko IKEDA <tsukishima.ha at gmail.com> wrote:
>
>>
>>
>> Hi,
>>
>> I'm running the Stateful RA with Pacemaker 1.0.12, and its demote handling seems to be wrong.
>>
>> This is my configuration.
>> There are no stonith devices, and the demote and stop operations are set to on-fail="block".
>>
>> # crm configure show
>> node $id="21c624bd-c426-43dc-9665-bbfb92054bcd" dl380g5c
>> node $id="3f6ec88d-ee47-4f63-bfeb-652b8dd96027" dl380g5d
>> primitive dummy ocf:pacemaker:Stateful \
>>         op start interval="0s" timeout="100s" on-fail="restart" \
>>         op monitor interval="10s" role="Master" timeout="100s" on-fail="restart" \
>>         op monitor interval="20s" role="Slave" timeout="100s" on-fail="restart" \
>>         op promote interval="0s" timeout="100s" on-fail="restart" \
>>         op demote interval="0s" timeout="100s" on-fail="block" \
>>         op stop interval="0s" timeout="100s" on-fail="block"
>> ms stateful dummy
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.0.12-066152e" \
>>         cluster-infrastructure="Heartbeat" \
>>         no-quorum-policy="ignore" \
>>         stonith-enabled="false" \
>>         startup-fencing="false" \
>>         crmd-transition-delay="2s"
>> rsc_defaults $id="rsc-options" \
>>         resource-stickiness="INFINITY" \
>>         migration-threshold="1"
>>
>>
>>
>> 1) Initial status (dl380g5c=Master/dl380g5d=Slave)
>> # crm_mon -1 -n
>>
>> ============
>> Last updated: Thu Jan 10 18:25:17 2013
>> Stack: Heartbeat
>> Current DC: dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027) - partition with quorum
>> Version: 1.0.12-066152e
>> 2 Nodes configured, unknown expected votes
>> 1 Resources configured.
>> ============
>>
>> Node dl380g5c (21c624bd-c426-43dc-9665-bbfb92054bcd): online
>>         dummy:0 (ocf::pacemaker:Stateful) Master
>> Node dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027): online
>>         dummy:1 (ocf::pacemaker:Stateful) Started
>>
>>
>>
>> 2) Modify the Stateful RA to reproduce a demote failure ("demote NG"), then put the Master node into standby mode.
>>
>> # vim /usr/lib/ocf/resource.d/pacemaker/Stateful
>> stateful_demote() {
>>     # Injected for this test: make every demote fail
>>     return $OCF_ERR_GENERIC
>>
>>     stateful_check_state
>>     if [ $? = 0 ]; then
>>         # CRM Error - Should never happen
>>         return $OCF_NOT_RUNNING
>>
>> ...
>>
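
A less invasive variant of the modification above is to make the injected error conditional, so that the RA behaves normally unless a marker file exists. This is only a sketch: the marker path and the helper name inject_demote_failure are assumptions made for illustration and are not part of the shipped Stateful RA. The OCF return codes are defined inline only so that the fragment runs on its own.

```shell
# OCF return codes, normally provided by the OCF shell function library;
# defined here only to keep the sketch self-contained.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"

# Hypothetical marker file (assumption): demote fails while it exists.
: "${FAIL_DEMOTE_MARKER:=/var/run/Stateful.fail_demote}"

# Hypothetical helper: report a generic error if fault injection is enabled.
inject_demote_failure() {
    if [ -e "$FAIL_DEMOTE_MARKER" ]; then
        return "$OCF_ERR_GENERIC"
    fi
    return "$OCF_SUCCESS"
}

# Intended use as the first line of stateful_demote() in the RA:
#     inject_demote_failure || return $OCF_ERR_GENERIC
```

With this in place, creating the marker file on the Master node before running "crm node standby" should reproduce the failed demote, and removing the file restores the normal behavior without editing the RA again.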
>>
>> # crm node standby dl380g5c
>> # crm_mon -1 -n
>> ============
>> Last updated: Thu Jan 10 18:27:04 2013
>> Stack: Heartbeat
>> Current DC: dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027) - partition with quorum
>> Version: 1.0.12-066152e
>> 2 Nodes configured, unknown expected votes
>> 1 Resources configured.
>> ============
>>
>> Node dl380g5c (21c624bd-c426-43dc-9665-bbfb92054bcd): standby
>>         dummy:0 (ocf::pacemaker:Stateful) Slave  (unmanaged) FAILED
>> Node dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027): online
>>         dummy:1 (ocf::pacemaker:Stateful) Master
>>
>> Failed actions:
>>     dummy:0_demote_0 (node=dl380g5c, call=4, rc=1, status=complete): unknown error
>>
>>
>> In the crm_mon output above, dl380g5c is shown as "Slave", but it might still be "Master" because the demote failed.
>> So dl380g5d should be prevented from promoting, to avoid multiple Masters.
>> Pacemaker 1.1 seems to show the same behavior as 1.0.12.
>> I'm not sure, but I think Pacemaker 1.0.11 behaves correctly (dl380g5d cannot promote).
>> Please see the attached hb_report.
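
The condition described above (a failed demote on one node while another instance is reported as Master) can be detected from the crm_mon text itself. The following is a rough sketch that assumes the "crm_mon -1 -n" output format shown in this report. The function names are made up for illustration, and the check only flags a possible dual Master; it cannot tell whether the failed instance is really still in the Master role.

```shell
# Print the node name of every failed demote listed under "Failed actions:".
# Reads crm_mon output on stdin.
failed_demote_nodes() {
    sed -n 's/^.*_demote_0 (node=\([^,]*\),.*$/\1/p'
}

# Count the resource lines that crm_mon reports in the Master role.
count_masters() {
    awk '/\) Master[ \t]*$/ { n++ } END { print n + 0 }'
}

# Succeed if at least one demote failed while some instance is shown as
# Master, i.e. the situation where a second Master may exist.
possible_dual_master() {
    out=$(cat)
    nodes=$(printf '%s\n' "$out" | failed_demote_nodes)
    masters=$(printf '%s\n' "$out" | count_masters)
    [ -n "$nodes" ] && [ "$masters" -gt 0 ]
}
```

For example, "crm_mon -1 -n | possible_dual_master && echo 'check the failed node by hand'" would print the warning for the status shown in step 2.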
>>
>>
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: info: determine_online_status: Node dl380g5c is standby
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: info: determine_online_status: Node dl380g5d is online
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: unpack_rsc_op: Operation dummy:0_monitor_0 found resource dummy:0 active in master mode on dl380g5c
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: unpack_rsc_op: Processing failed op dummy:0_demote_0 on dl380g5c: unknown error (1)
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: unpack_rsc_op: Forcing dummy:0 to stop after a failed demote action
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: info: native_add_running: resource dummy:0 isnt managed
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: clone_print:  Master/Slave Set: stateful
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: native_print:      dummy:0  (ocf::pacemaker:Stateful):      Slave dl380g5c (unmanaged) FAILED
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: short_print:      Slaves: [ dl380g5d ]
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: info: get_failcount: stateful has failed 1 times on dl380g5c
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: common_apply_stickiness: Forcing stateful away from dl380g5c after 1 failures (max=1)
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: info: get_failcount: stateful has failed 1 times on dl380g5c
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: common_apply_stickiness: Forcing stateful away from dl380g5c after 1 failures (max=1)
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: info: native_color: Unmanaged resource dummy:0 allocated to 'nowhere': failed
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: info: master_color: Promoting dummy:1 (Slave dl380g5d)
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: info: master_color: stateful: Promoted 1 instances of a possible 1 to master
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: RecurringOp:  Start recurring monitor (10s) for dummy:1 on dl380g5d
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: RecurringOp:  Start recurring monitor (10s) for dummy:1 on dl380g5d
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: LogActions: Leave   resource dummy:0        (Slave unmanaged)
>> Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: LogActions: Promote dummy:1 (Slave -> Master dl380g5d)
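
To follow the decision chain in a longer log, it can help to keep only the policy engine messages that deal with the failed operation and the promotion. This is a small sketch; the function name is made up, and the selection of pengine functions (unpack_rsc_op, master_color, LogActions) is simply taken from the excerpt above rather than from any complete list.

```shell
# Keep only the pengine messages that show how the failed demote was
# processed and which instance was promoted afterwards.
# Reads log lines on stdin.
pengine_demote_trace() {
    grep -E 'pengine: .*(unpack_rsc_op|master_color|LogActions):'
}
```

It is meant to be fed the cluster log from the DC, for example "pengine_demote_trace < ha-log", where the log file name depends on the local logging setup.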
>>
>>
>>
>> Best Regards,
>> Junko IKEDA
>>
>> NTT DATA INTELLILINK CORPORATION
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



