<div dir="ltr"><div>Hi,</div><div><br></div><div>I'm running the Stateful RA with Pacemaker 1.0.12, and its behavior after a failed demote looks wrong.</div><div><br></div><div>This is my configuration:</div><div>There are no stonith devices, and demote/stop are set to on-fail="block".</div>
<div><br></div><div># crm configure show</div><div>node $id="21c624bd-c426-43dc-9665-bbfb92054bcd" dl380g5c \</div><div>node $id="3f6ec88d-ee47-4f63-bfeb-652b8dd96027" dl380g5d</div><div>primitive dummy ocf:pacemaker:Stateful \</div>
<div> op start interval="0s" timeout="100s" on-fail="restart" \</div><div> op monitor interval="10s" role="Master" timeout="100s" on-fail="restart" \</div>
<div> op monitor interval="20s" role="Slave" timeout="100s" on-fail="restart" \</div><div> op promote interval="0s" timeout="100s" on-fail="restart" \</div>
<div> op demote interval="0s" timeout="100s" on-fail="block" \</div><div> op stop interval="0s" timeout="100s" on-fail="block"</div><div>ms stateful dummy</div>
<div>property $id="cib-bootstrap-options" \</div><div> dc-version="1.0.12-066152e" \</div><div> cluster-infrastructure="Heartbeat" \</div><div> no-quorum-policy="ignore" \</div>
<div> stonith-enabled="false" \</div><div> startup-fencing="false" \</div><div> crmd-transition-delay="2s"</div><div>rsc_defaults $id="rsc-options" \</div><div>
resource-stickiness="INFINITY" \</div><div> migration-threshold="1"</div><div><br></div><div><br></div><div><br></div><div>1) Initial status (dl380g5c=Master/dl380g5d=Slave)</div><div># crm_mon -1 -n</div>
<div><br></div><div>============</div><div>Last updated: Thu Jan 10 18:25:17 2013</div><div>Stack: Heartbeat</div><div>Current DC: dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027) - partition with quorum</div><div>Version: 1.0.12-066152e</div>
<div>2 Nodes configured, unknown expected votes</div><div>1 Resources configured.</div><div>============</div><div><br></div><div>Node dl380g5c (21c624bd-c426-43dc-9665-bbfb92054bcd): online</div><div> dummy:0 (ocf::pacemaker:Stateful) Master</div>
<div>Node dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027): online</div><div> dummy:1 (ocf::pacemaker:Stateful) Started</div><div><br></div><div><br></div><div><br></div><div>2) Modify the Stateful RA to reproduce a demote failure ("demote NG"), then put the Master node into standby mode.</div>
<div><br></div><div># vim /usr/lib/ocf/resource.d/pacemaker/Stateful</div><div>stateful_demote() {</div><div>return $OCF_ERR_GENERIC  # added line: force every demote to fail</div><div><br></div><div> stateful_check_state</div><div> if [ $? = 0 ]; then</div>
<div> # CRM Error - Should never happen</div><div> return $OCF_NOT_RUNNING</div><div><br></div><div>...</div><div><br></div><div><br></div><div># crm node standby dl380g5c</div><div># crm_mon -1 -n</div><div>
============</div><div>Last updated: Thu Jan 10 18:27:04 2013</div><div>Stack: Heartbeat</div><div>Current DC: dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027) - partition with quorum</div><div>Version: 1.0.12-066152e</div>
<div>2 Nodes configured, unknown expected votes</div><div>1 Resources configured.</div><div>============</div><div><br></div><div>Node dl380g5c (21c624bd-c426-43dc-9665-bbfb92054bcd): standby</div><div> dummy:0 (ocf::pacemaker:Stateful) Slave (unmanaged) FAILED</div>
<div>Node dl380g5d (3f6ec88d-ee47-4f63-bfeb-652b8dd96027): online</div><div> dummy:1 (ocf::pacemaker:Stateful) Master</div><div><br></div><div>Failed actions:</div><div> dummy:0_demote_0 (node=dl380g5c, call=4, rc=1, status=complete): unknown error</div>
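<div><br></div><div>For repeated testing, the demote failure injected in step 2 could also be made switchable instead of hard-coded. The following is only a minimal sketch and not part of the shipped Stateful RA: the helper name inject_demote_failure and the flag file path are hypothetical, chosen for illustration.</div>

```shell
#!/bin/sh
# Hypothetical fault-injection helper (an assumption, not part of the Stateful RA).
# Return codes follow the OCF resource agent convention.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical flag file: demote is made to fail only while this file exists.
: "${DEMOTE_FAIL_FLAG:=/tmp/stateful_demote_fail}"

inject_demote_failure() {
    if [ -e "$DEMOTE_FAIL_FLAG" ]; then
        # Same return code as the hard-coded "return $OCF_ERR_GENERIC" in step 2
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}
```

<div>With this in place, the first line of stateful_demote() would be "inject_demote_failure || return $?", and the failure is switched on with "touch /tmp/stateful_demote_fail" and off again by removing the file.</div>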
<div><br></div><div><br></div><div>In the crm_mon output above, dl380g5c is shown as "Slave", but it may still be "Master" because the demote failed.</div><div>So dl380g5d should not be allowed to promote, in order to prevent multiple Masters.</div>
<div>Pacemaker 1.1 seems to show the same behavior as 1.0.12.</div><div>I'm not sure, but I think Pacemaker 1.0.11 behaves correctly (dl380g5d cannot promote).</div><div>Please see the attached hb_report.</div><div>
<br></div><div><br></div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: info: determine_online_status: Node dl380g5c is standby</div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: info: determine_online_status: Node dl380g5d is online</div>
<div>Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: unpack_rsc_op: Operation dummy:0_monitor_0 found resource dummy:0 active in master mode on dl380g5c</div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: unpack_rsc_op: Processing failed op dummy:0_demote_0 on dl380g5c: unknown error (1)</div>
<div>Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: unpack_rsc_op: Forcing dummy:0 to stop after a failed demote action</div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: info: native_add_running: resource dummy:0 isnt managed</div>
<div>Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: clone_print: Master/Slave Set: stateful</div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: native_print: dummy:0<span class="" style="white-space:pre"> </span>(ocf::pacemaker:Stateful):<span class="" style="white-space:pre"> </span>Slave dl380g5c (unmanaged) FAILED</div>
<div>Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: short_print: Slaves: [ dl380g5d ]</div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: info: get_failcount: stateful has failed 1 times on dl380g5c</div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: common_apply_stickiness: Forcing stateful away from dl380g5c after 1 failures (max=1)</div>
<div>Jan 10 18:27:01 dl380g5d pengine: [4297]: info: get_failcount: stateful has failed 1 times on dl380g5c</div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: WARN: common_apply_stickiness: Forcing stateful away from dl380g5c after 1 failures (max=1)</div>
<div>Jan 10 18:27:01 dl380g5d pengine: [4297]: info: native_color: Unmanaged resource dummy:0 allocated to 'nowhere': failed</div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: info: master_color: Promoting dummy:1 (Slave dl380g5d)</div>
<div>Jan 10 18:27:01 dl380g5d pengine: [4297]: info: master_color: stateful: Promoted 1 instances of a possible 1 to master</div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: RecurringOp: Start recurring monitor (10s) for dummy:1 on dl380g5d</div>
<div>Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: RecurringOp: Start recurring monitor (10s) for dummy:1 on dl380g5d</div><div>Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: LogActions: Leave resource dummy:0<span class="" style="white-space:pre"> </span>(Slave unmanaged)</div>
<div>Jan 10 18:27:01 dl380g5d pengine: [4297]: notice: LogActions: Promote dummy:1<span class="" style="white-space:pre"> </span>(Slave -> Master dl380g5d)</div><div><br></div><div><br></div><div><br></div><div>Best Regards,</div>
<div>Junko IKEDA</div><div><br></div><div>NTT DATA INTELLILINK CORPORATION</div></div>