[Pacemaker] primary role not being taken by secondary on hard shutdown of the original primary

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Jun 10 03:42:55 EDT 2011


Hi,

On Thu, Jun 09, 2011 at 03:47:17PM -0400, Shravan Mishra wrote:
> Hi guys,
> 
> I'm facing a weird problem, and I'm not sure if anyone else has seen this.
> 
> Basically I have a two-node pair. When I do a hard shutdown of the primary
> (e.g. "ipmitool chassis power off"), the secondary just sits as it is and
> drbd does not become master on that box.

Do you have fencing (stonith) configured? Does it work? If you
kill a node like that, the surviving node has to make sure for
itself that the other node is really down before doing anything
with resources. If you don't have stonith configured, well, you
should have.
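
To make that reasoning concrete, here is a toy model of it in Python.
It is only a sketch of the rule described above, not Pacemaker's actual
code; PeerView, peer_state and may_promote are names made up for this
illustration.

```python
from dataclasses import dataclass


@dataclass
class PeerView:
    """What the surviving node knows about its peer (hypothetical model)."""
    in_membership: bool      # peer is still seen by the membership layer
    fencing_confirmed: bool  # a stonith operation against the peer succeeded


def peer_state(peer: PeerView) -> str:
    """Classify the peer the way crm_mon labels nodes in its output."""
    if peer.in_membership:
        return "online"
    # Gone from membership but not fenced: it may still be running resources.
    return "offline" if peer.fencing_confirmed else "UNCLEAN"


def may_promote(peer: PeerView) -> bool:
    """The survivor may take over only once the peer is known to be down."""
    return peer_state(peer) == "offline"


# Peer vanished after a hard power-off, fencing not (yet) confirmed.
print(peer_state(PeerView(in_membership=False, fencing_confirmed=False)))
print(may_promote(PeerView(in_membership=False, fencing_confirmed=False)))
```

In this model the survivor keeps its resources as they are for as long
as the peer is UNCLEAN, which is the behaviour I would expect if the
fencing operation never completes.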

Thanks,

Dejan

> Some info about my environment:
> Pacemaker version -- Version: 1.1.2
> DRBD - Version: 8.3.8.1
> Corosync -- 1.2.8
> 
> When I look at the status section I see that, despite the hard shutdown,
> the node status is not updated properly by lrmd.
> 
> I'm just including the node_state and transient_attributes pieces of
> the status part of the CIB:
> 
> 
> <node_state uname="C725.elab.itactics.com" ha="active" in_ccm="true"
> crmd="online" expected="member" shutdown="0" join="member"
> id="C725.elab.itactics.com" crm-debug-origin="do_update_resource">
> 
>  <transient_attributes id="C725.elab.itactics.com">
>         <instance_attributes id="status-C725.elab.itactics.com">
>           <nvpair id="status-C725.elab.itactics.com-probe_complete"
> name="probe_complete" value="true"/>
>           <nvpair name="master-drbd0:0"
> id="status-C725.elab.itactics.com-master-drbd0:0" value="10000"/>
>           <nvpair id="status-C725.elab.itactics.com-pingd"
> name="pingd" value="1000"/>
>         </instance_attributes>
>       </transient_attributes>
> 
> 
> <node_state uname="C726.elab.itactics.com" crmd="online" ha="active"
> in_ccm="false" join="pending" expected="member" shutdown="0"
> id="C726.elab.itactics.com" crm-debug-origin="do_state_transition">
> 
> 
>  <transient_attributes id="C726.elab.itactics.com">
>         <instance_attributes id="status-C726.elab.itactics.com">
>           <nvpair id="status-C726.elab.itactics.com-probe_complete"
> name="probe_complete" value="true"/>
>           <nvpair name="master-drbd0:1"
> id="status-C726.elab.itactics.com-master-drbd0:1" value="10000"/>
>           <nvpair id="status-C726.elab.itactics.com-pingd"
> name="pingd" value="1000"/>
>         </instance_attributes>
>       </transient_attributes>
> 
> crm_mon
> ============
> Last updated: Thu Jun  9 11:40:52 2011
> Stack: openais
> Current DC: C725.elab.itactics.com - partition WITHOUT quorum
> Version: 1.1.2-e0d731c2b1be446b27a73327a53067bf6230fb6a
> 2 Nodes configured, 2 expected votes
> 7 Resources configured.
> ============
> 
> Node C726.elab.itactics.com: UNCLEAN (offline)
> Online: [ C725.elab.itactics.com ]
> 
>  Clone Set: connectivity [ping]
>      Started: [ C725.elab.itactics.com ]
>      Stopped: [ ping:0 ]
>  Master/Slave Set: ms-drbd [drbd0]
>      Slaves: [ C725.elab.itactics.com ]
>      Stopped: [ drbd0:0 ]
>  C726.elab.itactics.com-stonith (stonith:external/safe/ipmi): Started C725.elab.itactics.com
> 
> 
> 
> So for C726, even though in_ccm="false", the rest of the entry looks as if
> the node is still online.
> Why has crmd not been able to update this information properly?
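
The mismatch described here can be pulled out of a node_state entry
mechanically. A small Python sketch, assuming the element is pasted in
as a string: the attribute values are copied from the C726 entry quoted
above, the element is self-closed so that it parses on its own, and the
helper name stale_fields is made up. Which fields count as "looks
alive" is just the reading given in the question, not a statement about
how Pacemaker interprets them.

```python
import xml.etree.ElementTree as ET

# C726 node_state as quoted above, self-closed so it is well-formed alone.
C726 = (
    '<node_state uname="C726.elab.itactics.com" crmd="online" ha="active" '
    'in_ccm="false" join="pending" expected="member" shutdown="0" '
    'id="C726.elab.itactics.com" crm-debug-origin="do_state_transition"/>'
)


def stale_fields(xml_text):
    """Return attributes that still look 'up' although in_ccm is false."""
    node = ET.fromstring(xml_text)
    if node.get("in_ccm") != "false":
        return {}
    looks_alive = {"crmd": "online", "ha": "active", "expected": "member"}
    return {name: value for name, value in looks_alive.items()
            if node.get(name) == value}


print(stale_fields(C726))
```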
> 
> I'm assuming this is why the secondary remains slave and never gets
> promoted to master, because nothing in the transient attributes
> section prevents it from becoming master.
> 
> 
> Thanks
> Shravan
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



