[Pacemaker] primary role not being taken by secondary on hard shutdown of the original primary

Shravan Mishra shravan.mishra at gmail.com
Fri Jun 10 08:11:56 EDT 2011


Thanks for the reply.

Our corosync is using eth0, and the primary and secondary are connected
by a crossover cable. When we do a hard shutdown of the primary, the eth0
link is reset on the other machine, i.e. the one that was supposed to
take over as primary.

It looks like corosync on the live box got confused by that, and the
node states didn't get updated.

That was not happening on our other boxes. There DRBD is on eth2, and
that was the interface being reset, so everything worked fine.
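
For anyone who wants to spot the same "states didn't get updated"
symptom, here is a rough sketch in Python. It only checks for the
combination I saw in my status section (in_ccm="false" while crmd is
still "online"). The function name find_inconsistent_nodes is made up
for illustration, the <status> wrapper and the self-closing tags are
added just so the trimmed sample parses, and I am assuming real input
would be the status section of the CIB (e.g. from "cibadmin -Q -o status").

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the two node_state entries from the status section
# quoted further down. The <status> wrapper and self-closing tags are
# assumptions added only so that this sample is well-formed XML.
STATUS_XML = """
<status>
  <node_state uname="C725.elab.itactics.com" ha="active" in_ccm="true"
              crmd="online" expected="member" shutdown="0" join="member"
              id="C725.elab.itactics.com"/>
  <node_state uname="C726.elab.itactics.com" crmd="online" ha="active"
              in_ccm="false" join="pending" expected="member" shutdown="0"
              id="C726.elab.itactics.com"/>
</status>
"""


def find_inconsistent_nodes(status_xml):
    """Return unames of nodes recorded as out of the membership
    (in_ccm="false") but still marked crmd="online"."""
    root = ET.fromstring(status_xml)
    inconsistent = []
    for node in root.iter("node_state"):
        if node.get("in_ccm") == "false" and node.get("crmd") == "online":
            inconsistent.append(node.get("uname"))
    return inconsistent


if __name__ == "__main__":
    for uname in find_inconsistent_nodes(STATUS_XML):
        print("inconsistent node_state:", uname)
```

This is only a diagnostic aid. It says nothing about why the state is
stale, and it is no substitute for working fencing.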

-Shravan



On Fri, Jun 10, 2011 at 3:42 AM, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> Hi,
>
> On Thu, Jun 09, 2011 at 03:47:17PM -0400, Shravan Mishra wrote:
>> Hi guys,
>>
>> I'm facing a weird problem; I'm not sure if anyone else has seen this.
>>
>> Basically I have a pair of nodes, and when I do a hard shutdown of the
>> primary, e.g. with "ipmitool chassis power off", the secondary just
>> sits as it is and drbd does not become master on that box.
>
> Do you have fencing (stonith) configured? Does it work? If you
> kill a node like that, then the surviving node has to make sure
> for itself that the other node is really down before doing anything
> with resources. If you don't have stonith configured, well, you
> should have.
>
> Thanks,
>
> Dejan
>
>> Some info about my environment:
>> Pacemaker version -- Version: 1.1.2
>> DRBD - Version: 8.3.8.1
>> Corosync -- 1.2.8
>>
>> When I look at the status section I see that, despite the hard
>> shutdown, the node status is not updated properly by lrmd.
>>
>> I'm showing just the node state and transient attributes pieces of
>> the status part of cibconfig:
>>
>>
>> <node_state uname="C725.elab.itactics.com" ha="active" in_ccm="true"
>>             crmd="online" expected="member" shutdown="0" join="member"
>>             id="C725.elab.itactics.com" crm-debug-origin="do_update_resource">
>>   <transient_attributes id="C725.elab.itactics.com">
>>     <instance_attributes id="status-C725.elab.itactics.com">
>>       <nvpair id="status-C725.elab.itactics.com-probe_complete"
>>               name="probe_complete" value="true"/>
>>       <nvpair name="master-drbd0:0"
>>               id="status-C725.elab.itactics.com-master-drbd0:0" value="10000"/>
>>       <nvpair id="status-C725.elab.itactics.com-pingd"
>>               name="pingd" value="1000"/>
>>     </instance_attributes>
>>   </transient_attributes>
>> </node_state>
>>
>>
>> <node_state uname="C726.elab.itactics.com" crmd="online" ha="active"
>>             in_ccm="false" join="pending" expected="member" shutdown="0"
>>             id="C726.elab.itactics.com" crm-debug-origin="do_state_transition">
>>   <transient_attributes id="C726.elab.itactics.com">
>>     <instance_attributes id="status-C726.elab.itactics.com">
>>       <nvpair id="status-C726.elab.itactics.com-probe_complete"
>>               name="probe_complete" value="true"/>
>>       <nvpair name="master-drbd0:1"
>>               id="status-C726.elab.itactics.com-master-drbd0:1" value="10000"/>
>>       <nvpair id="status-C726.elab.itactics.com-pingd"
>>               name="pingd" value="1000"/>
>>     </instance_attributes>
>>   </transient_attributes>
>> </node_state>
>>
>> crm_mon
>> ============
>> Last updated: Thu Jun  9 11:40:52 2011
>> Stack: openais
>> Current DC: C725.elab.itactics.com - partition WITHOUT quorum
>> Version: 1.1.2-e0d731c2b1be446b27a73327a53067bf6230fb6a
>> 2 Nodes configured, 2 expected votes
>> 7 Resources configured.
>> ============
>>
>> Node C726.elab.itactics.com: UNCLEAN (offline)
>> Online: [ C725.elab.itactics.com ]
>>
>>  Clone Set: connectivity [ping]
>>      Started: [ C725.elab.itactics.com ]
>>      Stopped: [ ping:0 ]
>>  Master/Slave Set: ms-drbd [drbd0]
>>      Slaves: [ C725.elab.itactics.com ]
>>      Stopped: [ drbd0:0 ]
>>  C726.elab.itactics.com-stonith       (stonith:external/safe/ipmi):   Started
>> C725.elab.itactics.com
>>
>>
>>
>> So for C726, even though in_ccm="false", the rest of its state looks
>> as if it is online.
>> Why has crmd not been able to update this information properly?
>>
>> I'm assuming this is why the secondary remains slave and never gets
>> promoted to master, because in the transient attributes section there
>> is nothing preventing it from becoming master.
>>
>>
>> Thanks
>> Shravan
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



