[Pacemaker] primary role not being taken by secondary on hard shutdown of the original primary

Shravan Mishra shravan.mishra at gmail.com
Thu Jun 9 15:47:17 EDT 2011


Hi guys,

I'm facing a weird problem, and I'm not sure whether anyone else has seen it.

Basically, I have a two-node pair. When I do a hard shutdown of the
primary (for example, "ipmitool chassis power off"), the secondary just
sits as it is, and DRBD does not become master on that box.

Some info about my environment:
Pacemaker -- 1.1.2
DRBD -- 8.3.8.1
Corosync -- 1.2.8

When I look at the status section, I see that despite the hard shutdown
the node status has not been updated properly by lrmd.

Here are just the node state and transient attributes pieces of the
status part of the CIB:


<node_state uname="C725.elab.itactics.com" ha="active" in_ccm="true"
crmd="online" expected="member" shutdown="0" join="member"
id="C725.elab.itactics.com" crm-debug-origin="do_update_resource">

 <transient_attributes id="C725.elab.itactics.com">
        <instance_attributes id="status-C725.elab.itactics.com">
          <nvpair id="status-C725.elab.itactics.com-probe_complete"
name="probe_complete" value="true"/>
          <nvpair name="master-drbd0:0"
id="status-C725.elab.itactics.com-master-drbd0:0" value="10000"/>
          <nvpair id="status-C725.elab.itactics.com-pingd"
name="pingd" value="1000"/>
        </instance_attributes>
      </transient_attributes>


<node_state uname="C726.elab.itactics.com" crmd="online" ha="active"
in_ccm="false" join="pending" expected="member" shutdown="0"
id="C726.elab.itactics.com" crm-debug-origin="do_state_transition">


 <transient_attributes id="C726.elab.itactics.com">
        <instance_attributes id="status-C726.elab.itactics.com">
          <nvpair id="status-C726.elab.itactics.com-probe_complete"
name="probe_complete" value="true"/>
          <nvpair name="master-drbd0:1"
id="status-C726.elab.itactics.com-master-drbd0:1" value="10000"/>
          <nvpair id="status-C726.elab.itactics.com-pingd"
name="pingd" value="1000"/>
        </instance_attributes>
      </transient_attributes>

crm_mon
============
Last updated: Thu Jun  9 11:40:52 2011
Stack: openais
Current DC: C725.elab.itactics.com - partition WITHOUT quorum
Version: 1.1.2-e0d731c2b1be446b27a73327a53067bf6230fb6a
2 Nodes configured, 2 expected votes
7 Resources configured.
============

Node C726.elab.itactics.com: UNCLEAN (offline)
Online: [ C725.elab.itactics.com ]

 Clone Set: connectivity [ping]
     Started: [ C725.elab.itactics.com ]
     Stopped: [ ping:0 ]
 Master/Slave Set: ms-drbd [drbd0]
     Slaves: [ C725.elab.itactics.com ]
     Stopped: [ drbd0:0 ]
 C726.elab.itactics.com-stonith	(stonith:external/safe/ipmi):	Started C725.elab.itactics.com



So for C726, even though in_ccm="false", the rest of its node_state
(crmd="online", ha="active") looks as if the node were still online.
Why has crmd not been able to update this information properly?
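
In case it helps anyone reproduce the check, here is a rough sketch of how
the mismatch could be spotted automatically. It is only an illustration:
the function name find_inconsistent_nodes is made up, and the sample XML is
trimmed from the excerpt above and wrapped in a <status> element so that it
parses. On a live cluster I assume the input would come from the status
section of the CIB (e.g. the output of "cibadmin -Q -o status").

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the two node_state entries quoted above, made
# self-closing and wrapped in <status> so the snippet is well-formed.
SAMPLE_STATUS = """
<status>
  <node_state uname="C725.elab.itactics.com" ha="active" in_ccm="true"
      crmd="online" expected="member" shutdown="0" join="member"
      id="C725.elab.itactics.com"/>
  <node_state uname="C726.elab.itactics.com" crmd="online" ha="active"
      in_ccm="false" join="pending" expected="member" shutdown="0"
      id="C726.elab.itactics.com"/>
</status>
"""


def find_inconsistent_nodes(status_xml):
    """Return unames of nodes that are out of membership (in_ccm="false")
    but still recorded with crmd="online"."""
    root = ET.fromstring(status_xml)
    suspects = []
    for node in root.iter("node_state"):
        if node.get("in_ccm") == "false" and node.get("crmd") == "online":
            suspects.append(node.get("uname"))
    return suspects


if __name__ == "__main__":
    for uname in find_inconsistent_nodes(SAMPLE_STATUS):
        print(uname)
```

With the sample above this flags only C726, which is the node that was
powered off.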

I'm assuming this is why the secondary remains slave and never gets
promoted to master, because there is nothing in the transient attributes
section that would prevent it from becoming master.
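
To back up that reading of the transient attributes, the master scores can
be pulled out of the status XML with a few lines of Python. Again this is
just a sketch under assumptions: master_scores is a hypothetical helper
name, and the sample is a trimmed copy of the attributes quoted earlier,
wrapped in a <status> element so it parses on its own.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the transient_attributes quoted earlier in this mail.
SAMPLE_ATTRS = """
<status>
  <transient_attributes id="C725.elab.itactics.com">
    <instance_attributes id="status-C725.elab.itactics.com">
      <nvpair name="master-drbd0:0"
          id="status-C725.elab.itactics.com-master-drbd0:0" value="10000"/>
      <nvpair id="status-C725.elab.itactics.com-pingd"
          name="pingd" value="1000"/>
    </instance_attributes>
  </transient_attributes>
  <transient_attributes id="C726.elab.itactics.com">
    <instance_attributes id="status-C726.elab.itactics.com">
      <nvpair name="master-drbd0:1"
          id="status-C726.elab.itactics.com-master-drbd0:1" value="10000"/>
      <nvpair id="status-C726.elab.itactics.com-pingd"
          name="pingd" value="1000"/>
    </instance_attributes>
  </transient_attributes>
</status>
"""


def master_scores(status_xml):
    """Map (node id, attribute name) -> integer score for every
    nvpair whose name starts with "master-"."""
    root = ET.fromstring(status_xml)
    scores = {}
    for attrs in root.iter("transient_attributes"):
        node_id = attrs.get("id")
        for nv in attrs.iter("nvpair"):
            name = nv.get("name", "")
            if name.startswith("master-"):
                scores[(node_id, name)] = int(nv.get("value"))
    return scores


if __name__ == "__main__":
    for (node_id, name), score in sorted(master_scores(SAMPLE_ATTRS).items()):
        print(node_id, name, score)
```

Both nodes come out with a master score of 10000, so the score on the
surviving node (C725) by itself does not explain the missing promotion.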


Thanks
Shravan



