[Pacemaker] How to delete two offline nodes together?

Fanghao Sha shafanghao at gmail.com
Sat Mar 31 12:01:54 EDT 2012


Hi,

I have a 4-node cluster (CentOS 5.2) running pacemaker-1.0.11 with heartbeat-3.0.3.

The configuration is:

[root@node-0 ~]# crm configure show

node $id="25b34bc9-06d0-491c-b019-76b7acdfe30f" node-1
node $id="578988ce-5e15-4931-a659-e174fc015785" node-0
node $id="8a5a9f5c-43d1-4752-921f-4f2eebf16b64" node-3
node $id="fd2256ce-027f-4545-b28a-6b73a077e1d2" node-2
primitive failover-ip ocf:heartbeat:IPaddr2 \
        params ip="10.10.5.192" \
        op monitor interval="10s"
primitive master-app-rsc lsb:cluster-master \
        op monitor interval="10s"
primitive node-app-rsc lsb:cluster-node \
        op monitor interval="10s"
group group-dc failover-ip master-app-rsc
clone clone-node-app-rsc node-app-rsc
location rule-group-dc group-dc \
        rule $id="rule-group-dc-rule" -inf: #is_dc eq false
property $id="cib-bootstrap-options" \
        start-failure-is-fatal="false" \
        no-quorum-policy="ignore" \
        symmetric-cluster="true" \
        stonith-enabled="false" \
        dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
        cluster-infrastructure="Heartbeat"



The problem:

Nodes "node-2" and "node-3" were shut down, and their status changed to offline.

First, I tried to delete them one at a time.

When I ran "/usr/share/heartbeat/hb_delnode node-3" on node-0, /var/log/messages showed:


--------------------------------------------

Mar 31 23:43:59 node-0 heartbeat: [13142]: ERROR: HBDoMsg_T_DELNODE: deletion failed. We don't have all required nodes alive (node-2 is dead)

-------------------------------------------


So I concluded that they had to be deleted together.

Then I ran "/usr/share/heartbeat/hb_delnode node-2 node-3" on node-0, but /var/log/messages showed:


--------------------------------------

Apr  1 00:00:21 node-0 ccm: [13194]: ERROR: ccm_control_process: Node count from node node-0 does not agree: local count=2, count in message=3
Apr  1 00:00:21 node-0 ccm: [13194]: ERROR: Please make sure ha.cf files on all nodes have same nodes list or add "autojoin any" to ha.cf
Apr  1 00:00:21 node-0 ccm: [13194]: info: If this problem persists, check the heartbeat 'hostcache' files in the cluster to look for problems.

--------------------------------------
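
For context, the ccm message refers to two ha.cf settings: the "node" list, which must be identical on every node, and the "autojoin" directive. A rough sketch of what those lines might look like in /etc/ha.d/ha.cf, assuming that only node-0 and node-1 should remain after the deletion (that choice is a hypothetical example, not something I have verified on this cluster):

--------------------------------------

# /etc/ha.d/ha.cf (fragment only; hypothetical, untested)
# The node list must be the same on node-0 and node-1.
node node-0
node node-1
# Alternative suggested by the ccm log: allow nodes not listed above to join.
autojoin any

--------------------------------------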


Both approaches failed. :(

How can I remove these two offline nodes?

Any help is appreciated.

