[Pacemaker] Stopping corosync on the slave in a 2 node cluster causes all the resources on the master to stop. Please Help.

Tom Pride tom.pride at gmail.com
Wed Oct 12 08:02:18 EDT 2011


Hi Guys,

I'm hoping someone can do a quick review of my cluster config, shown
below, and help me work out the following: when I shut down corosync
on the master, all the resources fail over to the slave without a
problem, but when I shut down corosync on the slave, all of the
resources on the master stop as well, leaving me with both nodes
broken. What I want, of course, is to be able to shut down corosync on
the slave while the resources running on the master carry on
untouched. I must have something not quite right in the logic of my
cluster config.
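
For reference, this is how I trigger the problem: I stop corosync on
the slave (mq102, with mq101 currently holding the master role) and
watch the cluster state from the master:

root@mq102:~# service corosync stop
root@mq101:~# crm_mon -1

crm_mon on mq101 then shows all of the master's resources stopping, as
described above. I realise I could take the slave out with "crm node
standby mq102.back.live.telhc.local" instead, but as I understand it,
cleanly stopping corosync on the slave shouldn't break the surviving
node either.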

It is a two-node cluster running DRBD in active/passive mode. Both
servers run Red Hat 5.7 with corosync-1.2.7-1.1.el5 and
pacemaker-1.0.11-1.2.el5:

root@mq102:~# crm configure show
node mq101.back.live.telhc.local
node mq102.back.live.telhc.local
primitive activemq_drbd ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="15s" timeout="20s" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100"
primitive activemq-emp lsb:activemq-emp \
op monitor interval="30s" timeout="30s" \
op stop interval="0" timeout="60s" \
op start interval="0" timeout="60s" \
meta target-role="Started"
primitive cluster_IP ocf:heartbeat:IPaddr2 \
params ip="172.23.68.61" nic="eth0" \
op monitor interval="30s" timeout="90" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100"
primitive drbd_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/drbd" fstype="ext3" \
op monitor interval="15s" timeout="40s" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
primitive ping_gateway ocf:pacemaker:ping \
params name="ping_gateway" host_list="172.23.68.1" multiplier="100" \
op monitor interval="15s" timeout="20s" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100"
ms ActiveMQ_Data activemq_drbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
clone ping_gateway_clone ping_gateway
location ActiveMQ_Data_on_connected_node_only ActiveMQ_Data \
rule $id="ActiveMQ_Data_on_connected_node_only-rule" -inf: not_defined ping_gateway or ping_gateway lte 0
location ActiveMQ_Data_prefer_mq101 ActiveMQ_Data \
rule $id="ActiveMQ_Data_prefer_mq101-rule" $role="Master" 500: #uname eq mq101.back.live.telhc.local
colocation activemq-emp_with_ActiveMQ_Data inf: activemq-emp ActiveMQ_Data:Master
colocation cluster_IP_with_ActiveMQ_Data inf: cluster_IP ActiveMQ_Data:Master
colocation drbd_fs_with_ActiveMQ_Data inf: drbd_fs ActiveMQ_Data:Master
order ActiveMQ_Data_after_ping_gateway_clone inf: ping_gateway_clone:start ActiveMQ_Data:promote
order activemq-emp_after_drbd_fs inf: drbd_fs:start activemq-emp:start
order cluster_IP_after_drbd_fs inf: drbd_fs:start cluster_IP:start
order drbd_fs_after_ActiveMQ_Data inf: ActiveMQ_Data:promote drbd_fs:start
property $id="cib-bootstrap-options" \
dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false" \
last-lrm-refresh="1317808706"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
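
In case the DRBD side matters: r0 is defined along the lines below.
The backing disks and replication addresses here are illustrative
rather than copied verbatim, but the drbd device matches the
Filesystem primitive above:

resource r0 {
  protocol C;
  device    /dev/drbd1;   # mounted on /drbd by the drbd_fs primitive
  meta-disk internal;
  on mq101.back.live.telhc.local {
    disk    /dev/sdb1;            # illustrative backing disk
    address 172.23.68.59:7788;    # illustrative replication address
  }
  on mq102.back.live.telhc.local {
    disk    /dev/sdb1;            # illustrative backing disk
    address 172.23.68.60:7788;    # illustrative replication address
  }
}

If it would help, I can also post the scores and transition that
pacemaker computes when mq102 leaves the cluster, captured on the
master with:

root@mq101:~# ptest -sL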

Cheers,
Tom