[Pacemaker] Pacemaker restart resources when node joins cluster after failback

Mon Mar 5 14:58:16 EST 2012

Hi all,

I have 2 Debian nodes with heartbeat and pacemaker 1.1.6 installed, and
almost everything is working fine. I have only apache configured for
testing. When a node goes down the failover is done correctly, but there's
a problem when a node fails back.

For example, let's say that Node1 is running the apache resource. I then
reboot Node1, Pacemaker detects that it has gone down, and apache is moved
to Node2, where it keeps running fine. But when Node1 recovers and joins
the cluster again, apache is restarted on Node2.

Does anyone know why resources are restarted when a node rejoins the cluster?

This is my pacemaker configuration:

node $id="2ac5f37d-cd54-4932-92dc-418b4fd0e6e6" nodo2 \
        attributes standby="off"
node $id="938594ef-839a-40bb-aa5e-5715622693b3" nodo1 \
        attributes standby="off"
primitive apache2 lsb:apache2 \
        meta migration-threshold="1" failure-timeout="2" \
        op monitor interval="5s" resource-stickiness="INFINITY"
primitive ip1 ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.38" nic="eth0:0"
primitive ip1arp ocf:heartbeat:SendArp \
        params ip="192.168.1.38" nic="eth0:0"
group WebServices ip1 ip1arp apache2
location cli-prefer-WebServices WebServices \
        rule $id="cli-prefer-rule-WebServices" inf: #uname eq nodo2
colocation ip_with_arp inf: ip1 ip1arp
colocation web_with_ip inf: apache2 ip1
order arp_after_ip inf: ip1:start ip1arp:start
order web_after_ip inf: ip1arp:start apache2:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="Heartbeat" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY"

This is what I see on crm_mon:

1-. Node1 and Node2 OK:

Online: [ node1 node2 ]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node1
ip1arp (ocf::heartbeat:SendArp): Started node1
apache2 (lsb:apache2): Started node1

2-. I reboot Node1, so Pacemaker moves the resources to Node2:

Online: [ node2 ]
OFFLINE: [node1]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node2
ip1arp (ocf::heartbeat:SendArp): Started node2
apache2 (lsb:apache2): Started node2

3-. Node1 is online again and joins the cluster; the resources are still on Node2:

Online: [ node1 node2 ]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node2
ip1arp (ocf::heartbeat:SendArp): Started node2
apache2 (lsb:apache2): Started node2

4-. But after a few seconds, the resources are stopped on Node2 and
restarted on the same node:

Online: [ node1 node2 ]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node2
ip1arp (ocf::heartbeat:SendArp): Stopped
apache2 (lsb:apache2): Stopped

5-. The resources have restarted and are still on Node2:

Online: [ node1 node2 ]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node2
ip1arp (ocf::heartbeat:SendArp): Started node2
apache2 (lsb:apache2): Started node2

Why were the resources restarted on Node2 if they were running fine?
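
To make the unexpected transition easier to spot, the crm_mon snapshots from steps 3 and 4 can be compared mechanically. The following is only a minimal sketch: the helper names `parse_snapshot` and `diff_snapshots` are hypothetical (they are not part of Pacemaker), and it assumes resource lines shaped exactly like the crm_mon output quoted above.

```python
import re

# Matches crm_mon resource lines such as:
#   ip1arp (ocf::heartbeat:SendArp): Stopped
#   apache2 (lsb:apache2): Started node2
# Assumption: the line layout is the one shown in the snapshots above.
LINE_RE = re.compile(r"^\s*(\S+)\s+\(([^)]+)\):\s+(\S+)(?:\s+(\S+))?\s*$")


def parse_snapshot(text):
    """Return {resource: (state, node_or_None)} from crm_mon-style text."""
    resources = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, _agent, state, node = match.groups()
            resources[name] = (state, node)
    return resources


def diff_snapshots(before, after):
    """Return {resource: (old, new)} for resources whose state or node changed."""
    old = parse_snapshot(before)
    new = parse_snapshot(after)
    return {
        name: (old.get(name), new[name])
        for name in new
        if old.get(name) != new[name]
    }


# Snapshots copied from steps 3 and 4 above.
STEP3 = """\
Online: [ node1 node2 ]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node2
ip1arp (ocf::heartbeat:SendArp): Started node2
apache2 (lsb:apache2): Started node2
"""

STEP4 = """\
Online: [ node1 node2 ]

Resource Group: WebServices
ip1 (ocf::heartbeat:IPaddr2): Started node2
ip1arp (ocf::heartbeat:SendArp): Stopped
apache2 (lsb:apache2): Stopped
"""

changes = diff_snapshots(STEP3, STEP4)
for name, (was, now) in changes.items():
    print(name, was, "->", now)
```

With the two snapshots above, only ip1arp and apache2 are reported as changed; ip1 stays started on node2 throughout.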