[Pacemaker] Pacemaker restart resources when node joins cluster after failback

Andreas Kurz andreas at hastexo.com
Tue Mar 6 03:29:15 EST 2012


Hello,

On 03/05/2012 08:58 PM, José Alonso wrote:
> Hi all,
> 
> I have 2 Debian nodes with heartbeat and pacemaker 1.1.6 installed, and
> almost everything is working fine. I have only apache configured for
> testing. When a node goes down, the failover is done correctly, but
> there's a problem when a node fails back.
> 
> For example, let's say that Node1 is running the apache resource. Then
> I reboot Node1, so Pacemaker detects that it has gone down and apache is
> moved to Node2, where it keeps running fine. But
> when Node1 recovers and joins the cluster again, apache is restarted on
> Node2.
> 
> Does anyone know why resources are restarted when a node rejoins the cluster?
> 
> This is my pacemaker configuration:
> 
> node $id="2ac5f37d-cd54-4932-92dc-418b4fd0e6e6" nodo2 \
> attributes standby="off"
> node $id="938594ef-839a-40bb-aa5e-5715622693b3" nodo1 \
> attributes standby="off"
> primitive apache2 lsb:apache2 \
> meta migration-threshold="1" failure-timeout="2" \
> op monitor interval="5s" resource-stickiness="INFINITY"
> primitive ip1 ocf:heartbeat:IPaddr2 \
> params ip="192.168.1.38" nic="eth0:0"
> primitive ip1arp ocf:heartbeat:SendArp \
> params ip="192.168.1.38" nic="eth0:0"
> group WebServices ip1 ip1arp apache2
> location cli-prefer-WebServices WebServices \
> rule $id="cli-prefer-rule-WebServices" inf: #uname eq nodo2

Remove that migration constraint ("cli-prefer-....") and try again.
Best practice is to remove such a constraint immediately after the
resource migration has completed.
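
As a rough sketch of that cleanup (not tested against your cluster):
constraints created by "crm resource migrate" get IDs starting with
"cli-", so they can be spotted in the output of "crm configure show".
The helper name list_cli_constraints is hypothetical, and the commented
commands assume the crm shell is available on the node.

```shell
# Print the IDs of location constraints left behind by a manual migration.
# Reads crm configuration text on stdin; hypothetical helper, not part of crm.
list_cli_constraints() {
    awk '$1 == "location" && $2 ~ /^cli-/ { print $2 }'
}

# On a cluster node one could then run (left commented out on purpose):
#   crm configure show | list_cli_constraints
#   crm resource unmigrate WebServices           # drops cli-prefer-WebServices
# or delete the constraint directly by its ID:
#   crm configure delete cli-prefer-WebServices
```

With the configuration quoted above, the helper should print
cli-prefer-WebServices, which is the constraint to remove.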

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now



> colocation ip_with_arp inf: ip1 ip1arp
> colocation web_with_ip inf: apache2 ip1
> order arp_after_ip inf: ip1:start ip1arp:start
> order web_after_ip inf: ip1arp:start apache2:start
> property $id="cib-bootstrap-options" \
> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> cluster-infrastructure="Heartbeat" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="INFINITY"
> 
> 
> This is what I see on crm_mon:
> 
> 1-. Node1 and Node2 OK:
> 
> Online: [ node1 node2 ]
> 
> Resource Group: WebServices
> ip1 (ocf::heartbeat:IPaddr2): Started node1
> ip1arp (ocf::heartbeat:SendArp): Started node1
> apache2 (lsb:apache2): Started node1
> 
> 
> 2-. I reboot Node1, so Pacemaker moves the resources to Node2:
> 
> Online: [ node2 ]
> OFFLINE: [node1]
> 
> Resource Group: WebServices
> ip1 (ocf::heartbeat:IPaddr2): Started node2
> ip1arp (ocf::heartbeat:SendArp): Started node2
> apache2 (lsb:apache2): Started node2
> 
> 
> 3-. Node1 is online again and joins the cluster; resources are still on Node2:
> 
> Online: [ node1 node2 ]
> 
> Resource Group: WebServices
> ip1 (ocf::heartbeat:IPaddr2): Started node2
> ip1arp (ocf::heartbeat:SendArp): Started node2
> apache2 (lsb:apache2): Started node2
> 
> 4-. But after a few seconds, resources are stopped on Node2 and restarted
> on the same Node2:
> 
> Online: [ node1 node2 ]
> 
> Resource Group: WebServices
> ip1 (ocf::heartbeat:IPaddr2): Started node2
> ip1arp (ocf::heartbeat:SendArp): Stopped
> apache2 (lsb:apache2): Stopped
> 
> 
> 5-. Resources have been restarted and are still on Node2:
> 
> Online: [ node1 node2 ]
> 
> Resource Group: WebServices
> ip1 (ocf::heartbeat:IPaddr2): Started node2
> ip1arp (ocf::heartbeat:SendArp): Started node2
> apache2 (lsb:apache2): Started node2
> 
> 
> 
> Why were the resources restarted on Node2 if they were running fine?
> 
> 


