[Pacemaker] Trouble setting up IP failover with ping resource

Anlu Wang anlu at mixpanel.com
Thu Feb 16 22:57:14 EST 2012


I'm trying to get IP failover working on three machines named anlutest1,
anlutest2, and anlutest3. I'm using heartbeat for the messaging layer, and
everything works great when a machine goes down. But I would also like to
fail over an IP when EITHER the eth0 or the eth1 network interface fails.
From reading

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html

it seems the right way to do this is to add a ping resource.

Here is my XML configuration:

http://pastebin.com/05z7eB2s

This config doesn't work for me. Using the showscores.sh script found at:

http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg00410.html

I see that my scores are:

Resource                       Score     Node      Stickiness #Fail Migration-Threshold
address01                      0         anlutest3 0          0
address01                      1006      anlutest1 0          5
address01                      50        anlutest2 0          157
address02                      0         anlutest3 0          0
address02                      1050      anlutest2 0          2
address02                      6         anlutest1 0          0
address03                      1000      anlutest3 0          7
address03                      50        anlutest2 0
address03                      6         anlutest1 0          0
ping:0                         0         anlutest1 0          6
ping:0                         0         anlutest2 0          14
ping:0                         0         anlutest3 0          0
ping:1                         0         anlutest2 0
ping:1                         0         anlutest3 0          28
ping:1                         -1000000  anlutest1 0          0
ping:2                         0         anlutest3 0          13
ping:2                         -1000000  anlutest1 0          0
ping:2                         -1000000  anlutest2 0

which make no sense at all. I don't see how I could be getting these scores
of 50 and 1006. When I take down an interface on anlutest3, I see scores of
4 and 1004, which sort of make sense, except that the multiplier of 100
isn't being applied. I was experimenting with changing values, so maybe
it's caching old values. If so, how do I force the new values to take
effect?

Furthermore, shouldn't there be no scores of 0? If all 6 IPs I am pinging
return successfully, shouldn't my scores be either 600 or 1600?
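
To spell out the arithmetic I have in mind, here is a minimal sketch. It
assumes that pingd is set to the number of reachable hosts times the
multiplier, and that the location rule adds pingd to a fixed node preference.
The helper name and the base preference of 1000 are placeholders inferred
from the figures above, not copied from my config.

```python
def expected_score(reachable_hosts, multiplier, base_preference=0):
    """Score if the location rule adds pingd to a fixed node preference.

    Hypothetical helper (not part of Pacemaker); assumes
    pingd = reachable_hosts * multiplier.
    """
    return base_preference + reachable_hosts * multiplier


# All 6 ping targets reachable with the intended multiplier of 100.
print(expected_score(6, 100))        # 600
print(expected_score(6, 100, 1000))  # 1600

# A multiplier of 1 would instead reproduce the 6 / 1006 and 4 / 1004 readings.
print(expected_score(6, 1, 1000))    # 1006
print(expected_score(4, 1, 1000))    # 1004
```

Under these assumptions, the 6, 1006, 4 and 1004 readings look like a
multiplier of 1 is in effect rather than 100. The scores of 50 and 1050 do
not fit this pattern.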

In my syslog I also see a ton of messages like

Feb 17 03:54:47 anlutest2 lrmd: [1137]: info: perform_op:2877: operations on resource address01 already delayed
Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2873: operation monitor[419] on ocf::ping::ping:1 for client 1140, its parameters: CRM_meta_clone=[1] host_list=[10.54.130.6 10.54.130.8 10.54.130.7 50.97.196.101 50.97.196.103 50.9CRM_meta_clone_max=[3] dampen=[60s] crm_feature_set=[3.0.1] CRM_meta_globally_unique=[false] multiplier=[10000] CRM_meta_name=[monitor] CRM_meta_timeout=[60000] CRM_meta_interval=[5000] for rsc is already running.
Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2883: postponing all ops on resource ping:1 by 1000 ms
Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2873: operation monitor[171] on ocf::ping::ping:2 for client 1140, its parameters: CRM_meta_clone=[2] host_list=[10.54.130.6 10.54.130.8 10.54.130.7 50.97.196.101 50.97.196.103 50.9CRM_meta_clone_max=[3] dampen=[60s] crm_feature_set=[3.0.1] CRM_meta_globally_unique=[false] multiplier=[1] CRM_meta_name=[monitor] CRM_meta_timeout=[30000] CRM_meta_interval=[5000] for rsc is already running.
Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2883: postponing all ops on resource ping:2 by 1000 ms
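
Note that the two monitor operations above report different multiplier
values (10000 for ping:1, 1 for ping:2). A rough way to pull that out of a
larger syslog is sketched below. It assumes each log entry has been joined
onto a single line; the function name and the shortened sample lines are
only for illustration.

```python
import re


def clone_multipliers(log_lines):
    """Map each ping clone instance to the multiplier values seen in lrmd lines."""
    seen = {}
    for line in log_lines:
        instance = re.search(r"ocf::ping::(ping:\d+)", line)
        multiplier = re.search(r"multiplier=\[(\d+)\]", line)
        if instance and multiplier:
            seen.setdefault(instance.group(1), set()).add(int(multiplier.group(1)))
    return seen


# Abbreviated versions of the two monitor entries quoted above.
sample = [
    "lrmd: [1137]: info: perform_op:2873: operation monitor[419] on "
    "ocf::ping::ping:1 for client 1140, its parameters: CRM_meta_clone=[1] "
    "dampen=[60s] multiplier=[10000] CRM_meta_timeout=[60000]",
    "lrmd: [1137]: info: perform_op:2873: operation monitor[171] on "
    "ocf::ping::ping:2 for client 1140, its parameters: CRM_meta_clone=[2] "
    "dampen=[60s] multiplier=[1] CRM_meta_timeout=[30000]",
]

for name, values in sorted(clone_multipliers(sample).items()):
    print(name, sorted(values))
```

If one instance shows up with more than one multiplier, or different
instances disagree as they do here, that would be consistent with some
operations still running with parameters from an earlier configuration.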

and occasionally

Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd (4000)
Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_ha_callback: flush message from anlutest2
Feb 17 03:54:33 anlutest2 attrd: [1139]: WARN: find_nvpair_attr: Multiple attributes match name=pingd
Feb 17 03:54:33 anlutest2 attrd: [1139]: info: find_nvpair_attr:   Value: 50 #011(id=status-d619a94e-ebba-4ed0-8e0f-89837dd7506b-pingd)
Feb 17 03:54:33 anlutest2 attrd: [1139]: info: find_nvpair_attr:   Value: 3 #011(id=status-ab3c1a25-9471-48f7-9c0b-c76238abd402-pingd)
Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_perform_update: Sent update -40: pingd=4000
Feb 17 03:54:33 anlutest2 attrd: [1139]: ERROR: attrd_cib_callback: Update -40 for pingd=4000 failed: Required data for this CIB API call not found

Could someone just take a look at my config and let me know what I'm doing
wrong? Or if there's a better way to do what I want to do...

Thanks in advance,
Anlu