[Pacemaker] Help with N+1 configuration
Phil Frost
phil at macprofessionals.com
Thu Jul 26 13:45:00 EDT 2012
On 07/26/2012 12:34 PM, Cal Heldenbrand wrote:
> Hi everybody,
>
> I've read through the Clusters from Scratch document, but it doesn't
> seem to help me very well with an N+1 (shared hot spare) style cluster
> setup.
>
> My test case, is I have 3 memcache servers. Two are in primary use
> (hashed 50/50 by the clients) and one is a hot failover.
It sounds like you want to do this:
1) run memcache on each node
I'd use a clone to run memcache, instead of having three memcache
primitives as you had done. Something like this:
    primitive memcache ...
    clone memcache_clone memcache meta ordered=false
There are many parameters a clone can take, but this is a good start,
assuming you just want to run memcache on each node, and they can be
started in any order. You don't need to specify any location constraints
to say where memcache can run, or to keep the memcache instances from
running multiple times on one node. The clone handles all of that.
2) have ip1 on a node with a working memcache
    primitive ip1 ...
    colocation ip1_on_memcache inf: ip1 memcache_clone
3) have ip2 active on a different node with a working memcache
    primitive ip2 ...
    colocation ip2_on_memcache inf: ip2 memcache_clone
    colocation ip2_not_on_ip1 -10000: ip2 ip1
I've chosen a score of -10000 for ip2_not_on_ip1 because I assume you
could, if you had no other choice, run both IPs on one node. If you'd
rather run just one IP when only one working memcache remains, you can
make this -inf, and you can set the priority attribute on the IP
primitives to determine which one is sacrificed.
You could also use a clone for the IP addresses, but since there are
only 2, simply having two primitives may be easier to understand. If you
added a third active node, you'd require three colocation constraints
(n(n-1)/2 in the general case, one per pair of IPs) to keep all the IPs
running on different nodes. Beyond that, your configuration would get
very hairy, and you'd want to use a clone.
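To see how the pairwise constraints pile up, here is a minimal Python sketch that prints one anti-colocation line for each pair of IP resources. The helper name anti_colocation_constraints and the third resource ip3 are hypothetical, made up for illustration; the naming pattern and the -10000 score are copied from ip2_not_on_ip1 above.

```python
from itertools import combinations


def anti_colocation_constraints(ips, score="-10000"):
    """Return one crm colocation line for every pair of IP resources.

    Hypothetical helper: it only formats text, it does not talk to the cluster.
    """
    lines = []
    for first, second in combinations(ips, 2):
        # Same naming pattern as ip2_not_on_ip1: later resource listed first.
        lines.append(
            f"colocation {second}_not_on_{first} {score}: {second} {first}"
        )
    return lines


# ip3 is an assumed third IP resource, not part of the original setup.
constraints = anti_colocation_constraints(["ip1", "ip2", "ip3"])
for line in constraints:
    print(line)
```

Each IP would still need its own colocation with memcache_clone on top of these lines, which is why a clone becomes attractive as the list grows.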
4) you have some preferences about which servers are active in a
non-failure situation
    location ip1_on_mem1 ip1 100: mem1
    location ip2_on_mem2 ip2 100: mem2
5) (guessing you want this, most people do) if resources have migrated
due to a failure, you'd prefer to leave them where they are, rather than
move them again as soon as the failed node recovers. This way you can
migrate them when the service interruption is convenient.
    primitive ... meta resource-stickiness=500
or
    rsc_defaults resource-stickiness=500
I prefer to set stickiness on the specific primitives I want to be
sticky; in this case, the IP addresses seem appropriate. Setting a
default stickiness is a common suggestion, but I always find it hard to
know how sticky things will be. If there are colocation constraints,
groups, etc., the stickiness values of other resources combine in ways
that are deterministic and well defined, but complex and difficult to
predict. Your stickiness score must be greater than your location score
(from #4) to have any effect.
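As a rough illustration of why the stickiness has to beat the location score, here is a toy Python model. It is a deliberate simplification and not Pacemaker's actual placement algorithm: it assumes a single resource, adds the stickiness only to the node the resource currently runs on, and ignores colocation scores and ties. The function name preferred_node and the node name mem3 for the spare are hypothetical.

```python
def preferred_node(current, location_scores, stickiness):
    """Pick the node with the highest total score in this simplified model."""
    totals = {}
    for node, score in location_scores.items():
        # Stickiness counts only for the node the resource is running on now.
        totals[node] = score + (stickiness if node == current else 0)
    return max(totals, key=totals.get)


# ip1 prefers mem1 with a location score of 100, as in step 4.
# Assume it failed over to the spare (called mem3 here) and mem1 is back.
scores = {"mem1": 100, "mem2": 0, "mem3": 0}

stays = preferred_node("mem3", scores, stickiness=500)
moves = preferred_node("mem3", scores, stickiness=50)
print(stays, moves)
```

With a stickiness of 500 the resource stays on mem3; with 50 the location preference wins and it moves back to mem1. For real clusters, check the actual scores rather than trusting a model like this.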
crm_simulate is very handy for examining the scores used in placing
resources. Start with "crm_simulate -LSs". You can also use the -u and
-d options to simulate nodes coming online or offline. There are many
more options -- definitely check it out. Documentation is scant
(--help), but usage is fairly obvious after playing with it a bit.
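If you want to script a few what-if runs, something like the following Python sketch may help. It only assembles the command line from the options mentioned above (-LSs, -u, -d) and runs it when crm_simulate is found on the PATH. The function names are hypothetical, and mem1 is just the node name used in step 4.

```python
import shutil
import subprocess


def crm_simulate_command(up=(), down=()):
    """Build a crm_simulate argument list for simulated node changes."""
    cmd = ["crm_simulate", "-LSs"]
    for node in up:
        cmd += ["-u", node]    # simulate this node coming online
    for node in down:
        cmd += ["-d", node]    # simulate this node going offline
    return cmd


def run_simulation(up=(), down=()):
    """Run crm_simulate if it is installed; return its output or None."""
    cmd = crm_simulate_command(up, down)
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


# Show the command that would simulate mem1 going offline.
command = crm_simulate_command(down=["mem1"])
print(" ".join(command))
```

Calling run_simulation(down=["mem1"]) on a cluster node should return the simulated transition along with the scores, which you can then read through or grep for the resource you care about.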
Also, some advanced techniques allow the stickiness score to be based on
the time of day, so you can allow resources to move automatically back
to their preferred nodes, but only at planned times. More information:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-expression-iso8601.html