[ClusterLabs] simple active/active router using pacemaker+corosync

Thu Jan 26 20:31:23 UTC 2017

On Thu, Jan 26, 2017 at 12:10:24PM +0100, Arturo Borrero Gonzalez wrote:
> I have a rather simple 2 nodes active/active router using pacemaker+corosync.
> 
> Why active-active? Well, one node holds the virtual IPv4 resources and
> the other node holds the virtual IPv6 resources.
> On failover, both nodes are able to run all the virtual IPv4/IPv6 addresses.
> 
> We have about 30 resources configured, and more will be added in the future.

You may need to check some pacemaker limits for this number of resources:

* batch-limit (default: 30)
The number of jobs that the Transition Engine (TE) is allowed
to execute in parallel. The TE is the logic in pacemaker’s CRMd that executes
the actions determined by the Policy Engine (PE). The "correct" value will
depend on the speed and load of your network and cluster nodes.

* migration-limit (default: -1)
The number of migration jobs that the TE is allowed to
execute in parallel on a node. A value of -1 means unlimited.
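
To look at or change these two options, something along these lines
should work. It is only a sketch: the helper names (run, show_limit,
set_limit) are made up for illustration, and it runs dry by default so
you can read the crm_attribute calls before touching the cluster. If a
property was never set explicitly, the query may report it as not found,
in which case the default applies.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a cluster node to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Query a cluster-wide property (stored in the crm_config section).
show_limit() {
    run crm_attribute --type crm_config --name "$1" --query
}

# Set a cluster-wide property to a new value.
set_limit() {
    run crm_attribute --type crm_config --name "$1" --update "$2"
}

show_limit batch-limit
show_limit migration-limit
```

A call such as "set_limit batch-limit 30" then prints (or, with
DRY_RUN=0, runs) the matching update command.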

> The problems/questions are:
> 
>  * The IPv6addr resource agent is so slow. I guess that's because of
> the additional checks (pings). I had to switch to IPaddr2 for the
> virtual IPv6 resources as well, which improves the failover times a
> bit. Is this expected? Any hint here?

Can you check how slow it is?  It should take 5 seconds to send
advertisements, so the whole move takes 6-7 seconds, which seems
reasonable to me.  The address should be functional most of that time.

>  * In order to ease management, I created 2 groups, one for all the
> IPv4 addresses and other for all the IPv6 addresses. This way, I can
> perform operations (such as movements, start/stop) for all the
> resources in one go. This has a known drawback: in a group, the
> resources are managed in chain by the order of the group. On failover,
> this really hurts the movement time, since resources aren't moved in
> parallel but sequentially. Any hint here?
> 
> I would like to have a simple way of managing lot of resources in one
> go, but without the ordering drawbacks of a group.

Guess you could create a Dummy resource and add INFINITY colocation
constraints for the IPs so they follow Dummy as it moves between the
nodes :)
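
A rough sketch of that idea, assuming crmsh syntax and made-up resource
names (anchor-v4, ip4-a, ...). The anchor_config function is hypothetical
and only prints configuration text; it does not change the cluster. Since
no order constraints are generated, the IPs should not have to wait for
one another the way group members do.

```shell
#!/bin/sh
# Print crmsh configuration that ties each listed resource to an anchor.
# Usage: anchor_config ANCHOR RESOURCE...
anchor_config() {
    anchor="$1"
    shift
    # Stateless placeholder resource; it exists only to be moved around.
    echo "primitive $anchor ocf:pacemaker:Dummy"
    for rsc in "$@"; do
        # Mandatory colocation: the IP must run wherever the anchor runs.
        echo "colocation ${rsc}-with-${anchor} inf: $rsc $anchor"
    done
}

# Hypothetical names; replace with the ids of the real IPaddr2 resources.
anchor_config anchor-v4 ip4-a ip4-b ip4-c
```

The output can be reviewed and saved to a file, then applied with
"crm configure load update <file>". Moving or stopping the anchor is
then expected to affect all the IPs colocated with it, so one anchor
per address family would mirror the two groups.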

-- 
Valentin