[ClusterLabs] Help required for N+1 redundancy setup

Tue Dec 1 23:23:43 EST 2015

Thank You Ken for such a detailed response. Truly appreciate it. Cheers.

On Tue, Dec 1, 2015 at 9:04 PM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> > Hi,
> >
> > I am evaluating whether it is feasible to use Pacemaker + Corosync to add
> > support for clustering/redundancy into our product.
>
> Most definitely
>
> > Our objectives:
> > 1) Support N+1 redundancy. i,e. N Active and (up to) 1 Standby.
>
> You can do this with location constraints and scores. See:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on
>
> Basically, you give the standby node a lower score than the other nodes.
>
> > 2) Each node has some different configuration parameters.
> > 3) Whenever any active node goes down, the standby node comes up with the
> > same configuration that the active had.
>
> How you solve this requirement depends on the specifics of your
> situation. Ideally, you can use OCF resource agents that take the
> configuration location as a parameter. You may have to write your own,
> if none is available for your services.
>
> > 4) There is no one single process/service for which we need redundancy,
> > rather it is the entire system (multiple processes running together).
>
> This is trivially implemented using either groups or ordering and
> colocation constraints.
>
> Order constraint = start service A before starting service B (and stop
> in reverse order)
>
> Colocation constraint = keep services A and B on the same node
>
> Group = shortcut to specify several services that need to start/stop in
> order and be kept together
>
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392
>
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
>
>
> > 5) I would also want to be notified when any active<->standby state
> > transition happens as I would want to take some steps at the application
> > level.
>
> There are multiple approaches.
>
> If you don't mind compiling your own packages, the latest master branch
> (which will be part of the upcoming 1.1.14 release) has built-in
> notification capability. See:
> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
>
> Otherwise, you can use SNMP or e-mail if your packages were compiled
> with those options, or you can use the ocf:pacemaker:ClusterMon resource
> agent:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
>
> > I went through the documents/blogs but all had example for 1 active and 1
> > standby use-case and that too for some standard service like httpd.
>
> Pacemaker is incredibly versatile, and the use cases are far too varied
> to cover more than a small subset. Those simple examples show the basic
> building blocks, and can usually point you to the specific features you
> need to investigate further.
>
> > One additional question, If I am having multiple actives, then Virtual IP
> > configuration cannot be used? Is it possible such that N actives have
> > different IP addresses but whenever standby becomes active it uses the IP
> > address of the failed node?
>
> Yes, there are a few approaches here, too.
>
> The simplest is to assign a virtual IP to each active, and include it in
> your group of resources. The whole group will fail over to the standby
> node if the original goes down.
>
> If you want a single virtual IP that is used by all your actives, one
> alternative is to clone the ocf:heartbeat:IPaddr2 resource. When cloned,
> that resource agent will use iptables' CLUSTERIP functionality, which
> relies on multicast Ethernet addresses (not to be confused with
> multicast IP). Since multicast Ethernet has limitations, this is not
> often used in production.
>
> A more complicated method is to use a virtual IP in combination with a
> load-balancer such as haproxy. Pacemaker can manage haproxy and the real
> services, and haproxy manages distributing requests to the real services.
>
> > Thanking in advance.
> > Nikhil
>
> A last word of advice: Fencing (aka STONITH) is important for proper
> recovery from difficult failure conditions. Without it, it is possible
> to have data loss or corruption in a split-brain situation.
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20151202/ebdfa937/attachment-0003.html>