[ClusterLabs] Help required for N+1 redundancy setup

Nikhil Utane nikhil.subscribed at gmail.com
Tue Dec 22 01:17:22 EST 2015


I have prepared a write-up explaining my requirements and the solution I am
proposing based on my understanding so far.
Kindly let me know whether what I am proposing is sound, or whether there is a
better way to achieve the same.

https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing

Let me know if you face any issues accessing the above link. Thanks.

On Thu, Dec 3, 2015 at 11:34 PM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On 12/03/2015 05:23 AM, Nikhil Utane wrote:
> > Ken,
> >
> > One more question: if I have to propagate configuration changes between the
> > nodes, then is CPG (closed process group) the right way?
> > For e.g.
> > Active Node1 has config A=1, B=2
> > Active Node2 has config A=3, B=4
> > Standby Node needs to have configuration for all the nodes such that
> > whichever goes down, it comes up with those values.
> > Here configuration is not static but can be updated at run-time.
>
> Being unfamiliar with the specifics of your case, I can't say what the
> best approach is, but it sounds like you will need to write a custom OCF
> resource agent to manage your service.
>
> A resource agent is similar to an init script:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf
>
> The RA will start the service with the appropriate configuration. It can
> use per-resource options configured in pacemaker or external information
> to do that.
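> For example, a minimal sketch of such an agent might look like the
> following. The daemon name ("myserviced") and the "config_file" parameter
> are placeholders for whatever your service actually needs:
>
>   #!/bin/sh
>   # Minimal OCF resource agent sketch (not production-ready)
>   : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
>   . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
>
>   case "$1" in
>     start)
>       # Pacemaker passes resource parameters as OCF_RESKEY_* variables
>       myserviced --config "${OCF_RESKEY_config_file}" && exit $OCF_SUCCESS
>       exit $OCF_ERR_GENERIC
>       ;;
>     stop)
>       # a real agent must verify the service actually stopped
>       killall myserviced 2>/dev/null
>       exit $OCF_SUCCESS
>       ;;
>     monitor)
>       pidof myserviced >/dev/null && exit $OCF_SUCCESS
>       exit $OCF_NOT_RUNNING
>       ;;
>     meta-data)
>       # print the agent's XML metadata here (omitted for brevity)
>       exit $OCF_SUCCESS
>       ;;
>     *)
>       exit $OCF_ERR_UNIMPLEMENTED
>       ;;
>   esac
>
> You would then configure it with something like (provider and path are
> placeholders):
>
>   pcs resource create myservice ocf:mycompany:myservice \
>       config_file=/etc/myservice/node1.conf op monitor interval=30s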
>
> How does your service get its configuration currently?
>
> > BTW, I'm a little confused between OpenAIS and Corosync. For my purpose I
> > should be able to use either, right?
>
> Corosync started out as a subset of OpenAIS, optimized for use with
> Pacemaker. Corosync 2 is now the preferred membership layer for
> Pacemaker for most uses, though other layers are still supported.
>
> > Thanks.
> >
> > On Tue, Dec 1, 2015 at 9:04 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> >
> >> On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> >>> Hi,
> >>>
> >>> I am evaluating whether it is feasible to use Pacemaker + Corosync to add
> >>> support for clustering/redundancy into our product.
> >>
> >> Most definitely
> >>
> >>> Our objectives:
> >>> 1) Support N+1 redundancy, i.e. N active and (up to) 1 standby.
> >>
> >> You can do this with location constraints and scores. See:
> >>
> >>
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on
> >>
> >> Basically, you give the standby node a lower score than the other nodes.
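> >> For example, with pcs (node and resource names here are placeholders):
> >>
> >>   # app1's resources normally run on node1; the shared standby gets a
> >>   # lower score, so it is only used when node1 is unavailable
> >>   pcs constraint location app1-group prefers node1=200
> >>   pcs constraint location app1-group prefers standby1=100
> >>   # ... and similarly for app2-group/node2, app3-group/node3, etc.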
> >>
> >>> 2) Each node has some different configuration parameters.
> >>> 3) Whenever any active node goes down, the standby node comes up with the
> >>> same configuration that the active had.
> >>
> >> How you solve this requirement depends on the specifics of your
> >> situation. Ideally, you can use OCF resource agents that take the
> >> configuration location as a parameter. You may have to write your own,
> >> if none is available for your services.
> >>
> >>> 4) There is no single process/service for which we need redundancy;
> >>> rather, it is the entire system (multiple processes running together).
> >>
> >> This is trivially implemented using either groups or ordering and
> >> colocation constraints.
> >>
> >> Order constraint = start service A before starting service B (and stop
> >> in reverse order)
> >>
> >> Colocation constraint = keep services A and B on the same node
> >>
> >> Group = shortcut to specify several services that need to start/stop in
> >> order and be kept together
> >>
> >>
> >>
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392
> >>
> >>
> >>
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
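> >> For example, with pcs (resource names are placeholders):
> >>
> >>   pcs constraint order start serviceA then serviceB
> >>   pcs constraint colocation add serviceB with serviceA INFINITY
> >>
> >> or, as a group that implies both the ordering and the colocation:
> >>
> >>   pcs resource group add app1-group serviceA serviceB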
> >>
> >>
> >>> 5) I would also want to be notified when any active<->standby state
> >>> transition happens, as I would want to take some steps at the application
> >>> level.
> >>
> >> There are multiple approaches.
> >>
> >> If you don't mind compiling your own packages, the latest master branch
> >> (which will be part of the upcoming 1.1.14 release) has built-in
> >> notification capability. See:
> >> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
> >>
> >> Otherwise, you can use SNMP or e-mail if your packages were compiled
> >> with those options, or you can use the ocf:pacemaker:ClusterMon resource
> >> agent:
> >>
> >>
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
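> >> For example, a cloned ClusterMon resource that calls an external script
> >> on every cluster event (the script path is just an illustration):
> >>
> >>   pcs resource create cluster-mon ocf:pacemaker:ClusterMon \
> >>       extra_options="-E /usr/local/bin/cluster_notify.sh" --clone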
> >>
> >>> I went through the documents/blogs, but all had examples for a 1-active,
> >>> 1-standby use case, and that too for some standard service like httpd.
> >>
> >> Pacemaker is incredibly versatile, and the use cases are far too varied
> >> to cover more than a small subset. Those simple examples show the basic
> >> building blocks, and can usually point you to the specific features you
> >> need to investigate further.
> >>
> >>> One additional question: if I have multiple actives, then virtual IP
> >>> configuration cannot be used? Is it possible for the N actives to have
> >>> different IP addresses, but whenever the standby becomes active it uses
> >>> the IP address of the failed node?
> >>
> >> Yes, there are a few approaches here, too.
> >>
> >> The simplest is to assign a virtual IP to each active, and include it in
> >> your group of resources. The whole group will fail over to the standby
> >> node if the original goes down.
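> >> For example (addresses and names are placeholders):
> >>
> >>   pcs resource create app1-ip ocf:heartbeat:IPaddr2 \
> >>       ip=192.168.1.101 cidr_netmask=24 op monitor interval=30s
> >>   # put the IP in the same group as the rest of app1's resources
> >>   pcs resource group add app1-group app1-ip
> >>
> >> Combined with location scores as above, the whole group (including its
> >> IP) moves to the standby when its normal node fails.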
> >>
> >> If you want a single virtual IP that is used by all your actives, one
> >> alternative is to clone the ocf:heartbeat:IPaddr2 resource. When cloned,
> >> that resource agent will use iptables' CLUSTERIP functionality, which
> >> relies on multicast Ethernet addresses (not to be confused with
> >> multicast IP). Since multicast Ethernet has limitations, this is not
> >> often used in production.
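> >> If you do want to experiment with it, the Clusters from Scratch guide
> >> shows roughly this, assuming "ClusterIP" is an existing IPaddr2 resource:
> >>
> >>   pcs resource clone ClusterIP \
> >>       clone-max=2 clone-node-max=2 globally-unique=true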
> >>
> >> A more complicated method is to use a virtual IP in combination with a
> >> load-balancer such as haproxy. Pacemaker can manage haproxy and the real
> >> services, and haproxy manages distributing requests to the real services.
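> >> A sketch of that layout (assuming haproxy is installed with a systemd
> >> unit; names and addresses are placeholders):
> >>
> >>   pcs resource create lb-ip ocf:heartbeat:IPaddr2 \
> >>       ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s
> >>   pcs resource create lb systemd:haproxy
> >>   pcs constraint colocation add lb with lb-ip INFINITY
> >>   pcs constraint order start lb-ip then lb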
> >>
> >>> Thanks in advance.
> >>> Nikhil
> >>
> >> A last word of advice: Fencing (aka STONITH) is important for proper
> >> recovery from difficult failure conditions. Without it, it is possible
> >> to have data loss or corruption in a split-brain situation.
>
>