[ClusterLabs] Fwd: Multi cluster
Jan Pokorný
jpokorny at redhat.com
Sat Aug 5 13:11:03 CEST 2017
On 05/08/17 00:10 +0200, Jan Pokorný wrote:
> [addendum inline]
And some more...
> On 04/08/17 18:35 +0200, Jan Pokorný wrote:
>> On 03/08/17 20:37 +0530, sharafraz khan wrote:
>>> I am new to clustering, so please ignore it if my question sounds silly.
>>> I have a requirement wherein I need to create a cluster for an ERP
>>> application with Apache and a VIP component. Below is the scenario:
>>>
>>> We have 5 Sites,
>>> 1. DC
>>> 2. Site A
>>> 3. Site B
>>> 4. Site C
>>> 5. Site D
>>>
>>> Here we need to configure HA such that DC would be the primary node
>>> hosting the application, accessed by all the users at each site. In
>>> case of failure of the DC node, each site's users should automatically
>>> be switched to their local ERP server, and not to the nodes at other
>>> sites, so communication would be as below:
>>>
>>> DC < -- > Site A
>>> DC < -- > Site B
>>> DC < -- > Site C
>>> DC < -- > Site D
Note that if you meant to imply that you generally rely on (or are
limited to) a star-like network topology with a central machine doubling
as a relay, that conflicts with our implicit notion (perhaps we should
make it explicit) of a cluster forming a complete graph (directly, or
indirectly through multicast) amongst the nodes of the healthy
partition. Corosync is not advanced enough to support grid/mesh/star
topologies, but that is a non-goal for a direct peer messaging layer to
start with. Sure, you can work around this with tunnelling, at the cost
of compromising reliability (and efficiency) and hence high
availability :)
Regarding the site <--> DC communication _after failure_: do you mean
checking whether the DC is OK again, or something else?
>>> Now the challenge is
>>>
>>> 1. If I create a cluster between, say, DC < -- > Site A, it won't allow
>>> me to create another cluster on DC with the other sites.
>>>
>>> 2. If I set up all the nodes in a single cluster, how can I ensure that
>>> in case of node failure or loss of connectivity to the DC node from any
>>> site, the users of that site are switched to the local ERP node and not
>>> to nodes at other sites?
>>>
>>> An urgent response and help would be much appreciated.
>>
>> From your description, I suppose you are limited to just a single
>> machine per site/DC (making the overall picture prone to a double
>> fault: first the DC goes down, then one of the sites goes down, and
>> at least the clients of that very site encounter downtime).
>> Otherwise I'd suggest looking at the booth project, which facilitates
>> inter-cluster (back to your "multi cluster") decisions, extending
>> upon pacemaker performing the intra-cluster ones.
>>
>> Using a single cluster approach, you should certainly be able to
>> model your fallback scenario, something like:
>>
>> - define a group A (VIP, apache, app), infinity-located with DC
>> - define a different group B with the same content, set up as clone
>> B_clone being (-infinity)-located with DC
>> - set up ordering "B_clone starts when A stops", of "Mandatory" kind
>>
>> Further tweaks may be needed.
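The group/clone fallback suggested above could be sketched with pcs
roughly as follows. All resource names, the node name "dc", and the
address/config values are hypothetical placeholders, and like the rest
of this sketch it is untested:

```shell
# Group A: VIP + apache (+ app), pinned to the DC node.
pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24
pcs resource create web ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf
pcs resource group add group_A vip web
pcs constraint location group_A prefers dc=INFINITY

# Group B: same content, cloned, kept away from the DC
# ("avoids" places a -INFINITY location score on dc).
pcs resource create web_fallback ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf
pcs resource clone web_fallback
pcs constraint location web_fallback-clone avoids dc=INFINITY

# Ordering: the clone starts only once A has stopped
# (order constraints are of the "Mandatory" kind by default).
pcs constraint order stop group_A then start web_fallback-clone
```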
>
> Hmm, actually a VIP would not help much here, even if "ip" were
> adapted per host ("#uname"), as there are two conflicting principles:
> "globality" of the network when serving from DC vs. locality when
> serving from particular sites _in parallel_. Something more
> sophisticated would likely be needed.
Thinking about that more, the easiest solution _configuration-wise_
might be:
* a single-real-node cluster at each of the sites, each having
  - fencing disabled (but configure and enable it when/if you add more nodes)
  - a unique "prolonged arm" in the form of an ocf:pacemaker:remote instance
    running at DC (e.g. different ports can be used so as to avoid
    communication clashes at DC)
  - a custom resource agent (e.g. a customization of ocf:heartbeat:anything)
    that will just monitor liveness of the ERP application, preferring
    to run at the DC through that very pacemaker-remote
  - a unique IP (hostname mapping) which will be used by the users
    of that very site to access the application
  - a VIP configured to represent that very IP, colocated with
    the said monitoring agent, meaning that the VIP will follow it,
    i.e. will only run where the ERP application is running:
    primarily it will stay at DC, with a fallback to the home site
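From one site's single-node cluster, the per-site setup above might look
roughly like this in pcs terms. Everything here is a hypothetical
placeholder (node and resource names, addresses, the port, and the
check_erp.sh monitoring script), and the whole approach is, as said,
totally untested:

```shell
# Fencing disabled for now (re-enable once more nodes are added).
pcs property set stonith-enabled=false

# The "prolonged arm": a pacemaker-remote instance running at the DC,
# on a site-unique port to avoid clashes with other sites' instances.
pcs resource create dc_arm ocf:pacemaker:remote server=dc.example.com port=3124

# Monitoring-only agent watching ERP liveness (a customization of
# ocf:heartbeat:anything); prefers running at the DC via the remote node.
pcs resource create erp_mon ocf:heartbeat:anything binfile=/usr/local/bin/check_erp.sh
pcs constraint location erp_mon prefers dc_arm=100

# Site-unique VIP following the monitor: at the DC while the application
# is healthy there, falling back to the local site otherwise.
pcs resource create site_vip ocf:heartbeat:IPaddr2 ip=192.0.2.20 cidr_netmask=24
pcs constraint colocation add site_vip with erp_mon INFINITY
```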
There are many adjustments possible on top of this sketch, which is
furthermore totally untested. The advantage is that you may add new
nodes at each site so as to achieve per-site HA. (But when the DC also
gets clustered with additional nodes, booth will likely be the way
forward.)
Note that to avoid race conditions, the pacemaker-remote instances at DC
should not try to control the ERP application directly; it should
rather be set up for autonomous recovery (the most simple
restart-after-failure case can be achieved, for instance, through
systemd service file directives, if that's how it gets launched).
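For that autonomous-recovery part, a simple restart-after-failure
policy can be expressed with a systemd drop-in. The unit name
erp.service is a hypothetical placeholder for however the ERP
application is actually launched:

```shell
# Hypothetical unit name; adjust to the real ERP service.
mkdir -p /etc/systemd/system/erp.service.d
cat > /etc/systemd/system/erp.service.d/restart.conf <<'EOF'
[Service]
Restart=on-failure
RestartSec=10
EOF
systemctl daemon-reload
```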
This is definitely quite an atypical workload compared to what I've
seen so far, and it is not easy to wrap my head around.
--
Poki