[ClusterLabs] Position of pacemaker in today's HA world

Fri Oct 5 12:34:10 EDT 2018

On Fri, 2018-10-05 at 13:47 +0200, Jan Pokorný wrote:
> Hello HA enthusiasts,
> 
> I've come by an interesting article on the topic of how high
> availability (possibly, I couldn't witness this first hand since
> I don't have a time machine, but some of you can perhaps comment
> if the picture matches own experience) historically evolved
> from the perspective of database engines.  In part, it may be
> a promo for a particular product but this is in no way an attempt
> to endorse it -- the text is informative on its own merit:
> 
> <https://www.cockroachlabs.com/blog/brief-history-high-availability/>
> 
> It describes how the first, easy step towards "ideal HA" was
> an active-passive setup, moreover with stateful resources (like DBs)
> first using synchronous replication of the state and hence their
> overall availability relying on the backup being functional,
> then asynchronously, allowing for losing bits.
> (Note that any non-trivial application will always require some
> notion of rather persistent state -- as mentioned several times
> in this venue, stateless services do not need to bother with all
> the "HA coordination" burden since there are typically light-weight
> alternatives for "bring services up in this order" type of tasks,
> hence I explicitly exclude them from further discussion).
> 
> Then it talks about "sharding" (I must admit I haven't heard this
> term before), splitting a two-node active-passive monolith into
> multiple active-passive pairs, using some domain-specific cuts
> (like primary key ranges for tables in DB) + some kind of gateway
> in front of them and used to route the requests to the corresponding
> pair.
> 
> Finally, the evolution brought us to active-active setups, that
> typically solve the consistency issues amongst partly independent
> nodes with after-the-fact conflict reconciliation.  Alternative
> to this is a before-the-fact consensus negotiation on what the
> next "true" shared state will be -- they call this arrangement
> multi-active in the article, and apparently, it means that
> the main mechanisms, membership and consensus, of corosync-pacemaker
> stack are duplicated privately on this resource level.
> 
> * * *
> 
> This brings me to what I want to discuss -- relevancy of
> corosync-pacemaker clusters in the light of increasingly common
> resource-level "private" clustering (amongst other trends like
> a push towards containerization), and how to perhaps rearticulate
> its mission to stay relevant for years to come.
> 
> I perceive the pacemaker's biggest value currently in:
> 
> * HA-fying plain non-distributed services, either as active-passive
>   or even active-active provided that "shared state" problem is
>   either non-existent or off-loaded elsewhere -- distributed file
>   system/storage, distributed DB, etc.
> 
> * helping in the "last mile" for multiple-actors-ready active-passive
>   services (matches multi-role resource agent arrangement)
> 
> * multisite/cluster-of-clusters handling in combination with booth
> 
> and their (almost) arbitrarily complex combinations, all while
> achieving proper sanity through node-level isolation should the
> HA-damaging failures occur.

Agreed, though I'd put node-level isolation as the number one
advantage. It's the main ingredient missing from resource-level HA.

If by complex combinations you mean pacemaker's constraints, I'd put
that as the number two advantage, because it allows expression of
complex inter-resource dependencies, which obviously a single resource
can't do by itself at the resource level.
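
To make the dependency point concrete, here is a minimal sketch of how
"start A before B" ordering rules between resources resolve into a start
sequence. This is not Pacemaker's scheduler or API; the resource names and
the start_order() helper are hypothetical, and it only assumes Python 3.9+
for the standard graphlib module.

```python
from graphlib import TopologicalSorter


def start_order(order_constraints):
    """Return one valid start order for (first, then) ordering pairs.

    Each pair means "start `first` before `then`". A dependency loop
    raises graphlib.CycleError.
    """
    sorter = TopologicalSorter()
    for first, then in order_constraints:
        # `then` depends on `first`, so `first` is its predecessor.
        sorter.add(then, first)
    return list(sorter.static_order())


# Hypothetical resources: a database needing storage and an address,
# and a web application needing the database.
constraints = [
    ("shared-fs", "database"),
    ("virtual-ip", "database"),
    ("database", "webapp"),
]
order = start_order(constraints)
print(order)
```

A single self-clustering resource has no view of the other resources in
such a list, which is why this kind of ordering has to live one level
above it.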

> On the other hand, with standalone self-clustering resources
> (at best, they could reuse the facilities of corosync alone for
> their function), perhaps the only value added would be this
> "isolation" part, but then, stonith-ng/pacemaker-fenced together
> with static configuration file would be all that's needed so that
> such resource can hook into it.  Note that both "sharding

A standalone fence daemon is an interesting idea that was tentatively
tried at one point but abandoned for lack of interest. I think it could
greatly help resource-level HA techniques if there were a de facto
standard for that, but there doesn't seem to be any interest from those
communities, other than DLM. Partly they don't understand the value of
fencing, but partly it requires a considerable amount of additional
effort for something outside their core strength.

> gateway/router", conflict reconciliation and perhaps even consensus
> negotiation appear to be highly application specific.  To be
> relevant in those contexts, the opposite to "external wrapping"
> would be needed -- making the framework complete, offering the
> library/API so that the applications are built on top of this
> natively.  An example of this I've seen when doing some brief research
> some time ago is <https://github.com/NetComposer/nkcluster>
> (on that note, Erlang was designed specifically with fault-tolerant,
> resilient and distributed applications in mind, making me wonder
> if it was ever considered originally in what later became pacemaker).

Not that I'm aware of. The first problem that comes to mind is the
ecosystem; I don't think there's the breadth of library support that
pacemaker needs, nor a large enough community of users. My only
personal encounters with Erlang have been via RabbitMQ, and that
hasn't been positive (high resource usage and questionable
reliability).

Personally, I'm not fond of functional programming in general. I find
it to be the veganism of programming languages. (No disrespect for
anyone else's choice, but it's not for me ...)

> Also, one of the fields where pacemaker used to be very helpful was
> a concise startup/shutdown ordering amongst multiple on-node
> services.  This is partially obviated with smart init managers, most
> blatantly systemd on Linux platform, playing in a whole other league
> than old, inflexible init systems of the past when the foundation
> of pacemaker was laid out.

Agreed, though those still lack cross-node dependencies.

> 
> * * *
> 
> Please, don't take this as a blasphemy, I am just trying to put my

I think the only blasphemy in any tech field is not looking ahead.

> head out of the tunnel (or sand, if you want), to view the value of
> corosync-pacemaker stack in the IT infrastructures of today and
> future, and to gather feedback on this topic, perhaps together with
> ideas how
> to stay indeed relevant amongst all the "private clustering",
> management and orchestration of resources proliferation we
> can observe for the past years (which makes the surface slightly
> different than it was when heartbeat [and Red Hat Cluster Suite]
> was the thing).
> 
> Please share your thoughts with me/us, even if it will not be
> the most encouraging thing to hear, since
> - staying realistic is important
> - staying relevant is what prevents becoming a fossil tomorrow
> :-)
> 
> Happy World Teachers' Day.

I think the biggest challenge today is that the ClusterLabs stack is
infrastructure, and infrastructure is increasingly concentrated in a
small number of cloud providers. Fewer organizations of all sizes have
their own data center space. The danger is a shrinking community base
and ever greater dependence on a handful of backers. Thankfully, we've
seen the opposite in the past few years -- the community has grown. I
think the consolidation of cluster stacks and improvement of the Debian
packages have helped in that regard. There's been a trickling of
interest from *BSD and Arch users, and I think encouraging that however
we can would be helpful.

The next big challenge is that high availability is becoming a subset
of the "orchestration" space in terms of how we fit into IT
departments. Systemd and Kubernetes are the clear leaders in service
orchestration today and likely will be for a long while. Other forms of
orchestration such as Ansible are also highly relevant. Tighter
integration with these would go a long way toward establishing
longevity.

That brings up another challenge, which is developer resources. It is
difficult to keep up with triaging bug reports, much less handling them.
We occasionally have the opportunity to add significant new
enhancements (bundles and alerts come to mind), but the size and
frequency of such projects are greatly limited. It would be awesome to
adapt Pacemaker to be a Kubernetes scheduler, or a cross-node
dependency manager for systemd, but a whole new team of developers
would be needed to tackle something that size. And if we're asking
Santa Claus, I'd like two or three additional full-time developers to
focus on known bugs and RFEs ...
-- 
Ken Gaillot <kgaillot at redhat.com>