[ClusterLabs] Ansible role to configure Pacemaker

Jan Pokorný jpokorny at redhat.com
Thu Jun 7 00:26:01 UTC 2018


On 07/06/18 02:19 +0200, Jan Pokorný wrote:
> While I see why Ansible is compelling, I feel it's important to
> challenge this trend of trying to bend/rebrand a _machine-local
> configuration management tool_ as a _distributed system management tool_
> (Pacemaker is a distributed application/framework of sorts), which Ansible
> alone is _not_, as far as I know; hence the effort doesn't seem to be
> 100% sound (which really matters if reliability is the goal).
> 
> Once more, this has nothing to do with the announced project; it's
> just the trending fuss on this topic that indicates to me that people,
> as they independently and keenly reinvent their own wheel (here: Ansible
> roles), get blind to the fallacy that everything must work as nicely in
> multi-machine shared-state scenarios as they are used to with
> single-host bootstrapping, without any shortcomings.
> 
> But there are shortcomings, precisely because a suboptimal tool for
> the task gets selected!  Just imagine what would happen if a single
> machine got configured independently by multiple Ansible actors
> (there may be mechanisms -- relatively easy within the same host --
> that would prevent such interference, but assume for now they are not
> strong enough).  What will happen?  Likely some mess-ups will occur, as
> the glorified idempotence is hard to achieve atomically.  Voilà, the
> inflicted race conditions get exercised, one by one, until there's
> enough bad luck that the rule of idempotence gets broken, just because
> of these processes emulating a schizophrenic (at the same time
> multitasking) admin.  Ouch!
> 
> Now, apply this to the situation with possibly concurrent
> cluster configuration.  One cannot really expect the cluster
> stack to be bullet-proof against these sorts of mishandling.
> Single cluster administrator operating at a time?  Ideal!
> A few administrators, presumably with separate areas of
> configuration interest?  Pacemaker is quite ready.
> Cluster configuration randomly touched from a random node
> at a random time (the equivalent of said schizophrenic multitasking
> administrator with a single host)?  Over a sufficiently long
> period, chances are that something will go wrong.
> 
> The solution here is to break that randomness, so that the configuration
> is modified in one of these ways:
> 1. from a single node at a time in the cluster (plus preferably
>    batching all required changes into a single request)
> 2. with mutual time-critical exclusion of triggering the changes
>    across the nodes
> 3. with mutual locality-critical exclusion in the subject of the
>    changes initiated from particular nodes
> 
> Putting 1. and 3. aside as not very interesting (1. means
> a degenerate case with single point of failure, and 3. kills
> the universality), what we get is really a dependency on some
> kind of distributed lock and/or transactional system.
> Well, we have just discovered that what we need to automate our
> predestined configuration in the cluster reliably and without
> hurting universality (like "breaking the node symmetry") is
> said distributed system management ("orchestration") tool.
> Does Ansible have these capabilities?
> 
> Now, one idea might be to make tools like pcs compensate for these
> shortcomings of the machine-local configuration management ones.
> Sounds good, right?  Absolutely not, more like a bad joke!
> Because what else can it be, developing orchestration-like
> features (with all the complexities solved once in corosync/DLM
> already; relaxing non-dependency on the very subject of management
> may not be wise) on top of a regular high-level cluster management tool
> only[*] to bridge the gap in something that is simply a subpar fit
> for distributed environments to begin with?
> 
> As a Czech proverb puts it: think twice, act once.
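
The race described in the quoted text can be sketched in a few lines.
This is only an illustration under stated assumptions, not Ansible or
Pacemaker code: `SharedConfig`, `ensure_present_steps` and the resource
name "vip" are hypothetical.  The "ensure present" step is split into
its check and its act (as a generator), so the unlucky interleaving of
two actors can be forced deterministically.  The locked variant stands
in for option 2 (mutual exclusion), with a process-local lock used as a
placeholder for a real distributed lock.

```python
import threading


class SharedConfig:
    """Hypothetical stand-in for cluster-wide configuration (not a Pacemaker API)."""

    def __init__(self):
        self.resources = []
        self.lock = threading.Lock()  # placeholder for a distributed lock


def ensure_present_steps(config, name):
    """Idempotent-looking 'ensure present', yielding between check and act."""
    present = name in config.resources  # check
    yield
    if not present:
        config.resources.append(name)   # act, based on a possibly stale check
    yield


def run_interleaved(config, name):
    """Force the unlucky schedule: both actors check before either acts."""
    actor_a = ensure_present_steps(config, name)
    actor_b = ensure_present_steps(config, name)
    next(actor_a)  # A checks: resource absent
    next(actor_b)  # B checks: resource still absent
    next(actor_a)  # A adds the resource
    next(actor_b)  # B adds it again, idempotence is broken


def ensure_present_locked(config, name):
    """Check and act performed as one unit under mutual exclusion."""
    with config.lock:
        if name not in config.resources:
            config.resources.append(name)
```

With `run_interleaved` the resource ends up configured twice, whereas
any number of concurrent `ensure_present_locked` callers leave exactly
one entry, because no actor can act on a check that another actor has
invalidated in the meantime.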

I meant to link this rather nice taxonomy of configuration
management and distributed systems, to show there's more than a single
tunnel-vision panacea:
https://www.xkyle.com/the-evolution-of-distributed-systems-management/

> [*] non-automated/human-triggered usage is generally fine as it's
>     highly unlikely none of 1.-3. would be satisfied, so there
>     would be next to no gain for these workflows
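
As for the "transactional system" alternative mentioned in the quoted
text, here is a minimal sketch of the idea, combined with the batching
from option 1: all changes are applied to a private draft and pushed as
a single request, which is accepted only if nobody else changed the
configuration in the meantime.  `VersionedStore` and `apply_batch` are
hypothetical names made up for this illustration; this is not how pcs
or Pacemaker is implemented, and in a real cluster the `push` step
itself would have to be atomic across nodes.

```python
import copy


class VersionedStore:
    """Hypothetical configuration store with a version counter."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def snapshot(self):
        """Return the current version and a private copy of the data."""
        return self.version, copy.deepcopy(self.data)

    def push(self, base_version, new_data):
        """Accept the draft only if it was based on the current version."""
        if base_version != self.version:
            return False  # someone else pushed first, draft is stale
        self.data = new_data
        self.version += 1
        return True


def apply_batch(store, changes, max_retries=3):
    """Apply all changes as one request, retrying on a stale snapshot."""
    for _ in range(max_retries):
        base_version, draft = store.snapshot()
        for key, value in changes.items():
            draft[key] = value
        if store.push(base_version, draft):
            return True
    return False
```

Two actors starting from the same snapshot cannot both succeed: the
second push is rejected and has to be redone on top of the first one's
result, so no change gets silently lost.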

-- 
Poki