[ClusterLabs] Ansible role to configure Pacemaker
jpokorny at redhat.com
Wed Jun 6 20:26:01 EDT 2018
On 07/06/18 02:19 +0200, Jan Pokorný wrote:
> While I see why Ansible is compelling, I feel it's important to
> challenge this trend of trying to bend/rebrand a _machine-local
> configuration management tool_ as a _distributed system management tool_
> (pacemaker is a distributed application/framework of sorts), which Ansible
> alone is _not_, as far as I know, hence the effort doesn't seem to be
> 100% sound (which really matters if reliability is the goal).
> Once more, this has nothing to do with the announced project; it's
> just that the trending fuss on this topic indicates to me that people,
> as they independently and keenly reinvent the wheel (here: Ansible
> roles), get blind to the fallacy that everything must work as nicely
> in multi-machine shared-state scenarios as they are used to with
> single-host bootstrapping, without any shortcomings.
> But there are some, precisely because a less than optimal tool for
> the task gets selected! Just imagine what would happen if a single
> machine got configured independently by multiple Ansible actors
> (there may be mechanisms -- relatively easy within the same host --
> that would prevent such interference, but assume for now they are not
> strong enough). What will happen? Likely some mess-ups will occur, as
> the glorified idempotence is hard to achieve atomically. Voila, the
> inflicted race conditions get exercised one by one, until there's
> enough bad luck that the rule of idempotence gets broken, just because
> these processes emulate a schizophrenic (and at the same time
> multitasking) admin. Ouch!
> Now, apply this to the situation with possibly concurrent
> cluster configuration. One cannot really expect the cluster
> stack to be bullet-proof against this sort of mishandling.
> Single cluster administrator operating at a time? Ideal!
> A few administrators, presumably with separate areas of
> configuration interest? Pacemaker is quite ready.
> Cluster configuration randomly touched from a random node
> at a random time (the equivalent of said schizophrenic multitasking
> administrator with a single host)? Chances are something breaks,
> given a sufficiently long period over which this happens.
> The solution here is to break that randomness, so that configuration
> is modified under one of the following regimes:
> 1. from a single node at a time in the cluster (plus preferably
> batching all required changes into a single request)
> 2. mutual time-critical exclusion of triggering the changes
> across the nodes
> 3. mutual locality-critical exclusion in the subject of the
> changes initiated from particular nodes
> Putting 1. and 3. aside as not very interesting (1. means
> a degenerate case with single point of failure, and 3. kills
> the universality), what we get is really a dependency on some
> kind of distributed lock and/or transactional system.
> Well, we have just discovered that what we need to automate our
> predestined configuration in the cluster reliably and without
> hurting universality (like "breaking the node symmetry") is
> said distributed system management ("orchestration") tool.
> Does Ansible have these capabilities?
> Now, one idea might be to make tools like pcs compensate for
> these shortcomings of the machine-local configuration management ones.
> Sounds good, right? Absolutely not, more like a bad joke!
> Because what else would it be, developing orchestration-like
> features (with all the complexities solved once in corosync/DLM
> already; relaxing non-dependency on the very subject of management
> may not be wise) on top of a regular high-level cluster management tool
> only[*] to bridge the gap in something that is simply a subpar fit
> for distributed environments to begin with?
> As a Czech proverb puts it: think twice, act once.
I meant to link this rather nice taxonomy regarding configuration
management and distributed systems, to show there's more than a single
> [*] non-automated/human-triggered usage is generally fine as it's
> highly unlikely none of 1.-3. would be satisfied, so there
> would be next to no gain for these workflows
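
To make the race described in the quoted text concrete, here is a
minimal Python sketch of how a check-then-act step that looks idempotent
breaks once two actors interleave, and how mutual exclusion (option 2
in the list above) restores it. Everything in it is hypothetical:
ClusterConfig, ensure_resource and the "vip" resource name are
illustration only and do not correspond to any Pacemaker, pcs or
Ansible API, and threading.Lock is merely a local stand-in for
a cluster-wide lock of the kind DLM would provide.

```python
import threading


class ClusterConfig:
    """Toy stand-in for shared cluster configuration (hypothetical)."""

    def __init__(self):
        self.resources = []


def ensure_resource(config, name):
    """Check-then-act step that looks idempotent for a single actor.

    Written as a generator that pauses between the check and the act,
    so that a particular interleaving can be forced deterministically.
    """
    missing = name not in config.resources  # check
    yield                                   # another actor may run here
    if missing:
        config.resources.append(name)       # act on a possibly stale check


def run_unsynchronized(config, name, actors=2):
    """Worst-case interleaving: every actor checks before any actor acts."""
    steps = [ensure_resource(config, name) for _ in range(actors)]
    for step in steps:
        next(step)        # each check still sees the resource as missing
    for step in steps:
        next(step, None)  # each act then adds the resource again


def run_with_exclusion(config, name, actors=2):
    """Same actors, but check and act run under one lock (option 2)."""
    lock = threading.Lock()  # local stand-in for a cluster-wide lock

    def actor():
        with lock:
            for _ in ensure_resource(config, name):
                pass

    threads = [threading.Thread(target=actor) for _ in range(actors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


racy = ClusterConfig()
run_unsynchronized(racy, "vip")

safe = ClusterConfig()
run_with_exclusion(safe, "vip")
```

With the forced interleaving, racy.resources ends up holding "vip"
twice, whereas safe.resources holds it once. The lock only helps
because every actor honours it, which is the dependency on
a distributed lock and/or transactional system named in the quoted
text.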