[Pacemaker] dopd on openais

Tue Jun 9 09:25:53 EDT 2009

On 2009-06-08T10:20:41, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> > With drbd supporting active/active or active/passive, for example, the
> > CRM/RA can't reliably tell whether the number of activated nodes is
> > correct (and this will get worse if >2 nodes are ever supported)
> > without resorting to parsing drbd's configuration file, which is icky (and
> > relies on the config file being identical on all nodes).
> 
> first, this was about the "dopd" functionality,
> that is, reducing the risk of going online with consistent but
> out-of-date (stale) data after a replication link failure.
> 
> you seem to be mixing at least a few other wishlist items in here.
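The dopd idea referred to here can be sketched as a tiny state model. This is a minimal illustration under stated assumptions, not drbd's implementation: `Node`, `DiskState`, `mark_outdated()` and `on_replication_link_failure()` are hypothetical names, only two disk states are modelled, and the out-of-band request ("outdate yourself") is reduced to a boolean saying whether the peer could be reached.

```python
from enum import Enum


class DiskState(Enum):
    # Only the two states this sketch needs; drbd itself has more.
    UP_TO_DATE = "UpToDate"
    OUTDATED = "Outdated"


class Node:
    """Hypothetical model of one drbd peer."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Starts out UpToDate. A secondary that loses the replication link
        # cannot tell on its own that its (consistent) data has gone stale.
        self.disk = DiskState.UP_TO_DATE

    def mark_outdated(self) -> None:
        # Done at the primary's request, delivered via the cluster stack
        # rather than over the failed replication link.
        self.disk = DiskState.OUTDATED

    def may_promote(self) -> bool:
        # Consistent-but-outdated data must not go online as primary.
        return self.disk is DiskState.UP_TO_DATE


def on_replication_link_failure(peer: Node, peer_reachable: bool) -> bool:
    """Hypothetical "fence peer" step, run on the primary.

    Returns True if the peer was marked outdated. If the peer cannot be
    reached, its state stays unknown; what to do then is policy and is
    not modelled here.
    """
    if peer_reachable:
        peer.mark_outdated()
        return True
    return False


if __name__ == "__main__":
    bob = Node("bob")  # hypothetical secondary
    on_replication_link_failure(bob, peer_reachable=True)
    print(bob.name, bob.disk.value, bob.may_promote())  # prints: bob Outdated False
```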

Well, I see dopd as an out-of-band (compared to the RA) meta-data
management daemon, which could (eventually) include additional
functionality.

> second, I don't think it is the RA's job to force a particular
> configuration.  If you configure "master-max=2" in the cib, and the drbd
> config does not allow two primaries, promoting the second master will
> fail.  so what.  the other way around, drbd.conf allowing two, but cib
> only one master: no harm done at all.

One could argue that a different exit code should be returned
(ERR_INSTALLED, ERR_CONFIGURED, or ERR_GENERIC) depending on the cause
of the error.

> I don't see the problem.

Consider the "validate" functionality. It'd be nice to provide feedback
to users without forcing a restart.

It's not a big problem, and indeed, the CIB config being more strict
than the drbd one would not be an error at all.
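The exit-code point can be illustrated with a hedged sketch of a validate action. The numeric values are those defined by the OCF resource agent API; `validate()` and its three inputs are hypothetical stand-ins for whatever a real drbd RA would actually inspect (installed tools, drbd.conf, the CIB's master-max).

```python
# OCF resource agent exit codes (values as defined by the OCF RA API).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1     # runtime failures not covered below; unused by validate()
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6


def validate(tools_present: bool, allow_two_primaries: bool, master_max: int) -> int:
    """Hypothetical validate action for a drbd master/slave resource.

    tools_present:        whether the drbd userland tools are installed here
    allow_two_primaries:  whether drbd.conf permits dual-primary mode
    master_max:           the master-max meta attribute from the CIB
    """
    if not tools_present:
        # Missing software on this node: a node-local problem.
        return OCF_ERR_INSTALLED
    if master_max > 1 and not allow_two_primaries:
        # The CIB asks for more masters than drbd.conf allows.
        return OCF_ERR_CONFIGURED
    # A CIB that is stricter than drbd.conf (e.g. master-max=1 with
    # dual-primary allowed) is not an error.
    return OCF_SUCCESS
```

Distinguishing the two failure kinds matters because Pacemaker reacts differently to them: OCF_ERR_INSTALLED is treated as a problem with the node that reported it, while OCF_ERR_CONFIGURED marks the resource configuration itself as broken, so the resource is not tried on other nodes either.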

> > This would also reduce the amount of configuration necessary, i.e.
> > if the IP addresses were inherited from the OpenAIS configuration. (By
> > default; of course this could be overridden.)
> uh?

What's the problem with the previous paragraph? I thought that one was
the clearest one ;-)

> > With internal meta-data, for example, one could then simply say: "start
> > device XXX".
> you can say that now?
> or am I misunderstanding you again?

I perhaps should have written "instantiate drbd rsc <id> on <device>" to
cover my intent.

> again, nothing to do with "fence peer" / "mark outdated".

Sure. But you _did_ ask me for wishlist items for future OpenAIS
integration ;-)

> > There could be a start-or-clone command too (maybe even the default?)
> > which would do the right thing (either resync if a copy already existed
> > or do a full clone), easing recovery of failed nodes.
> I don't follow.
> what is it you want to achieve?

Basically do away with drbd.conf eventually.

> > And if the configuration is distributed using OpenAIS, doing a "drbdadm
> > configure change syncer-speed 10M" would immediately affect all nodes
> > w/o needing to manually modify drbd.conf everywhere.
> if you want to "distribute" the drbd config file, use csync2.
> 
> if you want to get rid of the drbd config file, store everything necessary
> in the cib, then rewrite the drbd RA to get all parameters passed in,
> drop drbdadm and use drbdsetup directly.
>
> absolutely no need to write yet another distributed config based on
> openais (or some other) thingy, when we already have one: the cib.

True, a CIB integration would also work, but you'd need local persistent
state, which the CIB doesn't provide that easily. And storing the
meta-data with/on the backing device reduces the likelihood of errors
when the devices get migrated between nodes and ensures consistency even
on the local node.

Sure, the CIB could be used for some of the ideas. I was mostly
brainstorming use cases, not final implementation ideas ;-)

> > Eventually, such a distributed user-space daemon could also allow you to
shift the meta-data handling and processing out of the kernel. Might
> > appeal to some. ;-)
> exactly who shall be authoritative on which replica contains the
> most up-to-date data generation?

The meta-data handling system, obviously. Right now you implement it in
the kernel, but it could just as well live in said meta-data daemon.

And such status meta-data really shouldn't be funneled through the CIB.
OCFS2 escalates plocks to user-space and uses checkpoints/ordered
messaging to synchronize them. The CIB doesn't provide all of these
guarantees, nor the performance, which may or may not be a problem.

> > > what we actually are doing right now is placing location constraints on
> > > the master role into the cib from the "fence-peer" handler, and removing
> > > them again from the "after-sync-target" handler.  sort of works.
> > 
> > Here I think we need a more extensive discussion. Why does your
> > handler add additional constraints instead of modifying the master
> > score?
> 
> why not.
> could easily be changed.
> 
> but there is even a technical reason: shoot-out in two-primary mode.
> both try to create the "no-one but me is uptodate" constraint,
> with the same xml id.  the first one to place it succeeds.
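The constraint-based approach and the shoot-out described above can be modelled with a toy in-memory store standing in for the CIB. This is a sketch under assumptions: `ToyCib`, `fence_peer()`, `after_sync_target()`, `shoot_out()` and the id `drbd-fence-by-handler-r0` are hypothetical, and the real handlers talk to the actual CIB, which is not shown. The only property relied on is the one stated in the thread: creating an object whose XML id already exists fails, so the first node to place the constraint wins.

```python
import threading


class ToyCib:
    """Toy stand-in for the CIB: location constraints keyed by XML id."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # xml id -> the only node still allowed to be Master
        self.constraints: dict[str, str] = {}

    def create(self, xml_id: str, only_master: str) -> bool:
        # Atomic create-if-absent: a second create with the same id fails.
        with self._lock:
            if xml_id in self.constraints:
                return False
            self.constraints[xml_id] = only_master
            return True

    def delete(self, xml_id: str) -> None:
        with self._lock:
            self.constraints.pop(xml_id, None)

    def may_be_master(self, node: str) -> bool:
        # Every constraint reads "no-one but <only_master> may be Master".
        with self._lock:
            return all(owner == node for owner in self.constraints.values())


# Hypothetical id; both nodes use the same one for the same resource.
FENCE_ID = "drbd-fence-by-handler-r0"


def fence_peer(cib: ToyCib, me: str) -> bool:
    """Hypothetical fence-peer handler: True if our constraint was placed."""
    return cib.create(FENCE_ID, me)


def after_sync_target(cib: ToyCib) -> None:
    """Hypothetical after-sync-target handler: lift the restriction."""
    cib.delete(FENCE_ID)


def shoot_out(nodes: list[str]) -> tuple[ToyCib, dict[str, bool]]:
    """All primaries lose the link at once and race to place the constraint."""
    cib = ToyCib()
    results: dict[str, bool] = {}
    results_lock = threading.Lock()

    def run(node: str) -> None:
        won = fence_peer(cib, node)
        with results_lock:
            results[node] = won

    threads = [threading.Thread(target=run, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cib, results


if __name__ == "__main__":
    cib, results = shoot_out(["alice", "bob"])  # hypothetical node names
    winner = next(n for n, won in results.items() if won)
    loser = next(n for n, won in results.items() if not won)
    print("winner:", winner, "| loser may be Master:", cib.may_be_master(loser))
    after_sync_target(cib)  # once the loser has resynced
    print("after resync, loser may be Master:", cib.may_be_master(loser))
```

The shared id is what makes this work in the model: with distinct per-node ids both creates would succeed, each node would forbid the other, and neither could be Master.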

Ah, interesting approach. I hadn't foreseen that use case; cool idea.

Regards,
    Lars

-- 
SuSE Labs, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde