[Pacemaker] Release model

Fri Jun 28 06:59:22 EDT 2013

On 2013-06-28T18:41:35, Andrew Beekhof <andrew at beekhof.net> wrote:

> > There's an exception: dropping commonly used external interfaces (say,
> > "ptest") needs to be announced a few releases in advance before enacted
> > upstream. (And if Enterprise distributions want to keep something, they
> > have time to prepare for that.) And of course, if major components get
> > rewritten, they either need more testing or should be in place in
> > parallel for 1 or 2 releases.
> Now we start to diverge...
> 
> Keeping two lrmd's around? Two stonithd's?

Well, I can dream, can't I. ;-) But perhaps you're right. The LRM
rewrite taught us something about the perils of rewriting components
that are badly documented, that lack good regression tests, and whose
supported options were never all written down anywhere.

But as an isolated component, would it have been so difficult to ship a
separate implementation of the LRM first, perhaps as a compile time
switch? (Assuming the interface to the component doesn't change so much.
It could hardly have been worse than supporting all those different
messaging APIs and their versions.)

The latter is, perhaps, not a bad example.

> Or two copies of the PE after I rewrite ordering constraints? Urgh :-(

The PE is different; almost all of its features are documented and
protected by strong regression tests. That support for an option would
be dropped by accident is almost unthinkable. Hence, the implementation
can be considered almost entirely internal.

But people were using options that the new LRM no longer supported,
called lrmadmin in some of their scripts, etc. So I think the
differentiation between the PE and the LRM does exist.

Perhaps the lesson is "Write regression tests before a rewrite." (And
I'm not saying it's a lesson that depended entirely on you or David. If
cluster-glue's LRM had had such a suite, it'd certainly have helped
tons.)

The Linux kernel 3.x series seems to be coping quite nicely, too. They
do have stable series to which they backport, though. That's always an
option: if $someone feels the need to do longer support for, say,
1.1.10, they can always help start a 1.1.10.x series.

> If that sort of thing wasn't such a PITA you'd have done it with 1.1.8.

Yeah, and there were some here who advocated this. Given the scope of
the other changes at the time, I thought it better to integrate it via a
different path into SLE HA.

> Which is the problem with the Firefox model - either there is no "good" time to make them, or users hate us because we can make them at any time.

For Firefox, though, I've never noticed a problem (and I'm an ardent
follower of the updates). The exceptions are, of course, add-ons: so I
don't update until the add-ons I depend on are also updated.

> Even broadcasting changes can have limited value.
> To use a recent example, crmsh was left in place for well over a year (iirc) before it was dropped.
> That didn't seem to help anything...

Probably a communication problem.

And the way we "fixed" this on SLE HA was to pull in the new package
via a dependency, so that users never noticed that we split the
projects. Clearly, that's impossible to do when one chooses to drop a
major component for good.

> > (Perpetuated by customers willing to pay for it, and because admittedly
> > not all components have good test suites.)
> Me too, but how do we do this where all the downside doesn't fall on me?

I'm not sure there's a huge downside in it for you? You'd get to develop
and bring forward pacemaker 2.x all you want - and if RHEL7 wanted to
freeze a specific version, they'd support 2.x.y for that. (OK, so that
would probably be you too, though.)

Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde