[ClusterLabs] [pacemaker] Discretion with glib v2.59.0+ recommended

Ken Gaillot kgaillot at redhat.com
Mon Feb 11 18:01:23 EST 2019


On Mon, 2019-02-11 at 22:48 +0100, Jan Pokorný wrote:
> On 20/01/19 12:44 +0100, Jan Pokorný wrote:
> > On 18/01/19 20:32 +0100, Jan Pokorný wrote:
> > > It was discovered that this release of the glib project slightly
> > > changed some parameters of how values are distributed within hash
> > > table structures, undermining pacemaker's hard (alas unfeasible)
> > > attempt to turn this data type into a fully predictable entity.
> > > 
> > > The current impact is unknown beyond some internal regression
> > > tests failing because of this. For example, in the environment
> > > variables passed in notification messages, the order of the
> > > active nodes (a space-separated list) may appear shuffled in
> > > comparison with the long-standing behaviour (which perhaps gave a
> > > false impression of determinism) witnessed with older versions of
> > > glib.
> > 
> > Our immediate response is to, at the very least, make the
> > cts-scheduler regression suite (the only localhost one that was
> > rendered broken, with 52 tests out of 733 failing) skip those tests
> > that rely on the exact order of hash-table-driven items, so it
> > won't fail as a whole:
> > 
> > 
> > https://github.com/ClusterLabs/pacemaker/pull/1677/commits/15ace890ef0b987db035ee2d71994e37f7eaff96
> > [above edit: updated with the newer version of the patch]
> 
> Shout-out to Ken for fixing the immediate fallout (deterministic
> output breakages in some cts-scheduler tests, making the above
> change superfluous) for the upcoming 2.0.1 release!
> 
> > > Variations like these are expected, and you may take it as an
> > > opportunity to fix incorrect assumptions about order (as in the
> > > stated case).
> > 
> > [intentionally CC'd developers@, should have done so from the
> > beginning]
> > 
> > At this point, testing with glib v2.59.0+, preferably using
> > 2.0.1-rc3 due to the release cycle timing, is VERY DESIRED if you
> > are considering providing some volunteer capacity to the pacemaker
> > project, especially if you have your own agents and scripts that
> > rely on the exact (and previously likely stable) order of "set data
> > made linear, hence artificially ordered", like with the
> > OCF_RESKEY_CRM_meta_notify_active_uname environment variable in
> > clone notifications (as was already suggested; the complete list is
> > also unknown at this point, unfortunately, for lack of systematic
> > and precise tracking of data items in general).
> 
> While some of these, if not all, are now ordered, I'd still consider
> treating these variables as a "stable ordered list", as opposed to a
> "plain unordered set", from within agents to be frowned upon unless
> that restriction is explicitly lifted, for predictable
> backward/forward pacemaker+glib version compatibility if for no
> other reason.
> 
> Ken, do you agree?
> 
> (If so, we shall keep that in mind for future documentation tweaks
> [possibly including OCF updates as well], so that no false
> assumptions are made in new agent implementations going forward.)

Correct, the lists given to resource agents via clone notification
environment variables are not guaranteed to be in any particular
order.

The documentation already does not claim any ordering, and in fact
gives an example where node names are not in alphabetic order, so I
think it's pretty obvious.
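
For agent authors, a minimal sketch of treating such a variable as an
unordered set rather than an ordered list. Python is used purely for
illustration; the helper names notify_nodes and same_nodes and the
sample node names are hypothetical, and only the variable name comes
from the discussion above.

```python
import os

NOTIFY_ACTIVE = "OCF_RESKEY_CRM_meta_notify_active_uname"


def notify_nodes(var=NOTIFY_ACTIVE, environ=None):
    """Return the node names from a space-separated variable as a set.

    Hypothetical helper: the order of names in the variable is ignored.
    """
    env = os.environ if environ is None else environ
    # split() without arguments tolerates repeated or trailing whitespace
    return frozenset(env.get(var, "").split())


def same_nodes(a, b):
    """Compare two space-separated node lists while ignoring order."""
    return frozenset(a.split()) == frozenset(b.split())


if __name__ == "__main__":
    # Made-up sample value; a real agent would read os.environ instead.
    sample = {NOTIFY_ACTIVE: "node2 node1 node3"}
    active = notify_nodes(environ=sample)
    print("node1" in active)   # membership test, independent of order
    print(sorted(active))      # sort explicitly if a stable order is needed
```

Membership tests and set comparisons give the same answer whatever
order the names arrive in; where an agent needs a stable order (for
logging, say), sorting explicitly avoids depending on the order
received.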

> 
> > > More serious trouble stemming from this expectation-reality
> > > mismatch regarding said data type cannot be ruled out at this
> > > point, and is subject to further investigation.  When in doubt,
> > > staying with glib up to and including v2.58.2 (said tests are
> > > passing with it, though any later v2.58.* may keep working "as
> > > always") is likely a good idea for the time being.
> 
> I think this still partially holds, and only time will prove it fully
> settled?  I mean, for anything truly reproducible (as in
> crm_simulate): with pacemaker prior to 2.0.1, the reproducer needs to
> use the same glib class as the original (either pre-2.59.0 or
> 2.59.0-and-later) to get the same results. With pacemaker 2.0.1+,
> identical results (though possibly differing from either of the
> former combos) will _likely_ be obtained regardless of the particular
> run-time linked glib version, but the strength of this "likely" will
> only be established with future experience, I suppose (it shall
> universally hold within the same glib class per the stated division,
> so no change in this already positive regard).
> 
> I've just scratched the surface, so I'd gladly be corrected.
-- 
Ken Gaillot <kgaillot at redhat.com>



