[Pacemaker] can we update an attribute with cmpxchg "atomic compare and exchange" semantics?

Lars Ellenberg lars.ellenberg at linbit.com
Mon Sep 29 20:22:06 UTC 2014


On Wed, Sep 10, 2014 at 11:50:58AM +0200, Lars Ellenberg wrote:
> 
> Hi Andrew (and others).
> 
> For a certain use case (yes, I'm talking about DRBD "peer-fencing" on
> loss of replication link), it would be nice to be able to say:
> 
>   update some_attribute=some_attribute+1 where some_attribute >= 0
> 
>   delete some_attribute where some_attribute=0
> 
> Ok, that's not the classic cmpxchg(), more of an atomic_add();
> or similar enough. Hopefully with just a single cib round trip.
> 
> 
> Let me rephrase:
> Update attribute "this_is_pink" (for node-X with ID attr-ID):
> 
>   fail if said attr-ID exists elsewhere (not as the intended attribute
>   at the intended place in the xml tree)
>   (this comes for free already, I think)
>
>   if it does not exist at all, assume it was present with current value 0
> 
>   if the current (or assumed current) value is >= 0, add 1
> 
>   if the current value is < 0, fail
> 
>   (optionally: return new value? old value?)
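The rules above can be modeled in a few lines. This is a minimal Python sketch, not Pacemaker code: `AttrStore`, `Attr`, `increment_if_nonnegative` and `UpdateRefused` are hypothetical names, and a process-local lock stands in for whatever would make the operation atomic inside the cib. It returns both the old and the new value, covering the optional last point.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Tuple


class UpdateRefused(Exception):
    """One of the stated preconditions did not hold; nothing was changed."""


@dataclass
class Attr:
    node: str   # where the attribute sits in the (simplified) tree
    name: str
    value: int


class AttrStore:
    """Hypothetical stand-in for the cib: attr-ID -> Attr.

    A single lock makes each call atomic, which is what one cib
    round trip would have to provide."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._attrs: Dict[str, Attr] = {}

    def increment_if_nonnegative(self, attr_id: str, node: str, name: str) -> Tuple[int, int]:
        """Apply the rules above; return (old value, new value)."""
        with self._lock:
            cur = self._attrs.get(attr_id)
            # Fail if the ID already exists somewhere else in the tree.
            if cur is not None and (cur.node, cur.name) != (node, name):
                raise UpdateRefused(f"{attr_id} already exists on {cur.node} as {cur.name}")
            # A missing attribute is treated as present with value 0.
            old = 0 if cur is None else cur.value
            # A negative current value is never incremented.
            if old < 0:
                raise UpdateRefused(f"{attr_id} is negative ({old})")
            # Otherwise add 1 and report both values.
            self._attrs[attr_id] = Attr(node, name, old + 1)
            return old, old + 1
```

For example, the first `increment_if_nonnegative("attr-ID", "node-X", "this_is_pink")` on an empty store returns `(0, 1)`, and the same call with `"node-Y"` afterwards raises `UpdateRefused` because the ID already lives on node-X.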

Did anyone read this?

> My intended use case scenario is this:
> 
>   Two DRBD nodes, several DRBD resources,
>   at least a few of them in "dual-primary".
> 
>   Replication link breaks.
> 
>   Fence-peer handlers are triggered individually for each resource on
>   both nodes, and try to concurrently modify the cib (place fencing
>   constraints).
> 
> With the current implementation of crm-fence-peer.sh, it is likely that
> some DRBD resources "win" on one node, some "win" on the other node.
> The respective losers will have their IO blocked.
> 
> Which means that, most likely, some DRBD resources will stay blocked on
> both nodes, some monitor operation will soon fail, some stop operation
> (to recover from the monitor failure) will soon fail, and the recovery
> from that will be node-level fencing of the affected node.
> 
> In short: both nodes will be hard-reset
> because of a replication link failure.
> 
> 
> 
> If I instead used a single attribute (with a pre-determined ID) for all
> instances of the fence-peer handler, the first to come would "choose" the
> victim node, and all others would just add their count.
> There would be only one loser, and more importantly: one survivor.
> 
> Once the replication link is re-established,
> DRBD resynchronization will bring the former loser up-to-date,
> and the respective after-resync handlers will decrease that "breakage
> count". Once the breakage count hits zero, it can and should be deleted.
> 
> Presence of the "breakage count" attribute with value > 0 would mean
> "this node must not be promoted", which would be a static constraint
> to be added to all DRBD resources.
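A rough model of the single "breakage count" scheme, assuming one cluster-wide attribute with a fixed ID that records both the chosen victim and the count. `BreakageCount`, `fence_peer`, `after_resync` and `may_promote` are hypothetical names; a thread lock again stands in for cib-side atomicity. The point it illustrates: however the concurrent handlers interleave, they all agree on one victim.

```python
import threading
from typing import Optional


class BreakageCount:
    """Hypothetical model of one attribute with a pre-determined ID."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.victim: Optional[str] = None
        self.count = 0

    def fence_peer(self, peer: str) -> str:
        """Called by every fence-peer handler; returns the agreed victim."""
        with self._lock:
            if self.count == 0:
                self.victim = peer  # the first caller chooses the victim
            self.count += 1         # everyone else just adds their count
            return self.victim

    def after_resync(self) -> None:
        """Called by each after-resync handler once its resource is in sync."""
        with self._lock:
            if self.count == 0:
                raise RuntimeError("breakage count is already zero")
            self.count -= 1
            if self.count == 0:
                self.victim = None  # count hit zero: delete the attribute

    def may_promote(self, node: str) -> bool:
        """The static constraint: a count > 0 forbids promoting the victim."""
        with self._lock:
            return not (self.count > 0 and self.victim == node)
```

With six handlers racing on two nodes, each calling `fence_peer` with its own peer, every call returns the same node name, and that node stays unpromotable until all six `after_resync` calls have brought the count back to zero.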
> 
> Does that make sense?
> 
> (I have more insane proposals, in case we have multiple (more than 2)
>  Primaries during normal operation, but I'm not yet able to write them
>  down without being seriously confused by myself...)
> 
> 
> I could open-code it with shell and cibadmin, btw.
> I once did a proof of concept that does
>   a. cibadmin -Q
>   b. some calculations,
>      then prepares the update statement xml based on cib content seen,
>      *including* the cib generation counters
>   c. cibadmin -R (or -C, -M, -D, as appropriate)
>      this will fail if the cib was modified in a relevant way since step a,
>      because of the included generation counters
>   d. repeat as necessary
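Steps a to d are an optimistic read-modify-write loop. The Python sketch below models it against an in-memory stand-in for the cib, assuming a replace is rejected whenever the generation tuple it carries differs from the current one (a simplification of how the cib actually compares versions). `FakeCib`, `StaleGeneration`, `read_modify_write`, `bump_breakage_count` and the `"breakage-count"` key are hypothetical; none of this is the cibadmin interface.

```python
import threading
from typing import Callable, Dict, Tuple

# (admin_epoch, epoch, num_updates): assumed generation counters
Generation = Tuple[int, int, int]


class StaleGeneration(Exception):
    """The cib changed between query and replace."""


class FakeCib:
    """In-memory stand-in for the cib (hypothetical, not cibadmin)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._gen: Generation = (0, 0, 0)
        self._attrs: Dict[str, int] = {}

    def query(self) -> Tuple[Generation, Dict[str, int]]:
        """Step a: snapshot of the content plus its generation counters."""
        with self._lock:
            return self._gen, dict(self._attrs)

    def replace(self, seen: Generation, attrs: Dict[str, int]) -> None:
        """Step c: accepted only if the generation is still `seen`."""
        with self._lock:
            if seen != self._gen:
                raise StaleGeneration(f"saw {seen}, cib is now at {self._gen}")
            self._attrs = dict(attrs)
            admin_epoch, epoch, num_updates = self._gen
            self._gen = (admin_epoch, epoch + 1, num_updates)

    def status_update(self) -> None:
        """An unrelated status change that only bumps num_updates."""
        with self._lock:
            admin_epoch, epoch, num_updates = self._gen
            self._gen = (admin_epoch, epoch, num_updates + 1)


def read_modify_write(
    cib: FakeCib,
    compute: Callable[[Dict[str, int]], Dict[str, int]],
    max_tries: int = 10,
) -> Tuple[Dict[str, int], int]:
    """Steps a to d; returns (new content, number of attempts used)."""
    for attempt in range(1, max_tries + 1):
        seen, attrs = cib.query()         # a. read content and counters
        new_attrs = compute(attrs)        # b. calculate the update
        try:
            cib.replace(seen, new_attrs)  # c. conditional write
            return new_attrs, attempt
        except StaleGeneration:
            continue                      # d. repeat as necessary
    raise RuntimeError(f"gave up after {max_tries} conflicting attempts")


def bump_breakage_count(attrs: Dict[str, int]) -> Dict[str, int]:
    """Example step b: increment a counter, treating 'missing' as 0."""
    new = dict(attrs)
    new["breakage-count"] = new.get("breakage-count", 0) + 1
    return new
```

The `status_update` method models the fragility described in the original message: a change that only bumps `num_updates` is enough to make a pending replace fail and force another round, even though the attribute being updated was never touched.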
> 
>  
> But that is beyond ugly.
> And probably fragile.
> And would often fail for all the wrong reasons, just because something in
> the status section has changed and bumped the cib generation counters.
> 
> What would be needed to add such functionality?
> Where would it go?
> cibadmin? cib? crm_attribute? possibly also attrd?
> 
> Thanks,
> 	Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
