[ClusterLabs] attrd/cib out of sync, master scores not updated in CIB after cluster partition/rejoin

Lars Ellenberg lars.ellenberg at linbit.com
Thu Sep 10 05:55:20 EDT 2020


Hi there.

I've seen a scenario where a network "hiccup" isolated the current DC in a 3
node cluster for a short time; the other partition obviously elected a new DC,
and all node attributes of the former DC were "cleared" together with the rest
of its state.

All nodes rejoin, "all happy again", BUT ...
the attrd of the former DC apparently had some cached node attribute values,
which are now no longer present in the cib.
Specifically, some master scores.
So the master scores for the former DC (that was lost, then rejoined) are now
"only" in its attrd, but (as long as they don't change) will never be flushed
to the CIB.

The policy engine therefore no longer considers this node as a possible
promotion candidate.

Again: the master score did not change, at least not from the perspective of
the attrd on the node which was isolated for a short time.

But since that node "left", the two-node partition deleted the node state of
the "lost" node (including master scores).
Then that node rejoined.

Now, I have a CIB without that master score, an attrd with that master score
value still "cached", and some periodic monitor that will just set this same
(already cached in attrd) master score again.
But that apparently will never reach the CIB.
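
To make the state described above concrete, here is a toy Python model of it.
It is only a sketch of the observed symptom, assuming attrd writes to the CIB
only when a value differs from its cache. The classes ToyCib and ToyAttrd and
the attribute name "master-drbd0" are hypothetical and do not mirror
pacemaker's actual code.

```python
# Toy model only: these classes are hypothetical and do not mirror
# pacemaker's actual attrd or CIB implementation.

class ToyCib:
    """Holds transient node attributes as {(node, name): value}."""

    def __init__(self):
        self.attrs = {}

    def purge_node(self, node):
        # What the surviving partition does when a node is lost:
        # drop all transient state of that node.
        self.attrs = {k: v for k, v in self.attrs.items() if k[0] != node}


class ToyAttrd:
    """Caches values and writes to the CIB only when a value changes."""

    def __init__(self, node, cib):
        self.node = node
        self.cib = cib
        self.cache = {}

    def update(self, name, value):
        if self.cache.get(name) == value:
            return False  # unchanged from attrd's point of view: no write
        self.cache[name] = value
        self.cib.attrs[(self.node, name)] = value
        return True


cib = ToyCib()
attrd = ToyAttrd("node1", cib)

attrd.update("master-drbd0", "10000")          # initial write reaches the CIB
cib.purge_node("node1")                        # node1 isolated, peers clear its state
wrote = attrd.update("master-drbd0", "10000")  # periodic monitor, same value

print(wrote)                                   # False
print(("node1", "master-drbd0") in cib.attrs)  # False
```

In this model the cache and the CIB stay out of sync until the value changes
or something forces a rewrite, which matches the symptom but says nothing
about where the real cause lies.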

So.
Question is: has anyone seen anything like that before?
Could that be fixed already?
Version in that scenario was: 1.1.20+ (almost .21).

Obviously "stonith" would have fixed it,
then that node would not have just rejoined, but rebooted, then rejoined,
and its attrd would not have any cached values anymore ;-)

I suppose attrd attributes should sync with the last CIB on rejoin?
I'd hope it does something like that already?
If it does nothing yet, then maybe that's the obvious fix.
If it does something, then maybe this boils down to some funky timing issue?
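
One possible reading of "sync on rejoin" is that the rejoining node rewrites
every cached value that the CIB lacks or disagrees with. The sketch below
shows only that idea on plain dictionaries; the function resync_on_rejoin, the
data layout and the attribute names are assumptions for illustration, not
pacemaker's API or its actual behavior.

```python
def resync_on_rejoin(node, cache, cib_attrs):
    """Rewrite cached values that are missing from or differ in the CIB.

    cache:     {name: value} as held by the (hypothetical) local attrd
    cib_attrs: {(node, name): value} as held by the (hypothetical) CIB
    Returns the sorted names that had to be rewritten.
    """
    rewritten = []
    for name, value in cache.items():
        if cib_attrs.get((node, name)) != value:
            cib_attrs[(node, name)] = value
            rewritten.append(name)
    return sorted(rewritten)


# State after the rejoin: the master score survived only in the cache.
cache = {"master-drbd0": "10000", "pingd": "100"}
cib_attrs = {("node1", "pingd"): "100"}

fixed = resync_on_rejoin("node1", cache, cib_attrs)
print(fixed)  # ['master-drbd0']
```

Whether the cache or the CIB should win in such a sync is a design choice;
this sketch lets the cache win, since in the scenario above the cached value
is the one that is still correct.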

How would I go about trying to create a reproducer?

Thanks,

    Lars


