[ClusterLabs] attrd/cib out of sync, master scores not updated in CIB after crmd "Respawn" after internal error [NOT cluster partition/rejoin]

Thu Sep 10 10:03:08 EDT 2020

Now with "reproducer" ... see below

On Thu, Sep 10, 2020 at 11:55:20AM +0200, Lars Ellenberg wrote:
> Hi there.
> 
> I've seen a scenario where a network "hiccup" isolated the current DC in a 3
> node cluster for a short time; the other partition elected a new DC obviously, and
> all node attributes of the former DC are "cleared" together with the rest of
> its state.

I have to correct myself here.
Network and membership remained stable, even the CIB CPG did not notice anything.

But for some unrelated reason (stress on the cib, IPC timeout),
crmd on the DC exited with an error and was respawned:

  cib:     info: cib_process_ping:  Reporting our current digest
  crmd:    error: do_pe_invoke_callback:     Could not retrieve the Cluster Information Base: Timer expired
  ...
  pacemakerd:    error: pcmk_child_exit:   The crmd process (17178) exited: Generic Pacemaker error (201)
  pacemakerd:   notice: pcmk_process_exit: Respawning failed child process: crmd

The new DC now causes:
  cib:     info: cib_perform_op:    Diff: --- 0.971.201 2
  cib:     info: cib_perform_op:    Diff: +++ 0.971.202 (null)
  cib:     info: cib_perform_op:    -- /cib/status/node_state[@id='2']/transient_attributes[@id='2']

But the attrd apparently does not notice that transient attributes it had cached are now gone.

Reprobes run, and all give the expected results.
But master scores that are unchanged from the perspective of the attrd
on the former DC (the node with the crmd respawn) are not re-populated
to the CIB, which prevents a later switchover of the Master role.
(That is when it became apparent that something was wrong.)

A "reproducer" in the sense of "reproduces approximate behavior",
even if not the exact scenario (crmd emergency respawn and DC re-election):

 * have a healthy cluster with some master scores set
 * delete transient node attributes:
   cibadmin -D --xpath "/cib/status/node_state[@id='2']/transient_attributes[@id='2']"
    (or whatever your node id is; the resource should not be promoted on
    that node at that time, otherwise this triggers resource "recovery"
    actions, which change the master score, and we see a different effect)

Any cached node attributes (master scores) on that node
will "never" make it to the CIB (until they eventually change their value).
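
To watch the divergence, something like the following rough sketch may help.
It only defines helpers that print what attrd reports for one attribute and
whether the CIB status section still carries it. NODE_ID, NODE_NAME and ATTR
are placeholders (the node name and the resource name in "master-myrsc" are
made up), and the helper names are mine, not part of Pacemaker:

```shell
#!/bin/sh
# Sketch only. All three values are placeholders, adjust to your cluster.
NODE_ID=${NODE_ID:-2}
NODE_NAME=${NODE_NAME:-node2}      # hypothetical node name
ATTR=${ATTR:-master-myrsc}         # hypothetical master score attribute

XPATH="/cib/status/node_state[@id='$NODE_ID']/transient_attributes[@id='$NODE_ID']"

# Value as currently known to attrd.
attrd_value() { attrd_updater -Q -n "$ATTR" -N "$NODE_NAME"; }

# Transient attributes as currently stored in the CIB status section.
cib_attrs() { cibadmin -Q --xpath "$XPATH"; }

# Reads XML on stdin; succeeds if it contains an nvpair named "$1".
has_attr() { grep -qF "name=\"$1\""; }

# Print both views side by side.
report() {
    echo "attrd: $(attrd_value 2>&1)"
    if cib_attrs 2>/dev/null | has_attr "$ATTR"; then
        echo "cib:   $ATTR present"
    else
        echo "cib:   $ATTR MISSING"
    fi
}

# Intended use, on a node where the resource is not promoted:
#   report                          # before the delete
#   cibadmin -D --xpath "$XPATH"    # drop the transient attributes
#   sleep 10; report                # after the delete
```

If the behavior is as described above, the second report should still show a
value from attrd while the CIB no longer has the attribute.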

How can this be fixed?
   * for the "cibadmin -D" case? (do we even want to?)
   * for the "DC re-election" and one crmd "temporarily not available"
     case as in the scenario described here?
     (I think we should)

> All nodes rejoin, "all happy again", BUT ...
> the attrd of the former DC apparently had some cached node attribute values,
> which are now no longer present in the cib.
> Specifically, some master scores.
> So the master scores for the former DC (that was lost, then rejoined) are now
> "only" in its attrd, but (as long as they don't change) will never be flushed
> to the CIB.
> 
> The policy engine therefore no longer considers this node as a possible
> promotion candidate.
> 
> Again: the master score did not change, not from the perspective of the attrd
> on the node which was isolated for a short time, anyways.
> 
> But since that node "left", the two-node partition deleted the node state of
> the "lost" node (including master scores).
> Then that node rejoined.
> 
> Now, I have a cib without that master score, an attrd with that master score
> value still "cached", and some periodic monitor that will just reset this same
> (already cached in attrd) master score.
> But that apparently will never reach the CIB.
> 
> So.
> Question is: anyone seen anything like that before?
> Could that be fixed already?
> Version in that scenario was: 1.1.20+ (almost .21).
> 
> Obviously "stonith" would have fixed it,
> then that node would not have just rejoined, but rebooted, then rejoined,
> and its attrd would not have any cached values anymore ;-)
> 
> I suppose attrd attributes should sync with the last CIB on re-join?
> I'd hope it does something like that already?
> If it does nothing yet, then maybe that's the obvious fix.
> If it does something, then maybe this boils down to some funky timing issue?
> 
> How would I go about trying to create a reproducer?
> 
Thanks,

     Lars