[ClusterLabs] Updated attribute is not displayed in crm_mon
Ken Gaillot
kgaillot@redhat.com
Tue Aug 15 13:37:45 EDT 2017
On Tue, 2017-08-15 at 08:42 +0200, Jan Friesse wrote:
> Ken Gaillot wrote:
> > On Mon, 2017-08-14 at 12:33 -0500, Ken Gaillot wrote:
> >> On Wed, 2017-08-02 at 09:59 +0000, 井上 和徳 wrote:
> >>> Hi,
> >>>
> >>> In Pacemaker-1.1.17, an attribute updated while pacemaker is starting is not displayed in crm_mon.
> >>> In Pacemaker-1.1.16 it is displayed, so the results differ.
> >>>
> >>> https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
> >>> This commit is the cause, but is the following result (3.) the expected behavior?
> >>
> >> This turned out to be an odd one. The sequence of events is:
> >>
> >> 1. When the node leaves the cluster, the DC (correctly) wipes all its
> >> transient attributes from attrd and the CIB.
> >>
> >> 2. Pacemaker is newly started on the node, and a transient attribute is
> >> set before the node joins the cluster.
> >>
> >> 3. The node joins the cluster, and its transient attributes (including
> >> the new value) are sync'ed with the rest of the cluster, in both attrd
> >> and the CIB. So far, so good.
> >>
> >> 4. Because this is the node's first join since its crmd started, its
> >> crmd wipes all of its transient attributes again. The idea is that the
> >> node may have restarted so quickly that the DC hasn't yet done it (step
> >> 1 here), so clear them now to avoid any problems with old values.
> >> However, the crmd wipes only the CIB -- not attrd (arguably a bug).
> >
> > Whoops, clarification: the node may have restarted so quickly that
> > corosync didn't notice it left, so the DC would never have gotten the
>
> Corosync always notices when a node leaves, no matter whether the node
> is gone longer than the token timeout or returns within it.
Looking back at the original commit, it has a comment "OpenAIS has a
nasty habit of not being able to tell if a node is returning or didn't
leave in the first place", so it looks like it's only relevant on legacy
stacks.
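(As an aside: the "Stack:" line in crm_mon output -- "Stack: corosync" in
the test results quoted below -- shows which stack a cluster is running,
so something like

  crm_mon -1 | grep '^Stack'

is a quick way to check whether a given cluster is even affected.)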
>
> > "peer lost" message that triggers wiping its transient attributes.
> >
> > I suspect the crmd wipes only the CIB in this case because we assumed
> > attrd would be empty at this point -- missing exactly this case where a
> > value was set between start-up and first join.
> >
> >> 5. With the older pacemaker version, both the joining node and the DC
> >> would request a full write-out of all values from attrd. Because step 4
> >> only wiped the CIB, this ends up restoring the new value. With the newer
> >> pacemaker version, this step is no longer done, so the value winds up
> >> staying in attrd but not in the CIB (until the next write-out naturally
> >> occurs).
> >>
> >> I don't have a solution yet, but step 4 is clearly the problem (rather
> >> than the new code that skips step 5, which is still a good idea
> >> performance-wise). I'll keep working on it.
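For anyone who wants to see the step 4/5 mismatch directly, you can compare
what attrd holds in memory against what actually made it into the CIB. A
rough sketch, using the node and attribute names from the test case quoted
below (the xpath is just one way to pull the value out of the status
section):

  # what attrd currently holds
  attrd_updater -Q -n KEY -A

  # what the CIB holds for node1
  cibadmin -Q --xpath "//node_state[@uname='node1']//nvpair[@name='KEY']"

After step 4, the first query still shows the value while the second comes
back empty.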
> >>
> >>> [test case]
> >>> 1. Start pacemaker on two nodes at the same time and update the attribute during startup.
> >>> In this case, the attribute is displayed in crm_mon.
> >>>
> >>> [root@node1 ~]# ssh -f node1 'systemctl start pacemaker ; attrd_updater -n KEY -U V-1' ; \
> >>> ssh -f node3 'systemctl start pacemaker ; attrd_updater -n KEY -U V-3'
> >>> [root@node1 ~]# crm_mon -QA1
> >>> Stack: corosync
> >>> Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> >>>
> >>> 2 nodes configured
> >>> 0 resources configured
> >>>
> >>> Online: [ node1 node3 ]
> >>>
> >>> No active resources
> >>>
> >>>
> >>> Node Attributes:
> >>> * Node node1:
> >>> + KEY : V-1
> >>> * Node node3:
> >>> + KEY : V-3
> >>>
> >>>
> >>> 2. Restart pacemaker on node1, and update the attribute during startup.
> >>>
> >>> [root@node1 ~]# systemctl stop pacemaker
> >>> [root@node1 ~]# systemctl start pacemaker ; attrd_updater -n KEY -U V-10
> >>>
> >>>
> >>> 3. The attribute is registered in attrd but not in the CIB,
> >>> so the updated attribute is not displayed in crm_mon.
> >>>
> >>> [root@node1 ~]# attrd_updater -Q -n KEY -A
> >>> name="KEY" host="node3" value="V-3"
> >>> name="KEY" host="node1" value="V-10"
> >>>
> >>> [root@node1 ~]# crm_mon -QA1
> >>> Stack: corosync
> >>> Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> >>>
> >>> 2 nodes configured
> >>> 0 resources configured
> >>>
> >>> Online: [ node1 node3 ]
> >>>
> >>> No active resources
> >>>
> >>>
> >>> Node Attributes:
> >>> * Node node1:
> >>> * Node node3:
> >>> + KEY : V-3
> >>>
> >>>
> >>> Best Regards
> >>>
> >>
> >
>
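In the meantime, a possible workaround -- untested against this exact
scenario, so treat it as a sketch -- is to force attrd to write all of its
current values back to the CIB once the node has joined:

  attrd_updater -R

That should push the value set during start-up (V-10 above) back into the
CIB so crm_mon displays it, until the step 4 behavior is properly fixed.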
--
Ken Gaillot <kgaillot@redhat.com>