[ClusterLabs] Updated attribute is not displayed in crm_mon
井上 和徳
inouekazu at intellilink.co.jp
Thu Aug 17 09:05:13 CEST 2017
I confirmed that the problem was fixed.
Many thanks!
> -----Original Message-----
> From: Ken Gaillot [mailto:kgaillot at redhat.com]
> Sent: Thursday, August 17, 2017 12:25 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Updated attribute is not displayed in crm_mon
>
> I have a fix for this issue ready. I am running some tests on it, then
> will merge it in the upstream master branch, to become part of the next
> release.
>
> The fix is to clear the transient attributes from the CIB when attrd
> starts, rather than when the crmd completes its first join. This
> eliminates the window where attributes can be set before the CIB is
> cleared.
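
A minimal sketch of why moving the clear earlier closes the window, written as a toy Python model rather than Pacemaker code. The `replay` function, the event names, and the dictionaries standing in for attrd and the CIB are all hypothetical. The model also assumes that a value set before the join reaches the CIB only when the join syncs it.

```python
def replay(events, clear_on):
    """Toy model of one node's start-up; not Pacemaker code.

    clear_on names the (hypothetical) event at which the node's transient
    attributes are wiped from the CIB: "first_join" or "attrd_start".
    """
    attrd = {}                 # values held by the attribute daemon
    cib = {"KEY": "V-1"}       # stale value left over from before the restart
    for event in events:
        if event == "attrd_start":
            attrd = {}
            if clear_on == "attrd_start":
                cib.clear()            # nothing new can have been set yet
        elif event == "first_join":
            cib.update(attrd)          # the join syncs attrd's values to the CIB
            if clear_on == "first_join":
                cib.clear()            # also wipes values set since start-up
        else:
            _, name, value = event     # ("set", name, value)
            attrd[name] = value
    return attrd, cib


events = ["attrd_start", ("set", "KEY", "V-10"), "first_join"]
print(replay(events, "first_join"))    # CIB ends up empty, attrd keeps V-10
print(replay(events, "attrd_start"))   # CIB and attrd both hold V-10
```

In both orderings the stale value is gone from the CIB; only clearing at attrd start-up also preserves the value set before the join.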
>
> On Tue, 2017-08-15 at 08:42 +0000, 井上 和徳 wrote:
> > Hi Ken,
> >
> > Thanks for the explanation.
> >
> > As additional information, we are using a daemon (*1) that registers
> > Corosync's ring status as attributes, so I want to avoid cases where
> > attributes are not displayed.
> >
> > *1 It is ifcheckd, which runs continuously as a daemon rather than as a
> > resource, and registers attributes while Pacemaker is running.
> > ( https://github.com/linux-ha-japan/pm_extras/tree/master/tools )
> > Attribute example :
> >
> > Node Attributes:
> > * Node rhel73-1:
> > + ringnumber_0 : 192.168.101.131 is UP
> > + ringnumber_1 : 192.168.102.131 is UP
> > * Node rhel73-2:
> > + ringnumber_0 : 192.168.101.132 is UP
> > + ringnumber_1 : 192.168.102.132 is UP
> >
> > Regards,
> > Kazunori INOUE
> >
> > > -----Original Message-----
> > > From: Ken Gaillot [mailto:kgaillot at redhat.com]
> > > Sent: Tuesday, August 15, 2017 2:42 AM
> > > To: Cluster Labs - All topics related to open-source clustering welcomed
> > > Subject: Re: [ClusterLabs] Updated attribute is not displayed in crm_mon
> > >
> > > On Mon, 2017-08-14 at 12:33 -0500, Ken Gaillot wrote:
> > > > On Wed, 2017-08-02 at 09:59 +0000, 井上 和徳 wrote:
> > > > > Hi,
> > > > >
> > > > > > In Pacemaker-1.1.17, an attribute updated while Pacemaker is starting is not displayed in crm_mon.
> > > > > > In Pacemaker-1.1.16 it is displayed, so the results differ.
> > > > > >
> > > > > > https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
> > > > > > This commit is the cause, but is the following result (3.) the expected behavior?
> > > >
> > > > This turned out to be an odd one. The sequence of events is:
> > > >
> > > > 1. When the node leaves the cluster, the DC (correctly) wipes all its
> > > > transient attributes from attrd and the CIB.
> > > >
> > > > 2. Pacemaker is newly started on the node, and a transient attribute is
> > > > set before the node joins the cluster.
> > > >
> > > > 3. The node joins the cluster, and its transient attributes (including
> > > > the new value) are sync'ed with the rest of the cluster, in both attrd
> > > > and the CIB. So far, so good.
> > > >
> > > > 4. Because this is the node's first join since its crmd started, its
> > > > crmd wipes all of its transient attributes again. The idea is that the
> > > > node may have restarted so quickly that the DC hasn't yet done it (step
> > > > 1 here), so clear them now to avoid any problems with old values.
> > > > However, the crmd wipes only the CIB -- not attrd (arguably a bug).
> > >
> > > Whoops, clarification: the node may have restarted so quickly that
> > > corosync didn't notice it left, so the DC would never have gotten the
> > > "peer lost" message that triggers wiping its transient attributes.
> > >
> > > I suspect the crmd wipes only the CIB in this case because we assumed
> > > attrd would be empty at this point -- missing exactly this case where a
> > > value was set between start-up and first join.
> > >
> > > > 5. With the older pacemaker version, both the joining node and the DC
> > > > would request a full write-out of all values from attrd. Because step 4
> > > > only wiped the CIB, this ends up restoring the new value. With the newer
> > > > pacemaker version, this step is no longer done, so the value winds up
> > > > staying in attrd but not in CIB (until the next write-out naturally
> > > > occurs).
> > > >
> > > > I don't have a solution yet, but step 4 is clearly the problem (rather
> > > > than the new code that skips step 5, which is still a good idea
> > > > performance-wise). I'll keep working on it.
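
The five steps above can be traced with a toy model. This is only a sketch: `simulate` and its `full_writeout` flag are hypothetical names, and two dictionaries stand in for the node's transient attributes in attrd and in the CIB.

```python
def simulate(full_writeout):
    """Trace steps 1-5 for one node; returns (attrd, cib) afterwards."""
    attrd = {"KEY": "V-1"}
    cib = {"KEY": "V-1"}

    # 1. Node leaves: the DC wipes its transient attributes everywhere.
    attrd.clear()
    cib.clear()
    # 2. Pacemaker restarts; a value is set before the node joins.
    attrd["KEY"] = "V-10"
    # 3. Node joins: attrd's values are synced into the CIB.
    cib.update(attrd)
    # 4. First join since crmd start: crmd wipes the CIB only, not attrd.
    cib.clear()
    # 5. Older behaviour: a full write-out from attrd restores the value.
    if full_writeout:
        cib.update(attrd)
    return attrd, cib


print(simulate(full_writeout=True))    # older version: value is back in the CIB
print(simulate(full_writeout=False))   # newer version: value only in attrd
```

With the write-out skipped, the model ends in the reported state: attrd holds the value while the CIB, which crm_mon reads, does not.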
> > > >
> > > > > [test case]
> > > > > 1. Start pacemaker on two nodes at the same time and update the attribute during startup.
> > > > > In this case, the attribute is displayed in crm_mon.
> > > > >
> > > > > > [root@node1 ~]# ssh -f node1 'systemctl start pacemaker ; attrd_updater -n KEY -U V-1' ; \
> > > > > > ssh -f node3 'systemctl start pacemaker ; attrd_updater -n KEY -U V-3'
> > > > > > [root@node1 ~]# crm_mon -QA1
> > > > > Stack: corosync
> > > > > Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> > > > >
> > > > > 2 nodes configured
> > > > > 0 resources configured
> > > > >
> > > > > Online: [ node1 node3 ]
> > > > >
> > > > > No active resources
> > > > >
> > > > >
> > > > > Node Attributes:
> > > > > * Node node1:
> > > > > + KEY : V-1
> > > > > * Node node3:
> > > > > + KEY : V-3
> > > > >
> > > > >
> > > > > 2. Restart pacemaker on node1, and update the attribute during startup.
> > > > >
> > > > > > [root@node1 ~]# systemctl stop pacemaker
> > > > > > [root@node1 ~]# systemctl start pacemaker ; attrd_updater -n KEY -U V-10
> > > > >
> > > > >
> > > > > > 3. The attribute is registered in attrd but not in the CIB,
> > > > > > so the updated attribute is not displayed in crm_mon.
> > > > >
> > > > > > [root@node1 ~]# attrd_updater -Q -n KEY -A
> > > > > name="KEY" host="node3" value="V-3"
> > > > > name="KEY" host="node1" value="V-10"
> > > > >
> > > > > > [root@node1 ~]# crm_mon -QA1
> > > > > Stack: corosync
> > > > > Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> > > > >
> > > > > 2 nodes configured
> > > > > 0 resources configured
> > > > >
> > > > > Online: [ node1 node3 ]
> > > > >
> > > > > No active resources
> > > > >
> > > > >
> > > > > Node Attributes:
> > > > > * Node node1:
> > > > > * Node node3:
> > > > > + KEY : V-3
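
The mismatch in step 3 can be spotted mechanically by comparing what attrd reports with what the CIB shows. The sketch below parses text in the `attrd_updater -Q` format shown above. The helper names are hypothetical, and the CIB side is passed in as a plain dictionary (how it is collected from the CIB is left out).

```python
import re

# One line of query output: name="KEY" host="node1" value="V-10"
LINE = re.compile(r'name="([^"]*)"\s+host="([^"]*)"\s+value="([^"]*)"')


def parse_attrd_query(text):
    """Map (host, name) -> value for each matching line of query output."""
    return {(m.group(2), m.group(1)): m.group(3) for m in LINE.finditer(text)}


def missing_from_cib(attrd_values, cib_values):
    """List (host, name, value) entries that the CIB lacks or holds differently."""
    return sorted(
        (host, name, value)
        for (host, name), value in attrd_values.items()
        if cib_values.get((host, name)) != value
    )


attrd_output = '''name="KEY" host="node3" value="V-3"
name="KEY" host="node1" value="V-10"
'''
cib_values = {("node3", "KEY"): "V-3"}   # what crm_mon displayed in step 3
print(missing_from_cib(parse_attrd_query(attrd_output), cib_values))
```

For the output in this test case, the only entry reported is node1's `KEY`, the value that attrd holds but crm_mon does not display.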
> > > > >
> > > > >
> > > > > Best Regards
> > > > >
> > > > > _______________________________________________
> > > > > Users mailing list: Users at clusterlabs.org
> > > > > http://lists.clusterlabs.org/mailman/listinfo/users
> > > > >
> > > > > Project Home: http://www.clusterlabs.org
> > > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > > > Bugs: http://bugs.clusterlabs.org
> > > >
> > >
> > > --
> > > Ken Gaillot <kgaillot at redhat.com>
>
> --
> Ken Gaillot <kgaillot at redhat.com>