[ClusterLabs] Default resource stickiness issue with colocation constraint
Ken Gaillot
kgaillot at redhat.com
Tue Mar 31 10:38:53 EDT 2020
On Tue, 2020-03-31 at 07:37 +0300, Strahil Nikolov wrote:
> On March 31, 2020 6:01:35 AM GMT+03:00, Ken Gaillot <kgaillot at redhat.com> wrote:
> > On Sun, 2020-03-08 at 18:11 +0000, Strahil Nikolov wrote:
> > > Hello All,
> > >
> > > Can someone help me figure something out?
> > >
> > > I have a test cluster with 2 resource groups:
> > >
> > > [root@node3 cluster]# pcs status
> > > Cluster name: HACLUSTER16
> > > Stack: corosync
> > > Current DC: node3.localdomain (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
> > > Last updated: Sun Mar 8 20:00:48 2020
> > > Last change: Sun Mar 8 20:00:04 2020 by root via cibadmin on node3.localdomain
> > >
> > > 3 nodes configured
> > > 14 resources configured
> > >
> > > Node node2.localdomain: standby
> > > Node node3.localdomain: standby
> > > Online: [ node1.localdomain ]
> > >
> > > Full list of resources:
> > >
> > > RHEVM (stonith:fence_rhevm): Started node1.localdomain
> > > MPATH (stonith:fence_mpath): Started node1.localdomain
> > > Resource Group: NFS
> > >     NFS_LVM (ocf::heartbeat:LVM): Started node1.localdomain
> > >     NFS_infodir (ocf::heartbeat:Filesystem): Started node1.localdomain
> > >     NFS_data (ocf::heartbeat:Filesystem): Started node1.localdomain
> > >     NFS_IP (ocf::heartbeat:IPaddr2): Started node1.localdomain
> > >     NFS_SRV (ocf::heartbeat:nfsserver): Started node1.localdomain
> > >     NFS_XPRT1 (ocf::heartbeat:exportfs): Started node1.localdomain
> > >     NFS_NTFY (ocf::heartbeat:nfsnotify): Started node1.localdomain
> > > Resource Group: APACHE
> > >     APACHE_LVM (ocf::heartbeat:LVM): Started node1.localdomain
> > >     APACHE_cfg (ocf::heartbeat:Filesystem): Started node1.localdomain
> > >     APACHE_data (ocf::heartbeat:Filesystem): Started node1.localdomain
> > >     APACHE_IP (ocf::heartbeat:IPaddr2): Started node1.localdomain
> > >     APACHE_SRV (ocf::heartbeat:apache): Started node1.localdomain
> > >
> > > The constraints I have put are:
> > >
> > > [root@node3 cluster]# pcs constraint
> > > Location Constraints:
> > >   Resource: APACHE
> > >     Enabled on: node1.localdomain (score:3000)
> > >     Enabled on: node2.localdomain (score:2000)
> > >     Enabled on: node3.localdomain (score:1000)
> > >   Resource: NFS
> > >     Enabled on: node1.localdomain (score:1000)
> > >     Enabled on: node2.localdomain (score:2000)
> > >     Enabled on: node3.localdomain (score:3000)
> > > Ordering Constraints:
> > > Colocation Constraints:
> > >   APACHE with NFS (score:-1000)
> > > Ticket Constraints:
> > >
> > > [root@node3 cluster]# pcs resource defaults
> > > resource-stickiness=1000
> > >
> > > As you can see, the default stickiness is 1000 per resource, or
> > > 5000 for the five-member APACHE group.
> > > The colocation rule score is just -1000, and as I understand it,
> > > it should be ignored when the 2 nodes are removed from standby.
> > >
> > > Can someone clarify why the APACHE group is moved when the
> > > resource stickiness score is higher than the colocation score?
> > >
> > > I have attached a file with the crm_simulate output (the output is
> > > correct: when the standby is removed, the group is moved).
> > >
> > > Best Regards,
> > > Strahil Nikolov
> >
> > Coincidentally, I just fixed a bug last week that I believe is the
> > culprit here. I expect that if you test the current master branch it
> > won't happen. The fix will be in 2.0.4 (the first release candidate
> > is expected in a couple of weeks).
> >
> > The problem was in the code that incorporates colocation
> > dependencies' node preferences. If a group was colocated with some
> > resource, the resource would incorporate the scores from each member
> > of the group in turn. However, each member of the group would also
> > incorporate its own dependencies' scores in its score, which
> > includes the internal group colocation of all members after it. So
> > the members of the colocated group were being counted multiple
> > times, and therefore had a bigger impact than the configured
> > colocation score. The fix was to incorporate scores from only the
> > first group member, since it already incorporates all the rest.
>
> Hey Ken,
>
> Thanks for the detailed explanation, and good job!
> So the bug is fixed in the latest upstream version. What about RHEL:
> should I open a Bugzilla?
>
> Best Regards,
> Strahil Nikolov
The fix is expected to land in RHEL 7.9 and 8.3.
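
As a rough illustration of the double counting described in the quoted
explanation above, here is a toy model in Python. It is not Pacemaker's
scheduler code: the function names are made up for this sketch, and it
only counts how often each member of the APACHE group would contribute
under the stated assumption that incorporating a member also pulls in
every member after it.

```python
from collections import Counter

# Members of the APACHE group from the status output, in group order.
GROUP = ["APACHE_LVM", "APACHE_cfg", "APACHE_data", "APACHE_IP", "APACHE_SRV"]


def contributions(members, start):
    """Members whose scores get folded in when member `start` is incorporated.

    Toy assumption: a member brings its own score plus, through the implicit
    in-group colocation, the scores of every member after it.
    """
    return members[start:]


def count_before_fix(members):
    """Incorporate every member in turn (the behaviour described as the bug)."""
    counts = Counter()
    for i in range(len(members)):
        counts.update(contributions(members, i))
    return counts


def count_after_fix(members):
    """Incorporate only the first member, which already pulls in the rest."""
    return Counter(contributions(members, 0))


if __name__ == "__main__":
    before = count_before_fix(GROUP)
    after = count_after_fix(GROUP)
    for name in GROUP:
        print(f"{name:12} before fix: {before[name]}x   after fix: {after[name]}x")
```

In this model the last member of a five-member group is counted five
times before the fix and once after it, which is the sense in which the
group had a bigger impact than the configured colocation score.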
--
Ken Gaillot <kgaillot at redhat.com>