[ClusterLabs] DRBD demote/promote not called - Why? How to fix?

CART Andreas andreas.cart at sonorys.at
Fri Nov 25 18:30:51 CET 2016


Hello again

After running what feels like a million trial-and-error test cases, I have finally come to the following conclusion:
I really need to understand the scoring and allocation algorithm much better!
Unfortunately I could not find any good documentation, just some articles containing hints, and those proved insufficient.
Any pointers are very welcome.

First, my original request contained a wrong colocation constraint (it was missing the master role), as Ken correctly pointed out.
Using the correct constraint would have fixed that problem.

But in the meantime I had moved on to a more complex cluster configuration, which exhibited a different problem, although at first it looked very similar.

The new cluster config was for a system like this:

         node1:               node2:
    ---------------       ---------------
   |  FS_at_NFS    |     | FS_avoid_NFS  |
    ---------------       ---------------
   |  NFS_server   |
    ---------------
   |  NFS_ip       |
    ---------------
   |  DRBD_fs      |
    ---------------       ---------------
   |  DRBD-master  |     |  DRBD-slave   |
    ---------------       ---------------

Again I tried to move the NFS server to the other node, but the DRBD master was not promoted on the other node.

As I now know, two conditions are necessary to see this problem:
  * the -INFINITY colocation constraint for the FS_avoid_NFS
  * a rather high value for resource stickiness (which was originally set to INFINITY)

If I remove the FS_avoid_NFS resource, the problem can no longer be reproduced.
Likewise, if I remove the resource stickiness completely (or set it to a small value), the problem is gone.

The maximum value I can use for resource stickiness without hitting the problem is the score that the DRBD slave returns, minus 2.
(At least that's what my experiments seem to show.)
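
As a small sketch of that rule of thumb, using the slave score of 1000 and the two stickiness values (995 and 2002) quoted further down in this mail: the helper names are made up for illustration, and the "slave score minus stickiness" formula is only the guess given for the slave's promotion score, not something taken from Pacemaker itself. One possible reading, not confirmed anywhere in this thread, is that the limit keeps the slave's guessed promotion score above the value of 1 that the old master shows after the move.

```python
# Score the DRBD resource agent returns for a slave (figure from this mail).
SLAVE_SCORE = 1000


def max_safe_stickiness(slave_score: int) -> int:
    """Empirical limit from the experiments: slave score minus 2."""
    return slave_score - 2


def guessed_slave_promotion(slave_score: int, stickiness: int) -> int:
    """Guessed formula (an assumption): slave score minus stickiness."""
    return slave_score - stickiness


limit = max_safe_stickiness(SLAVE_SCORE)
for stickiness in (995, limit, limit + 1, 2002):
    verdict = "ok" if stickiness <= limit else "too high"
    promo = guessed_slave_promotion(SLAVE_SCORE, stickiness)
    print(f"stickiness={stickiness:5d} guessed slave promotion={promo:6d} {verdict}")
```

The 995 used in the working run stays under the limit, while 2002 is far above it.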

So apparently there is/was a configuration problem with the INFINITY score for stickiness and the -INFINITY score for the colocation constraint.

Moreover, I noticed that with my latter configuration there were some pending transitions. These were resolved automatically after the cluster-recheck-interval, so my first work-around was to reduce this interval significantly and thus resolve the problem more quickly.
These pending transitions can only be seen with my order constraints. If I remove them, there are no pending transitions (although the demote still does not happen when the stickiness is set too high).
So maybe this is a bug, but I will not investigate it any further.

------------------------------------------------------------------------------------------------

For all those who are curious, here are my scores and my guesses at how they are calculated.

All primitive resources are colocated with DRBD-master (each one directly, not one on top of the other).
My DRBD-RA returns 10000 for master and 1000 for slave.
I have set resource stickiness to 995.

Here is the idle state before movement:
[root@deneb682 ~]# crm_simulate -LVs

Current cluster status:
Online: [ deneb682 deneb683 ]

 Master/Slave Set: DRBD-master [DRBD]
     Masters: [ deneb682 ]
     Slaves: [ deneb683 ]
 DRBD_fs        (ocf::heartbeat:Dummy): Started deneb682
 NFS_ip (ocf::heartbeat:Dummy): Started deneb682
 NFS_server     (ocf::heartbeat:Dummy): Started deneb682
 FS_at_server   (ocf::heartbeat:Dummy): Started deneb682
 FS_avoid_server        (ocf::heartbeat:Dummy): Started deneb683

Allocation scores:
clone_color: DRBD-master allocation score on deneb682: 3980     <= 4 * stickiness (once for each resource on top in the chain)
clone_color: DRBD-master allocation score on deneb683: 0
clone_color: DRBD:0 allocation score on deneb682: 10001         <= master + 1 for 'started on this node'
clone_color: DRBD:0 allocation score on deneb683: 0
clone_color: DRBD:1 allocation score on deneb682: 0
clone_color: DRBD:1 allocation score on deneb683: 1001          <= slave + 1 for 'started on this node'
native_color: DRBD:0 allocation score on deneb682: 10001        <= same as clone_color above
native_color: DRBD:0 allocation score on deneb683: -995         <= same as clone_color above - stickiness
native_color: DRBD:1 allocation score on deneb682: -INFINITY    <= due to DRBD:0 already allocated for this node
native_color: DRBD:1 allocation score on deneb683: 6            <= same as clone_color above - stickiness
DRBD:0 promotion score on deneb682: 17960                       <= master + 2 * 4 * stickiness
DRBD:1 promotion score on deneb683: 5                           <= slave - stickiness
native_color: DRBD_fs allocation score on deneb682: 10996       <= inherit from DRBD:0 (master) + stickiness
native_color: DRBD_fs allocation score on deneb683: -INFINITY   <= due to no DRBD master allocated for this node
native_color: NFS_ip allocation score on deneb682: 10996
native_color: NFS_ip allocation score on deneb683: -INFINITY
native_color: NFS_server allocation score on deneb682: 10996
native_color: NFS_server allocation score on deneb683: -INFINITY
native_color: FS_at_server allocation score on deneb682: 10996
native_color: FS_at_server allocation score on deneb683: -INFINITY
native_color: FS_avoid_server allocation score on deneb682: -INFINITY  <= due to DRBD master allocated for this node
native_color: FS_avoid_server allocation score on deneb683: 995        <= stickiness
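
The guesses annotated in the idle-state output above can be checked with a few lines of arithmetic. This is only a sketch of those guessed formulas, not of Pacemaker's real allocation algorithm; the function name and dictionary keys are invented for illustration, and the count of 4 dependents is the four resources colocated with the master (DRBD_fs, NFS_ip, NFS_server, FS_at_server).

```python
MASTER = 10000     # score the DRBD agent returns for master
SLAVE = 1000       # score the DRBD agent returns for slave
STICKINESS = 995   # resource stickiness used in this run
DEPENDENTS = 4     # DRBD_fs, NFS_ip, NFS_server, FS_at_server


def guessed_idle_scores(master, slave, stickiness, dependents):
    """Reproduce the guessed formulas from the annotated output."""
    started_bonus = 1  # the "+ 1 for 'started on this node'" guess
    clone_master = master + started_bonus
    clone_slave = slave + started_bonus
    return {
        "clone_color DRBD-master on master node": dependents * stickiness,
        "clone_color DRBD:0 on master node": clone_master,
        "clone_color DRBD:1 on slave node": clone_slave,
        "native_color DRBD:0 on slave node": 0 - stickiness,
        "native_color DRBD:1 on slave node": clone_slave - stickiness,
        "promotion DRBD:0": master + 2 * dependents * stickiness,
        "promotion DRBD:1": slave - stickiness,
        "native_color dependents on master node": clone_master + stickiness,
        "native_color FS_avoid_server on slave node": stickiness,
    }


scores = guessed_idle_scores(MASTER, SLAVE, STICKINESS, DEPENDENTS)
for name, value in scores.items():
    print(f"{name}: {value}")
```

With these inputs every computed value agrees with the corresponding line of the crm_simulate output, which only shows that the guesses are consistent with this one state.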


And here is the successful transition following the 'pcs resource move NFS_server':
Allocation scores:
clone_color: DRBD-master allocation score on deneb682: 1        <= stickiness no longer considered; just 1 for 'master on this node'
clone_color: DRBD-master allocation score on deneb683: 0
clone_color: DRBD:0 allocation score on deneb682: 10001         <= no change
clone_color: DRBD:0 allocation score on deneb683: 0
clone_color: DRBD:1 allocation score on deneb682: 0
clone_color: DRBD:1 allocation score on deneb683: 1001          <= no change
native_color: DRBD:0 allocation score on deneb682: 10001        <= no change
native_color: DRBD:0 allocation score on deneb683: -995         <= no change
native_color: DRBD:1 allocation score on deneb682: -INFINITY    <= no change
native_color: DRBD:1 allocation score on deneb683: 6            <= no change
DRBD:1 promotion score on deneb683: 5                           <= no change
DRBD:0 promotion score on deneb682: 1                           <= stickiness no longer considered; just 1 for 'master on this node'
native_color: DRBD_fs allocation score on deneb682: -INFINITY
native_color: DRBD_fs allocation score on deneb683: 6
native_color: NFS_ip allocation score on deneb682: -INFINITY
native_color: NFS_ip allocation score on deneb683: 6
native_color: NFS_server allocation score on deneb682: -INFINITY   <= due to move constraint
native_color: NFS_server allocation score on deneb683: 6           <= inherit from DRBD:1 (slave) - stickiness
native_color: FS_at_server allocation score on deneb682: -INFINITY
native_color: FS_at_server allocation score on deneb683: 6
native_color: FS_avoid_server allocation score on deneb682: 0      <= stickiness no longer considered
native_color: FS_avoid_server allocation score on deneb683: -INFINITY


In contrast, here is the same transition with resource stickiness set to 2002, which does not result in a demote:
Allocation scores:
clone_color: DRBD-master allocation score on deneb682: 1        <= similar to above
clone_color: DRBD-master allocation score on deneb683: 0
clone_color: DRBD:0 allocation score on deneb682: 10001         <= similar to above
clone_color: DRBD:0 allocation score on deneb683: 0
clone_color: DRBD:1 allocation score on deneb682: 0
clone_color: DRBD:1 allocation score on deneb683: 1001          <= similar to above
native_color: DRBD:0 allocation score on deneb682: 10001        <= similar to above
native_color: DRBD:0 allocation score on deneb683: -2002        <= similar to above
native_color: DRBD:1 allocation score on deneb682: -INFINITY    <= similar to above
native_color: DRBD:1 allocation score on deneb683: 1001         <= similar to above but with changed sign!?
DRBD:0 promotion score on deneb682: 1                           <= similar to above
DRBD:1 promotion score on deneb683: 1                           <= I have no clue where this could come from!?
native_color: DRBD_fs allocation score on deneb682: 12003       <= inherit from DRBD:0 (master) + stickiness
native_color: DRBD_fs allocation score on deneb683: -INFINITY
native_color: NFS_ip allocation score on deneb682: 12003
native_color: NFS_ip allocation score on deneb683: -INFINITY
native_color: NFS_server allocation score on deneb682: -INFINITY   <= due to move constraint
native_color: NFS_server allocation score on deneb683: -INFINITY   <= due to no DRBD master allocated for this node
native_color: FS_at_server allocation score on deneb682: 12003
native_color: FS_at_server allocation score on deneb683: -INFINITY
native_color: FS_avoid_server allocation score on deneb682: -INFINITY
native_color: FS_avoid_server allocation score on deneb683: 2002   <= stickiness
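
To make the two puzzling lines stand out, the same guessed formulas can be applied with stickiness 2002 and compared against the values observed above. Again this is just a sketch: the formulas are guesses, the labels are invented for illustration, and the observed numbers are copied from the output above.

```python
MASTER, SLAVE, STICKINESS = 10000, 1000, 2002

# label -> (value predicted by the guessed formula, value observed above)
comparison = {
    "native_color DRBD:0 on deneb683": (0 - STICKINESS, -2002),
    "native_color DRBD:1 on deneb683": (SLAVE + 1 - STICKINESS, 1001),
    "promotion DRBD:1 on deneb683": (SLAVE - STICKINESS, 1),
    "native_color DRBD_fs on deneb682": (MASTER + 1 + STICKINESS, 12003),
    "native_color FS_avoid_server on deneb683": (STICKINESS, 2002),
}

# Keep only the entries where the guess and the observation disagree.
mismatches = {
    label: pair for label, pair in comparison.items() if pair[0] != pair[1]
}

for label, (guessed, observed) in mismatches.items():
    print(f"{label}: guessed {guessed}, observed {observed}")
```

Only the two DRBD:1 lines disagree with the guesses, which are exactly the ones marked with "!?" above.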

From these I can guess how (most of) the scores are calculated in this situation.
Unfortunately, that helps only a little in predicting scoring and allocation in advance.
(It's always much easier to devise an explanation afterwards, but sometimes you should know in advance.)
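
Comparing two such score dumps by eye is tedious, so here is a minimal sketch of a helper that parses the score lines and lists the differences. It assumes the line layout shown in this mail (other Pacemaker versions may print the scores differently), and the function names are made up for illustration. The sample input is a few lines taken from the two transitions above.

```python
import re

# Matches e.g. "native_color: DRBD:1 allocation score on deneb683: 6"
# and "DRBD:0 promotion score on deneb682: 1"; trailing notes are ignored.
SCORE_LINE = re.compile(
    r"^(?:(?P<phase>\w+): )?(?P<resource>\S+) "
    r"(?P<kind>allocation|promotion) score on (?P<node>\S+): "
    r"(?P<score>-?INFINITY|-?\d+)"
)


def parse_scores(text):
    """Map (phase, resource, node) to the score string of each score line."""
    scores = {}
    for line in text.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            phase = match["phase"] or "promotion"
            scores[(phase, match["resource"], match["node"])] = match["score"]
    return scores


def diff_scores(first, second):
    """Return the keys whose score differs, with both values."""
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


run_995 = """
native_color: DRBD:1 allocation score on deneb683: 6            <= no change
DRBD:1 promotion score on deneb683: 5
native_color: NFS_server allocation score on deneb683: 6
"""
run_2002 = """
native_color: DRBD:1 allocation score on deneb683: 1001
DRBD:1 promotion score on deneb683: 1
native_color: NFS_server allocation score on deneb683: -INFINITY
"""

changes = diff_scores(parse_scores(run_995), parse_scores(run_2002))
for key, (old, new) in changes.items():
    print(key, old, "->", new)
```

Scores are kept as strings so that INFINITY and -INFINITY need no special handling.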

Kind regards
Andi

