[ClusterLabs] pengine bug? Recovery after monitor failure: Restart of DRBD does not restart Filesystem -- unless explicit order start before promote on DRBD

Thu Jan 11 17:15:01 EST 2018

To understand some weird behavior we observed,
I dumbed down a production config to three dummy resources,
while keeping some descriptive resource ids (ip, drbd, fs).

For some reason, the constraints are:
stuff, more stuff, IP -> DRBD -> FS -> other stuff.
(In the real-world config it makes somewhat more sense,
but the problem reproduces with just these three resources.)

All is running just fine.

    Online: [ ava emma ]
     virtual_ip     (ocf::pacemaker:Dummy): Started ava
     Master/Slave Set: ms_drbd_r0 [p_drbd_r0]
	 Masters: [ ava ]
     p_fs_drbd1     (ocf::pacemaker:Dummy): Started ava

If I simulate a monitor failure on IP:
    # crm_simulate -L -i virtual_ip_monitor_30000@ava=1

    Transition Summary:
     * Recover virtual_ip   (Started ava)
     * Restart p_drbd_r0:0  (Master ava)

In real life this will obviously fail,
because we cannot "restart" (demote) a DRBD
while it is still in use (mounted, in this case).

Only if I add a stupid intra-resource order constraint, explicitly
stating that the DRBD itself must first start and then promote,
do I get the result I would have expected:

    Transition Summary:
     * Recover virtual_ip   (Started ava)
     * Restart p_drbd_r0:0  (Master ava)
     * Restart p_fs_drbd1   (Started ava)
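
The ordering behind that expected result can be sketched with tsort(1).
This is only an illustration, not how pengine builds its transition graph.
It assumes that each order constraint (o1, o2, and start-before-promote)
also implies its symmetric inverse on the stop/demote side, and that
recovering virtual_ip means stop followed by start. The function name
expected_order is made up for this sketch.

```shell
#!/bin/sh
# Each input line "A B" means: action A must complete before action B.
# Assumed edges: the inverses of o2, start-before-promote and o1,
# the stop/start pair of the recovered IP, then o1, start-before-promote, o2.
expected_order() {
    tsort <<'EOF'
p_fs_drbd1:stop ms_drbd_r0:demote
ms_drbd_r0:demote ms_drbd_r0:stop
ms_drbd_r0:stop virtual_ip:stop
virtual_ip:stop virtual_ip:start
virtual_ip:start ms_drbd_r0:start
ms_drbd_r0:start ms_drbd_r0:promote
ms_drbd_r0:promote p_fs_drbd1:start
EOF
}

# Print the actions, one per line, in an order that respects every edge.
expected_order
```

Because the edges form a single chain, only one order is possible:
it begins with the filesystem stop and ends with the filesystem start.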

Interestingly enough, if I simulate a monitor failure on "DRBD" directly,
the result is as expected in both cases:

    Transition Summary:
     * Recover p_drbd_r0:0  (Master ava)
     * Restart p_fs_drbd1   (Started ava)

What am I missing?

Do we have to "annotate" somewhere that you must not demote something
if it is still "in use" by something else?

Did I just screw up the constraints somehow?
What would the constraints need to look like to get the expected result,
without explicitly adding the first-start-then-promote constraint?

Is (was?) this a pengine bug?

How to reproduce:
=================

crm shell style dummy config:
    ------------------------------
    node 1: ava
    node 2: emma
    primitive p_drbd_r0 ocf:pacemaker:Stateful \
	    op monitor interval=29s role=Master \
	    op monitor interval=31s role=Slave
    primitive p_fs_drbd1 ocf:pacemaker:Dummy \
	    op monitor interval=20 timeout=40
    primitive virtual_ip ocf:pacemaker:Dummy \
	    op monitor interval=30s
    ms ms_drbd_r0 p_drbd_r0 \
	    meta master-max=1 master-node-max=1 clone-max=1 clone-node-max=1
    colocation c1 inf: ms_drbd_r0 virtual_ip
    colocation c2 inf: p_fs_drbd1:Started ms_drbd_r0:Master
    order o1 inf: virtual_ip:start ms_drbd_r0:start
    order o2 inf: ms_drbd_r0:promote p_fs_drbd1:start
    ------------------------------

crm_simulate -x bad.xml -i virtual_ip_monitor_30000@ava=1

 trying to demote DRBD before umount :-((

adding stupid constraint:

    order first-start-then-promote inf: ms_drbd_r0:start ms_drbd_r0:promote

crm_simulate -x good.xml -i virtual_ip_monitor_30000@ava=1

  yay, first umount, then demote...
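
To compare the two simulations without reading the output by eye, a small
filter can check the Transition Summary. This is a sketch that assumes the
summary format shown above (as printed by 1.1.15 and 1.1.16); other versions
may word it differently. The function name fs_follows_drbd is made up here.

```shell
#!/bin/sh
# Read a "Transition Summary" on stdin. Succeed unless p_drbd_r0 is
# restarted or recovered while p_fs_drbd1 is not scheduled for a restart.
fs_follows_drbd() {
    summary=$(cat)
    if printf '%s\n' "$summary" | grep -Eq '\* (Restart|Recover) +p_drbd_r0'; then
        # DRBD will be cycled, so the filesystem has to be cycled too.
        printf '%s\n' "$summary" | grep -Eq '\* (Restart|Recover) +p_fs_drbd1'
    fi
}

# Intended use (requires crm_simulate and the attached CIB files):
#   crm_simulate -x bad.xml -i virtual_ip_monitor_30000@ava=1 | fs_follows_drbd \
#       || echo "DRBD restarted without restarting the filesystem"
```

Note that this only checks which resources are scheduled, not the order of
the individual actions within the transition.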

(tested with 1.1.15 and 1.1.16, not yet with more recent code base)

Full good.xml and bad.xml are both attached.

Manipulating the constraint in the live CIB using cibadmin only:
add: cibadmin -C -o constraints -X '<rsc_order id="first-start-then-promote" score="INFINITY" first="ms_drbd_r0" first-action="start" then="ms_drbd_r0" then-action="promote"/>'
del: cibadmin -D -X '<rsc_order id="first-start-then-promote"/>'

Thanks,

    Lars

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bad.xml.bz2
Type: application/octet-stream
Size: 1976 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20180111/f2659e13/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: good.xml.bz2
Type: application/octet-stream
Size: 2006 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20180111/f2659e13/attachment-0005.obj>