[ClusterLabs] pengine bug? Recovery after monitor failure: Restart of DRBD does not restart Filesystem -- unless explicit order start before promote on DRBD

Ken Gaillot kgaillot@redhat.com
Fri Jan 19 17:52:40 EST 2018


On Thu, 2018-01-11 at 23:15 +0100, Lars Ellenberg wrote:
> To understand some weird behavior we observed,
> I dumbed down a production config to three dummy resources,
> while keeping some descriptive resource ids (ip, drbd, fs).
> 
> For some reason, the constraints are:
> stuff, more stuff, IP -> DRBD -> FS -> other stuff.
> (In the actual real-world config, it makes somewhat more sense,
> but the problem reproduces with just these three resources.)
> 
> All is running just fine.
> 
>     Online: [ ava emma ]
>      virtual_ip     (ocf::pacemaker:Dummy): Started ava
>      Master/Slave Set: ms_drbd_r0 [p_drbd_r0]
> 	 Masters: [ ava ]
>      p_fs_drbd1     (ocf::pacemaker:Dummy): Started ava
> 
> If I simulate a monitor failure on IP:
>     # crm_simulate -L -i virtual_ip_monitor_30000@ava=1
> 
>     Transition Summary:
>      * Recover virtual_ip   (Started ava)
>      * Restart p_drbd_r0:0  (Master ava)
> 
> Which in real life will obviously fail,
> because we cannot "restart" (demote) a DRBD
> while it is still in use (mounted, in this case).
> 
> Only if I add a stupid intra-resource order constraint on the DRBD
> itself, explicitly stating first start, then promote, do I get the
> result I would have expected:
> 
>     Transition Summary:
>      * Recover virtual_ip   (Started ava)
>      * Restart p_drbd_r0:0  (Master ava)
>      * Restart p_fs_drbd1   (Started ava)
> 
> Interestingly enough, if I simulate a monitor failure on the DRBD
> directly, the result is in both cases the expected one:
> 
>     Transition Summary:
>      * Recover p_drbd_r0:0  (Master ava)
>      * Restart p_fs_drbd1   (Started ava)
> 
> 
> What am I missing?
> 
> Do we have to "annotate" somewhere that you must not demote something
> if it is still "in use" by something else?
> 
> Did I just screw up the constraints somehow?
> What would the constraints need to look like to get the expected
> result, without explicitly adding the first-start-then-promote
> constraint?

Your constraints are:

  place IP then place drbd instance(s) with it
  start IP then start drbd instance(s)

  place drbd master then place fs with it
  promote drbd master then start fs

I'm guessing you meant to colocate the drbd *master* with the IP, and
"start IP then promote drbd" -- otherwise you can never have more than
one drbd instance. That doesn't have any relevance to the problem,
though.
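
In crm shell terms (against the config quoted below), that intended
setup would be something like -- a sketch, not tested here:

  colocation c1 inf: ms_drbd_r0:Master virtual_ip
  order o1 inf: virtual_ip:start ms_drbd_r0:promote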

I also see you have clone-max="1". Interestingly, if we set this to
"2", it now restarts the fs, but it only promotes drbd (which is
already master).
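
For reference, that test is just the quoted ms definition with
clone-max bumped, i.e. something like:

  ms ms_drbd_r0 p_drbd_r0 \
          meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1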

> Is (was?) this a pengine bug?

Definitely. :-(

I confirmed the behavior on Pacemaker 1.1.12 as well, so it's not
something new. This will require further investigation.

> How to reproduce:
> =================
> 
> crm shell style dummy config:
>     ------------------------------
>     node 1: ava
>     node 2: emma
>     primitive p_drbd_r0 ocf:pacemaker:Stateful \
> 	    op monitor interval=29s role=Master \
> 	    op monitor interval=31s role=Slave
>     primitive p_fs_drbd1 ocf:pacemaker:Dummy \
> 	    op monitor interval=20 timeout=40
>     primitive virtual_ip ocf:pacemaker:Dummy \
> 	    op monitor interval=30s
>     ms ms_drbd_r0 p_drbd_r0 \
> 	    meta master-max=1 master-node-max=1 clone-max=1 clone-node-max=1
>     colocation c1 inf: ms_drbd_r0 virtual_ip
>     colocation c2 inf: p_fs_drbd1:Started ms_drbd_r0:Master
>     order o1 inf: virtual_ip:start ms_drbd_r0:start
>     order o2 inf: ms_drbd_r0:promote p_fs_drbd1:start
>     ------------------------------
> 
> crm_simulate -x bad.xml -i virtual_ip_monitor_30000@ava=1
> 
>  trying to demote DRBD before umount :-((
> 
> adding stupid constraint:
> 
>     order first-start-then-promote inf: ms_drbd_r0:start ms_drbd_r0:promote
> 
> crm_simulate -x good.xml -i virtual_ip_monitor_30000@ava=1
> 
>   yay, first umount, then demote...
> 
> (tested with 1.1.15 and 1.1.16, not yet with a more recent code base)
> 
> 
> Full good.xml and bad.xml are both attached.
> 
> Manipulating the constraint in the live CIB using cibadmin only:
> add: cibadmin -C -o constraints -X '<rsc_order id="first-start-then-promote" score="INFINITY" first="ms_drbd_r0" first-action="start" then="ms_drbd_r0" then-action="promote"/>'
> del: cibadmin -D -X '<rsc_order id="first-start-then-promote"/>'
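> 
> The crmsh one-shot equivalents would presumably be:
> add: crm configure order first-start-then-promote inf: ms_drbd_r0:start ms_drbd_r0:promote
> del: crm configure delete first-start-then-promote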
> 
> Thanks,
> 
>     Lars
-- 
Ken Gaillot <kgaillot@redhat.com>



