[ClusterLabs] pengine bug? Recovery after monitor failure: Restart of DRBD does not restart Filesystem -- unless explicit order start before promote on DRBD

Mon Jan 22 11:17:08 EST 2018

On Fri, Jan 19, 2018 at 04:52:40PM -0600, Ken Gaillot wrote:

> Your constraints are:
> 
>   place IP then place drbd instance(s) with it
>   start IP then start drbd instance(s)
> 
>   place drbd master then place fs with it
>   promote drbd master then start fs
> 
> I'm guessing you meant to colocate the drbd *master* with the IP, and
> "start IP then promote drbd" -- otherwise you can never have more than
> one drbd instance. That doesn't have any relevance to the problem,
> though.

In the real config, it would be a "stacked" DRBD setup,
which only has one instance (per "site").

> I also see you have clone-max="1". Interestingly, if we set this to
> "2", it now restarts the fs, but it only promotes drbd (which is
> already master).
> 
> > Is (was?) this a pengine bug?
> 
> Definitely. :-(
> 
> I confirmed the behavior on Pacemaker 1.1.12 as well, so it's not
> something new. This will require further investigation.

 :-(

Let me know if I can help somehow.

There is a workaround, though it's non-obvious: add the "stupid"
constraint from the subject line (explicit order, start before
promote, on DRBD).

I think it also works ok with the "equivalent" resource set constraints.
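
For the archives, a rough sketch of what that explicit order constraint
could look like as a CIB fragment. The ids here ("ms_drbd" and the
constraint id) are placeholders I made up for illustration, not the
ones from the real config:

```xml
<!-- Hypothetical ids: "ms_drbd" stands for the DRBD master/slave resource. -->
<constraints>
  <!-- Explicit order: start the DRBD instance before promoting it. -->
  <rsc_order id="o_drbd_start_before_promote"
             first="ms_drbd" first-action="start"
             then="ms_drbd" then-action="promote"/>
</constraints>
```

This should be implied anyway, which is why it looks "stupid".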

Cheers,

    Lars