[ClusterLabs] 2-node DRBD Pacemaker not performing as expected: Where to next?

Nickle, Richard rnickle at holycross.edu
Tue Aug 20 11:08:32 EDT 2019


Thank you, Ken, this is very helpful. I was hoping for exactly this kind
of feedback and a chance to step back and rethink.

I didn't realize, for instance, that SBD could get the quorum information
from Pacemaker.
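
If I'm reading the sbd(8) man page right, the watchdog-only (diskless)
variant would boil down to something like this on Ubuntu, which keeps the
settings in /etc/default/sbd (a sketch from memory, so the exact names
may be off):

    # /etc/default/sbd -- diskless mode: no SBD_DEVICE at all
    SBD_WATCHDOG_DEV=/dev/watchdog
    SBD_WATCHDOG_TIMEOUT=5
    SBD_PACEMAKER=yes
    SBD_STARTMODE=always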

I don't know how I can get around 'softdog' since I am running entirely in
Hyper-V.

I think one of my questions is answered by your observation that I should
'tie my DRBD fencing scripts into pacemaker.'

I have the DRBD fencing scripts configured, but I'm running DRBD 8.x.
The DRBD 9 documentation explicitly advertises support for 'fencing in
Pacemaker', which makes me think that 8.x might not support it.
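
For reference, the fencing-related part of my resource definition looks
roughly like this (typed from memory, with r0 as a placeholder resource
name, so treat it as approximate):

    resource r0 {
      disk {
        fencing resource-only;   # resource-and-stonith once STONITH is trusted
      }
      handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
      # ... devices, addresses, etc. omitted
    }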

I'm ready to move to DRBD 9, but I need to put that on hold for a short
time because I'm missing my production window.

I don't want to dump my logs and configs here without more insight,
because that would clutter up the thread a lot.

Thanks,

Rick


On Thu, Aug 15, 2019 at 5:27 PM Ken Gaillot <kgaillot at redhat.com> wrote:

> On Thu, 2019-08-15 at 11:25 -0400, Nickle, Richard wrote:
> >
> > My objective is a two-node active/passive DRBD device that would
> > automatically fail over; a secondary objective is to use standard,
> > stock and supported software distributions and repositories with as
> > little customization as possible.
> >
> > I'm using Ubuntu 18.04.3, plus the DRBD, Corosync and Pacemaker that
> > are in the (LTS) repositories.  drbdadm reports version 8.9.10,
> > Corosync is 2.4.3, and pcs is 0.9.164.
> >
> > For my test scenario, I would have two nodes up and running, then
> > reboot, disconnect or shut down one node; after a delay, the other
> > node would take over.  That's the scenario I wanted to cover:
> > unexpected loss of a node.  The application is supplementary and
> > isn't life-safety or mission-critical, but uptime would be measured,
> > and the goal is to stay above four nines annually.
> >
> > All of this is working for me: I can manually fail over by telling
> > pcs to move my resource from one node to another.  If I reboot the
> > primary node, the failover will not complete until the primary is
> > back online.  Occasionally I'd get split-brain from these hard
> > kills, which would require manual recovery.
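> >
> > For the record, the manual recovery was roughly the standard DRBD 8.x
> > split-brain procedure; from memory, with r0 as a placeholder resource
> > name:
> >
> >     # on the node whose changes get discarded
> >     drbdadm secondary r0
> >     drbdadm connect --discard-my-data r0
> >     # on the surviving node, if it had dropped the connection
> >     drbdadm connect r0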
> >
> > I added STONITH and watchdog using SBD with an iSCSI block device and
> > softdog.
>
> So far, so good ... except for softdog. Since it's a kernel module, if
> something goes wrong at the kernel level, it might fail to execute, and
> you might still get split-brain (though much less likely than without
> fencing at all). A hardware watchdog or external power fencing is much
> more reliable, and if you're looking for 4 9s, it's worth it.
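>
> For example, something like this will show whether the hypervisor
> exposes a watchdog device and which driver backs it (standard tools,
> output varies by platform):
>
>     ls -l /dev/watchdog*
>     wdctl /dev/watchdog0
>     lsmod | grep -i -e softdog -e wdt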
>
> > I added a qdevice to get an odd-numbered quorum.
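> >
> > Roughly the stock pcs procedure, i.e. something along these lines,
> > with qnetd-host standing in for the actual arbiter node:
> >
> >     pcs quorum device add model net host=qnetd-host algorithm=ffsplit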
> >
> > When I run crm_simulate on this, the simulation says that if I down
> > the primary node, it will promote the resource to the secondary.
> >
> > And yet I still see the same behavior: when I crash the primary,
> > there is no promotion until after the primary returns online; once it
> > does, the secondary is smoothly promoted and the primary demoted.
> >
> > Getting each component of this stack configured and running has
> > involved substantial challenges with regard to compatibility,
> > documentation, integration bugs, etc.
> >
> > I see other people reporting problems similar to mine.  Is there a
> > general approach, or do I just need a nudge in a new direction to
> > tackle this issue?
> >
> > * Should I continue to focus on the existing Pacemaker
> > configuration?  Perhaps there's some hidden or absent
> > order/constraint/weighting that is causing this behavior.
>
> It's hard to say without configuration and logs. I'd start by checking
> the logs to see whether fencing succeeded when the node was killed. If
> fencing fails, pacemaker can't recover anything from the dead node.
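>
> Something along these lines on the surviving node usually tells the
> story (log locations vary by distro; on Ubuntu, syslog is a reasonable
> place to start):
>
>     stonith_admin --history '*' --verbose
>     grep -i -e stonith -e fence /var/log/syslog
>     crm_mon -1Af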
>
> > * Should I dig harder at the DRBD configuration?  Is it something
> > about the fencing scripts?
>
> It is a good idea to tie DRBD's fencing scripts to pacemaker. The
> LINBIT DRBD docs are the best reference for that; they describe setting
> fence-peer to a crm-fence-peer script.
>
> > * Should I try stripping this back down to something more basic?  Can
> > I have a reliable failover without STONITH, SBD and an odd-numbered
> > quorum?
>
> There's nothing wrong with having both SBD with shared disk, and
> qdevice, but you don't need both. If you run qdevice, SBD can get the
> quorum information from pacemaker, so it doesn't require the shared
> disk.
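>
> With the diskless/watchdog-only approach, pacemaker also needs to be
> told how long to wait for the watchdog to do its job, something like
> the following (the value is just an example; it should comfortably
> exceed the SBD watchdog timeout):
>
>     pcs property set stonith-watchdog-timeout=10s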
>
> > * It seems possible that moving to DRBD 9.x might take some of the
> > problem off of Pacemaker altogether, since it apparently has built-in
> > failover.  Is that an easier win?
> > * Should I go to another stack?  I'm trying to work within LTS
> > releases for stability, but perhaps I would get better integrations
> > with RHEL 7, CentOS 7, an edge release of Ubuntu, or some other
> > distribution?
>
> There are advantages and disadvantages to changing either of the above,
> but I doubt any choice will be easier, just a different set of
> roadblocks to work through.
>
> > Thank you for your consideration!
> >
> --
> Ken Gaillot <kgaillot at redhat.com>
>

