[ClusterLabs] 2-node DRBD Pacemaker not performing as expected: Where to next?

Thu Aug 15 17:27:06 EDT 2019

On Thu, 2019-08-15 at 11:25 -0400, Nickle, Richard wrote:
> 
> My objective is two-node active/passive DRBD device which would
> automatically fail over, a secondary objective would be to use
> standard, stock and supported software distributions and repositories
> with as little customization as possible.
> 
> I'm using Ubuntu 18.04.3, plus the DRBD, corosync and Pacemaker that
> are in the (LTS) repositories.  DRBD drbdadm reports version 8.9.10. 
> Corosync is 2.4.3, and Pacemaker is 0.9.164.
> 
> For my test scenario, I would have two nodes up and running, I would
> reboot, disconnect or shut down one node, and the other node would
> then after a delay take over.  That's the scenario I wanted to
> cover:  unexpected loss of a node.  The application is supplementary
> and isn't life safety or mission critical, but it would be measured,
> and the goal would be to stay above 4 nines of uptime annually.
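
For scale, "4 nines" is a tight budget. This quick sketch converts a number of nines into allowed downtime per year and asks how many outages of a given length fit. It assumes a 365-day year with no planned maintenance, and the 5-minute outage length is purely hypothetical.

```python
# Annual downtime budget for N "nines" of availability.
# Assumes a 365-day year; leap days and planned maintenance are ignored.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(nines: int) -> float:
    """Maximum downtime per year, in minutes, for availability 1 - 10**-nines."""
    return MINUTES_PER_YEAR * 10 ** -nines


def failovers_allowed(nines: int, minutes_per_outage: float) -> int:
    """How many outages of a given length fit inside the yearly budget."""
    return int(downtime_budget_minutes(nines) // minutes_per_outage)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_budget_minutes(n):.1f} min/year")
    # Hypothetical: each unplanned outage lasts as long as a 5-minute reboot.
    print("5-minute outages allowed at 4 nines:", failovers_allowed(4, 5.0))
```

At 4 nines the budget is about 52.6 minutes a year. If failover stalls until the crashed node is back online, each incident costs roughly a full reboot cycle, so only a handful of such incidents fit.
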
> 
> All of this is working for me, I can manually failover by telling PCS
> to move my resource from one node to another.  If I reboot the
> primary node, the failover will not complete until the primary is
> back online.  Occasionally I'd get split-brain by doing these hard
> kills, which would require manual recovery.
> 
> I added STONITH and watchdog using SBD with an iSCSI block device and
> softdog.  

So far, so good ... except for softdog. Since it's a kernel module, if
something goes wrong at the kernel level, it might fail to execute, and
you might still get split-brain (though much less likely than without
fencing at all). A hardware watchdog or external power fencing is much
more reliable, and if you're looking for 4 9s, it's worth it.

> I added a qdevice to get an odd-numbered quorum.
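
The reason a qdevice helps can be shown with simple majority arithmetic. This is a simplified model of votequorum, assuming one vote per node and one vote from the qdevice (the usual two-node arrangement). Real corosync has further options (two_node, wait_for_all, last_man_standing) that are not modeled here.

```python
# Simplified model of majority quorum as computed by corosync votequorum:
# a partition is quorate when it holds a strict majority of expected votes.
# Assumes each node has 1 vote and the qdevice contributes 1 vote.


def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(partition_votes: int, expected_votes: int) -> bool:
    """True if the partition holds at least the majority threshold."""
    return partition_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Two nodes only: the survivor of a crash holds 1 of 2 votes.
    print("2 nodes, survivor quorate:", is_quorate(1, 2))
    # Two nodes + qdevice: survivor (1) + qdevice (1) = 2 of 3 votes.
    print("2 nodes + qdevice, survivor quorate:", is_quorate(2, 3))
```

Without a tie-breaker, a lone survivor never holds a majority, which is why two-node clusters need either a qdevice or corosync's special two_node handling.
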
> 
> When I run crm_simulate on this, the simulation says that if I down
> the primary node, it will promote the resource to the secondary.
> 
> And yet I still see the same behavior:  crashing the primary, there
> is no promotion until after the primary returns online, and after
> that the secondary is smoothly promoted and the primary demoted.
> 
> Getting each component of this stack configured and running has had
> substantial challenges, with regard to compatibility, documentation,
> integration bugs, etc.
> 
> I see other people reporting problems similar to mine, I'm wondering
> if there's a general approach, or perhaps I need a nudge in a new
> direction to tackle this issue?
> 
> * Should I continue to focus on the existing Pacemaker
> configuration? Perhaps there's some hidden or absent
> order/constraint/weighting that is causing this behavior?

It's hard to say without configuration and logs. I'd start by checking
the logs to see whether fencing succeeded when the node was killed. If
fencing fails, pacemaker can't recover anything from the dead node.

> * Should I dig harder at the DRBD configuration?  Is it something
> about the fencing scripts?

It is a good idea to tie DRBD's fencing scripts to pacemaker. The
LINBIT DRBD docs are the best source for that; they describe setting
the fence-peer handler to a crm-fence-peer script.
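
As a rough illustration of what resource-level fencing buys you, here is a toy model, not the script's actual code. It assumes (check the LINBIT docs for the exact behavior) that when the replication link drops, the fence-peer handler on the primary adds a Pacemaker constraint forbidding promotion on every other node, and that an unfence handler removes it once the peer has resynced. The class and method names are hypothetical.

```python
# Toy model of DRBD resource-level fencing through Pacemaker.
# Assumption: on loss of replication, the primary's fence-peer handler bans
# the promoted (Master) role on all other nodes; after resync an unfence
# handler lifts the ban. Names below are hypothetical.

from dataclasses import dataclass, field


@dataclass
class Cluster:
    nodes: set
    primary: str
    # Nodes on which promotion is currently forbidden by a fencing constraint.
    promotion_banned: set = field(default_factory=set)

    def fence_peer(self) -> None:
        """Replication lost: ban promotion everywhere except the primary."""
        self.promotion_banned = self.nodes - {self.primary}

    def unfence_peer(self) -> None:
        """Peer has resynced: remove the fencing constraint."""
        self.promotion_banned.clear()

    def can_promote(self, node: str) -> bool:
        return node in self.nodes and node not in self.promotion_banned


if __name__ == "__main__":
    c = Cluster(nodes={"node1", "node2"}, primary="node1")
    c.fence_peer()
    print(c.can_promote("node2"))  # False: node2's data may be outdated
    c.unfence_peer()
    print(c.can_promote("node2"))  # True
```

The point of the constraint is that Pacemaker never promotes a node whose DRBD data may be stale, which is one of the ways split-brain creeps in.
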

> * Should I try stripping this back down to something more basic?  Can
> I have a reliable failover without STONITH, SBD and an odd-numbered
> quorum?

There's nothing wrong with having both SBD with a shared disk and
qdevice, but you don't need both. If you run qdevice, SBD can get
quorum information from pacemaker, so it doesn't require the shared
disk.
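
Without the shared disk, fencing relies on timing: the node that loses quorum must reset itself before the survivor takes over its resources. This sketch models that relationship between SBD's SBD_WATCHDOG_TIMEOUT and Pacemaker's stonith-watchdog-timeout property. The "twice the watchdog timeout" rule is a commonly recommended setting, treated here as an assumption, and the 5-second value is hypothetical. The model also assumes the watchdog actually fires, which is exactly what softdog cannot fully guarantee.

```python
# Simplified timing model of watchdog-only ("diskless") SBD fencing.
# Assumptions, not taken from the SBD sources:
#  - a node that loses quorum stops feeding its watchdog and is reset no
#    later than SBD_WATCHDOG_TIMEOUT seconds afterwards;
#  - the surviving, quorate node treats the lost node as fenced only after
#    Pacemaker's stonith-watchdog-timeout has elapsed, commonly set to
#    twice SBD_WATCHDOG_TIMEOUT.


def recommended_stonith_watchdog_timeout(sbd_watchdog_timeout: float) -> float:
    """Common rule of thumb: wait twice as long as the watchdog timeout."""
    return 2 * sbd_watchdog_timeout


def watchdog_fencing_is_safe(sbd_watchdog_timeout: float,
                             stonith_watchdog_timeout: float) -> bool:
    """Safe only if the survivor waits longer than the self-reset can take."""
    return stonith_watchdog_timeout > sbd_watchdog_timeout


def earliest_recovery(loss_time: float, stonith_watchdog_timeout: float) -> float:
    """Earliest moment the survivor may start resources from the lost node."""
    return loss_time + stonith_watchdog_timeout


if __name__ == "__main__":
    sbd_timeout = 5.0  # hypothetical SBD_WATCHDOG_TIMEOUT, in seconds
    swt = recommended_stonith_watchdog_timeout(sbd_timeout)
    print("stonith-watchdog-timeout:", swt,
          "safe:", watchdog_fencing_is_safe(sbd_timeout, swt))
    print("node lost at t=0, survivor may recover at t =",
          earliest_recovery(0.0, swt))
```

The trade-off is that every failover is delayed by at least stonith-watchdog-timeout, which counts against the downtime budget.
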

> * It seems possible that moving to DRBD 9.X might take some of the
> problem off of Pacemaker altogether since it has built in failover
> apparently, is that an easier win?
> * Should I go to another stack?  I'm trying to work within LTS
> releases for stability, but perhaps I would get better integrations
> with RHEL 7, CentOS 7, an edge release of Ubuntu, or some other
> distribution?

There are advantages and disadvantages to changing either of the above,
but I doubt any choice will be easier, just a different set of
roadblocks to work through.

> Thank you for your consideration!
> 
-- 
Ken Gaillot <kgaillot at redhat.com>