[Pacemaker] Not unmoving colocated resources can provoke DRBD split-brain

Thu Jun 12 16:39:56 EDT 2014

On Thu, Jun 12, 2014 at 10:10:55AM +1000, Andrew Beekhof wrote:
> Referring to the king of drbd... 
> Lars, question for you inline.

> > =======================================================================
> > primitive DRBD-ffm ocf:linbit:drbd params drbd_resource=ffm \
> > op start interval=0 timeout=240 \
> > op promote interval=0 timeout=90 \
> > op demote interval=0 timeout=90 \
> > op notify interval=0 timeout=90 \
> > op stop interval=0 timeout=100 \
> > op monitor role=Slave timeout=20 interval=20 \
> > op monitor role=Master timeout=20 interval=10
> > ms ms-DRBD-ffm DRBD-ffm meta master-max=1 master-node-max=1 \
> > clone-max=2 clone-node-max=1 notify=true
> > colocation coloc-ms-DRBD-ffm-follows-ALL-ffm inf: \
> > ms-DRBD-ffm:Master ALL-ffm
> > order ord-ALL-ffm-before-DRBD-ffm inf: ALL-ffm ms-DRBD-ffm:promote
> > location loc-ms-DRBD-ffm-korfwm01 ms-DRBD-ffm -inf: korfwm01
> > location loc-ms-DRBD-ffm-korfwm02 ms-DRBD-ffm -inf: korfwm02
> > =======================================================================
> > 
> > # crm node standby korfwf01 ; sleep 10
> > # crm node online korfwf01 ; sleep 10
> > # crm resource move ALL-ffm korfwf01 ; sleep 10
> > # crm node standby korfwf01 ; sleep 10
> > # crm node online korfwf01 ; sleep 10
> > *bang* split-brain.
> > 
> > This is because with the last command "online korfwf01" pacemaker starts
> > and then immediately promotes ms-DRBD-ffm without giving any time for
> > drbd to sync with the peer.
> 
> Have you seen anything like this before?
> I don't know that we have any capacity to delay the promotion in the PE...
> perhaps the agent needs to delay setting a master score if it's out of date?
> or maybe loop in the promote action and set a really long timeout

You want to configure DRBD for fencing resource-and-stonith,
and use the fence-peer handler "crm-fence-peer.sh"
(and the corresponding crm-unfence-peer.sh in the after-resync-target handler).

Done.
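
For reference, a minimal sketch of what that looks like in the DRBD
configuration. It assumes DRBD 8.4 syntax and that the handler scripts
are installed under /usr/lib/drbd/ (both are assumptions, check your
version and distribution); only the resource name "ffm" is taken from
the configuration quoted above.

```text
resource ffm {
  disk {
    # freeze IO on loss of the peer until the fence-peer handler returns
    fencing resource-and-stonith;
  }
  handlers {
    # places the constraint that pins the Primary role (path is an assumption)
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    # removes that constraint once the resync has finished
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # the existing device, disk and address settings stay as they are
}
```

The change would need to be applied on both nodes, for example with
"drbdadm adjust ffm".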

What does that do?

If a fencing policy != dont-care is configured,
DRBD, if gracefully disconnected ("stop"), will "outdate" a secondary.
Outdated secondaries refuse to be promoted.

On non-graceful disconnect, a Primary will freeze IO,
call the fence-peer handler, which places a constraint pinning the
primary role to where it currently is, and on success resume IO.

Also, DRBD will not consider itself as UpToDate immediately after
"start", but as "Consistent" at best, which gets only a minimal
master score (or none at all, see adjust_master_score).

Due to this constraint, pacemaker will not attempt promotion
on the node that was "fenced" (in this case only fenced from becoming
Primary, not necessarily shot... it really only places a constraint)
until that node is unfenced (the constraint is removed),
which will happen in the after-resync-target handler (crm-unfence-peer.sh).
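
To make the "it really only places a constraint" part concrete: the
location constraint added by crm-fence-peer.sh is roughly of the
following shape in crm shell notation. This is a sketch, not copied
from a live cluster. The id uses the script's "drbd-fence-by-handler"
prefix, but its exact form depends on the script version, and korfwf01
merely stands in for whichever node was Primary when the handler ran.

```text
# Master role gets -inf on every node whose name is not korfwf01,
# so only korfwf01 may hold (or regain) the Primary role.
location drbd-fence-by-handler-ms-DRBD-ffm ms-DRBD-ffm \
        rule $role=Master -inf: #uname ne korfwf01
```

While the peer is fenced, the constraint should be visible in the
output of "crm configure show"; it should be gone again once the
resync has finished and crm-unfence-peer.sh has run.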

If you don't like the "freeze IO" part above,
you can use the "resource-only" fencing policy.
The and-stonith part is really only about the freeze-io.
The crm-fence-peer.sh does NOT (usually) trigger stonith itself.
It may wait for a successful stonith though, if it thinks one is pending.

The only reliable (as can be) way to avoid data divergence with DRBD and
pacemaker is to use redundant cluster communications,
use working and tested node level fencing on the pacemaker level,
*and* use fencing resource-and-stonith + crm-fence-peer.sh on the DRBD level.

You may want to use the "adjust_master_score" parameter of the DRBD
resource agent as well, to avoid pacemaker attempting to promote an
"only Consistent" DRBD, which will usually fail anyway.
See the parameter description in the resource agent.
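
As a sketch of that last point, here is the primitive quoted above
with the parameter added. The four values are, in order, the scores
for "only Consistent", "only remote UpToDate", "local UpToDate but
peer unknown", and "local UpToDate and Primary or peer known". The
specific numbers are an assumption based on the agent's documented
defaults with the first value lowered to 0; verify them against the
metadata of your installed agent version.

```text
primitive DRBD-ffm ocf:linbit:drbd \
        params drbd_resource=ffm adjust_master_score="0 10 1000 10000" \
        op start interval=0 timeout=240 \
        op promote interval=0 timeout=90 \
        op demote interval=0 timeout=90 \
        op notify interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor role=Slave timeout=20 interval=20 \
        op monitor role=Master timeout=20 interval=10
```

With a first value of 0, a DRBD that is only "Consistent" gets no
master score at all, so pacemaker has no reason to try promoting it.
This only makes sense together with the fencing setup described above.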

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.