[Pacemaker] Cluster with DRBD : split brain

Lars Ellenberg lars.ellenberg at linbit.com
Tue Jul 26 11:43:58 EDT 2011


On Wed, Jul 20, 2011 at 11:36:25AM -0400, Digimer wrote:
> On 07/20/2011 11:24 AM, Hugo Deprez wrote:
> > Hello Andrew,
> > 
> > in fact DRBD was in standalone mode but the cluster was working :
> > 
> > Here is the syslog of the drbd's split brain :
> > 
> > Jul 15 08:45:34 node1 kernel: [1536023.052245] block drbd0: Handshake
> > successful: Agreed network protocol version 91
> > Jul 15 08:45:34 node1 kernel: [1536023.052267] block drbd0: conn(
> > WFConnection -> WFReportParams )
> > Jul 15 08:45:34 node1 kernel: [1536023.066677] block drbd0: Starting
> > asender thread (from drbd0_receiver [23281])
> > Jul 15 08:45:34 node1 kernel: [1536023.066863] block drbd0:
> > data-integrity-alg: <not-used>
> > Jul 15 08:45:34 node1 kernel: [1536023.079182] block drbd0:
> > drbd_sync_handshake:
> > Jul 15 08:45:34 node1 kernel: [1536023.079190] block drbd0: self
> > BBA9B794EDB65CDF:9E8FB52F896EF383:C5FE44742558F9E1:1F9E06135B8E296F
> > bits:75338 flags:0
> > Jul 15 08:45:34 node1 kernel: [1536023.079196] block drbd0: peer
> > 8343B5F30B2BF674:9E8FB52F896EF382:C5FE44742558F9E0:1F9E06135B8E296F
> > bits:769 flags:0
> > Jul 15 08:45:34 node1 kernel: [1536023.079200] block drbd0:
> > uuid_compare()=100 by rule 90
> > Jul 15 08:45:34 node1 kernel: [1536023.079203] block drbd0: Split-Brain
> > detected, dropping connection!
> > Jul 15 08:45:34 node1 kernel: [1536023.079439] block drbd0: helper
> > command: /sbin/drbdadm split-brain minor-0
> > Jul 15 08:45:34 node1 kernel: [1536023.083955] block drbd0: meta
> > connection shut down by peer.
> > Jul 15 08:45:34 node1 kernel: [1536023.084163] block drbd0: conn(
> > WFReportParams -> NetworkFailure )
> > Jul 15 08:45:34 node1 kernel: [1536023.084173] block drbd0: asender
> > terminated
> > Jul 15 08:45:34 node1 kernel: [1536023.084176] block drbd0: Terminating
> > asender thread
> > Jul 15 08:45:34 node1 kernel: [1536023.084406] block drbd0: helper
> > command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> > Jul 15 08:45:34 node1 kernel: [1536023.084420] block drbd0: conn(
> > NetworkFailure -> Disconnecting )
> > Jul 15 08:45:34 node1 kernel: [1536023.084430] block drbd0: error
> > receiving ReportState, l: 4!
> > Jul 15 08:45:34 node1 kernel: [1536023.084789] block drbd0: Connection
> > closed
> > Jul 15 08:45:34 node1 kernel: [1536023.084813] block drbd0: conn(
> > Disconnecting -> StandAlone )
> > Jul 15 08:45:34 node1 kernel: [1536023.086345] block drbd0: receiver
> > terminated
> > Jul 15 08:45:34 node1 kernel: [1536023.086349] block drbd0: Terminating
> > receiver thread
> 
> This was a DRBD split-brain, not a pacemaker split. I think that might
> have been the source of confusion.
> 
> The split brain occurs when both DRBD nodes lose contact with one
> another and then proceed as StandAlone/Primary/UpToDate. To avoid this,
> configure fencing (stonith) in Pacemaker, then use 'crm-fence-peer.sh'
> in drbd.conf;
> 
> ===
>         disk {
>                 fencing         resource-and-stonith;
>         }
> 
>         handlers {
>                 outdate-peer    "/path/to/crm-fence-peer.sh";
>         }
> ===

Thanks, that is basically right.
Let me fill in some details, though:

> This will tell DRBD to block (resource) and fence (stonith). DRBD will

The drbd fencing options are "fencing resource-only"
and "fencing resource-and-stonith".

"resource-only" does *not* block IO while the fencing handler runs.

"resource-and-stonith" does block IO.

> not resume IO until either the fence script exits with a success, or
> until an admin types 'drbdadm resume-io <res>'.
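
If you do end up in split brain anyway, manual recovery amounts to picking
a victim and discarding its changes. The exact syntax depends on the DRBD
version; with 8.3 it looks roughly like this ("r0" stands for your resource
name):

===
# on the node whose changes you want to throw away:
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# on the other node, only if it dropped to StandAlone as well:
drbdadm connect r0
===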


> The CRM script simply calls pacemaker and asks it to fence the other
> node.

No.  It tries to place a constraint forcing the Master role off of any
node but the one with the good data.
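
What the handler puts into the CIB is a location constraint roughly of this
shape (the IDs and the master/slave resource name will differ in your
cluster; "node1" stands for the node that still has the good data):

===
<rsc_location id="drbd-fence-by-handler-ms_drbd0" rsc="ms_drbd0">
  <rule id="drbd-fence-by-handler-rule-ms_drbd0"
        role="Master" score="-INFINITY">
    <expression id="drbd-fence-by-handler-expr-ms_drbd0"
                attribute="#uname" operation="ne" value="node1"/>
  </rule>
</rsc_location>
===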

> When a node has actually failed, then the lost node is fenced. If
> both nodes are up but disconnected, as you had, then only the fastest
> node will succeed in calling the fence, and the slower node will be
> fenced before it can call a fence.

"fenced" may be "restricted from being/becoming Master" by that fencing
constraint. Or, if pacemaker decided to do so, actually "shot" by some
node level fencing agent (stonith).

All that resource-level fencing by placing constraints obviously only
works as long as the cluster communication is still up. If not only the
drbd replication link had issues, but the cluster communication was down
as well, it becomes a bit more complex.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



