[Pacemaker] Cluster with DRBD : split brain

Digimer linux at alteeve.com
Tue Jul 26 13:27:11 EDT 2011


On 07/26/2011 11:43 AM, Lars Ellenberg wrote:
> On Wed, Jul 20, 2011 at 11:36:25AM -0400, Digimer wrote:
>> On 07/20/2011 11:24 AM, Hugo Deprez wrote:
>>> Hello Andrew,
>>>
>>> in fact, DRBD was in standalone mode but the cluster was working.
>>>
>>> Here is the syslog of the DRBD split brain:
>>>
>>> Jul 15 08:45:34 node1 kernel: [1536023.052245] block drbd0: Handshake
>>> successful: Agreed network protocol version 91
>>> Jul 15 08:45:34 node1 kernel: [1536023.052267] block drbd0: conn(
>>> WFConnection -> WFReportParams )
>>> Jul 15 08:45:34 node1 kernel: [1536023.066677] block drbd0: Starting
>>> asender thread (from drbd0_receiver [23281])
>>> Jul 15 08:45:34 node1 kernel: [1536023.066863] block drbd0:
>>> data-integrity-alg: <not-used>
>>> Jul 15 08:45:34 node1 kernel: [1536023.079182] block drbd0:
>>> drbd_sync_handshake:
>>> Jul 15 08:45:34 node1 kernel: [1536023.079190] block drbd0: self
>>> BBA9B794EDB65CDF:9E8FB52F896EF383:C5FE44742558F9E1:1F9E06135B8E296F
>>> bits:75338 flags:0
>>> Jul 15 08:45:34 node1 kernel: [1536023.079196] block drbd0: peer
>>> 8343B5F30B2BF674:9E8FB52F896EF382:C5FE44742558F9E0:1F9E06135B8E296F
>>> bits:769 flags:0
>>> Jul 15 08:45:34 node1 kernel: [1536023.079200] block drbd0:
>>> uuid_compare()=100 by rule 90
>>> Jul 15 08:45:34 node1 kernel: [1536023.079203] block drbd0: Split-Brain
>>> detected, dropping connection!
>>> Jul 15 08:45:34 node1 kernel: [1536023.079439] block drbd0: helper
>>> command: /sbin/drbdadm split-brain minor-0
>>> Jul 15 08:45:34 node1 kernel: [1536023.083955] block drbd0: meta
>>> connection shut down by peer.
>>> Jul 15 08:45:34 node1 kernel: [1536023.084163] block drbd0: conn(
>>> WFReportParams -> NetworkFailure )
>>> Jul 15 08:45:34 node1 kernel: [1536023.084173] block drbd0: asender
>>> terminated
>>> Jul 15 08:45:34 node1 kernel: [1536023.084176] block drbd0: Terminating
>>> asender thread
>>> Jul 15 08:45:34 node1 kernel: [1536023.084406] block drbd0: helper
>>> command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
>>> Jul 15 08:45:34 node1 kernel: [1536023.084420] block drbd0: conn(
>>> NetworkFailure -> Disconnecting )
>>> Jul 15 08:45:34 node1 kernel: [1536023.084430] block drbd0: error
>>> receiving ReportState, l: 4!
>>> Jul 15 08:45:34 node1 kernel: [1536023.084789] block drbd0: Connection
>>> closed
>>> Jul 15 08:45:34 node1 kernel: [1536023.084813] block drbd0: conn(
>>> Disconnecting -> StandAlone )
>>> Jul 15 08:45:34 node1 kernel: [1536023.086345] block drbd0: receiver
>>> terminated
>>> Jul 15 08:45:34 node1 kernel: [1536023.086349] block drbd0: Terminating
>>> receiver thread
>>
>> This was a DRBD split-brain, not a pacemaker split. I think that might
>> have been the source of confusion.
>>
>> The split brain occurs when both DRBD nodes lose contact with one
>> another and then proceed as StandAlone/Primary/UpToDate. To avoid this,
>> configure fencing (stonith) in Pacemaker, then use 'crm-fence-peer.sh'
>> in drbd.conf:
>>
>> ===
>>         disk {
>>                 fencing         resource-and-stonith;
>>         }
>>
>>         handlers {
>>                 outdate-peer    "/path/to/crm-fence-peer.sh";
>>         }
>> ===
> 
> Thanks, that is basically right.
> Let me fill in some details, though:
> 
>> This will tell DRBD to block (resource) and fence (stonith). DRBD will
> 
> drbd fencing options are "fencing resource-only",
> and "fencing resource-and-stonith". 
> 
> "resource-only" does *not* block IO while the fencing handler runs.
> 
> "resource-and-stonith" does block IO.

Ahhh, that's why I was confused. I thought the 'resource' meant the same
thing in both cases, but had only read the 'resource-and-stonith' section.
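
For anyone following along at home, here is a minimal sketch of the two
variants as I now understand them (the resource name 'r0' is a
placeholder and the script path may differ by distribution; on DRBD 8.3
the handler keyword is 'fence-peer', 'outdate-peer' being the older
name):

===
resource r0 {
        disk {
                # resource-only: run the fence-peer handler, but do
                # not block IO while it runs.
                #fencing        resource-only;

                # resource-and-stonith: block IO until the handler
                # reports success, or an admin runs 'drbdadm resume-io r0'.
                fencing         resource-and-stonith;
        }

        handlers {
                fence-peer              "/usr/lib/drbd/crm-fence-peer.sh";
                # usually paired with the unfence script, which removes
                # the constraint again once resync has finished:
                after-resync-target     "/usr/lib/drbd/crm-unfence-peer.sh";
        }
}
===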

>> not resume IO until either the fence script exits successfully, or
>> until an admin types 'drbdadm resume-io <res>'.
> 
> 
>> The CRM script simply calls pacemaker and asks it to fence the other
>> node.
> 
> No.  It tries to place a constraint forcing the Master role off of any
> node but the one with the good data.

Ok, I thought it was akin to the 'obliterate-peer.sh' script, which
calls 'fence_node'... I made an assumption, which was not correct.
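
For the archives, then: as I understand it now, the constraint that
crm-fence-peer.sh places looks roughly like this in crm shell syntax
(the constraint id, the master resource name 'ms_drbd_r0' and the node
name 'node1' are only illustrative):

===
location drbd-fence-by-handler-ms_drbd_r0 ms_drbd_r0 \
        rule $role="Master" -inf: #uname ne node1
===

That keeps the Master role off every node except the one with the good
data until the constraint is removed again (e.g. by crm-unfence-peer.sh
after resync), rather than shooting the peer outright.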

>> When a node has actually failed, then the lost node is fenced. If
>> both nodes are up but disconnected, as you had, then only the faster
>> node will succeed in calling the fence, and the slower node will be
>> fenced before it can call a fence of its own.
> 
> "fenced" may be "restricted from being/becoming Master" by that fencing
> constraint. Or, if pacemaker decided to do so, actually "shot" by some
> node level fencing agent (stonith).
> 
> All that resource-level fencing by placing constraints
> obviously only works as long as the cluster communication is still up.
> If not only the drbd replication link had issues, but the cluster
> communication was down as well, it becomes a bit more complex.

Thanks for the clarity. Today I learned. :)
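
And for completeness: once a split brain like the one in Hugo's log has
already happened, recovery is a manual job. Roughly, in DRBD 8.3 syntax
('r0' is a placeholder, and you have to decide for yourself which
node's changes get thrown away):

===
# On the node whose changes will be discarded (the split brain "victim"):
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# On the node whose data you keep, if it is also StandAlone:
drbdadm connect r0
===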

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"At what point did we forget that the Space Shuttle was, essentially,
a program that strapped human beings to an explosion and tried to stab
through the sky with fire and math?"



