[Pacemaker] DRBD and fencing

Matthew Palmer mpalmer at hezmatt.org
Thu Mar 11 08:35:26 UTC 2010


On Thu, Mar 11, 2010 at 03:34:50PM +0800, Martin Aspeli wrote:
> I was wondering, though, if fencing at the DRBD level would get around  
> the possible problem with a full power outage taking the fencing device  
> down.
>
> In my poor understanding of things, it'd work like this:
>
>  - Pacemaker runs on master and slave
>  - Master loses all power
>  - Pacemaker on slave notices something is wrong, and prepares to start  
> up postgres on slave, which will now also be the one writing to the DRBD  
> disk
>  - Before it can do that, it wants to fence off DRBD
>  - It does that by saying to the local DRBD, "even if the other node  
> tries to send you stuff, ignore it". This would avoid the risk of data  
> corruption on slave. Before master could come back up, it'd need to wipe  
> its local partition and re-sync from slave (which is now the new 
> primary).

The old master shouldn't need to "wipe" anything, as it should have no data
that the new master didn't have at the time of the power failure.

The piece of the puzzle I think you're missing is that DRBD will never be
ready for service on a node unless one of the following conditions is true:

* Both nodes have talked to each other and agreed that they're ready to
  exchange data (either because of a clean start on both sides, because
  you've manually prodded a rebooted node into operation again, or because a
  split-brain handler dealt with any issues); or

* A failed node has been successfully fenced and the cluster manager has
  notified DRBD of this fact.
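
To put that rule in concrete terms, here's a rough Python sketch.  It's only
a model of the two conditions above, not DRBD's actual state machine; the
class, field, and function names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    # Hypothetical flags, invented for this sketch; DRBD does not expose
    # its state under these names.
    handshake_ok: bool = False  # both nodes talked and agreed to exchange data
    peer_fenced: bool = False   # cluster manager told us the peer was fenced


def ready_for_service(state: PeerState) -> bool:
    """True if at least one of the two conditions listed above holds."""
    return state.handshake_ok or state.peer_fenced
```

So a node whose peer has vanished, and which has had no fencing confirmation
(both flags False), stays out of service.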

In the case you suggest, where the whole of node "A" disappears, you may
well have a fencing problem: because node "B" can't positively confirm that
"A" is, in fact, dead (because the DRAC went away too), it may refuse to
confirm the fencing operation.  This is why using DRAC/IPMI as a STONITH
device isn't such a win: it loses power along with the machine it's meant
to fence.  On the other hand, the DRAC STONITH handler may assume that the
machine is fenced if it can't talk to the DRAC unit (I don't know which way
it goes; I haven't looked).
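
The two possible behaviours can be sketched like this in Python.  This is
purely illustrative: I'm not claiming either branch is what the real DRAC
STONITH plugin does, and the Probe values and the
assume_dead_if_unreachable switch are names I've invented for the example.

```python
from enum import Enum


class Probe(Enum):
    # Hypothetical outcomes of asking the management controller about node "A".
    POWERED_OFF = "off"
    POWERED_ON = "on"
    UNREACHABLE = "unreachable"  # e.g. the DRAC lost power along with the host


def fence_confirmed(probe: Probe, assume_dead_if_unreachable: bool) -> bool:
    """Decide whether node "B" may treat node "A" as fenced."""
    if probe is Probe.POWERED_OFF:
        return True
    if probe is Probe.UNREACHABLE:
        # The open question: a strict handler returns False here and the
        # failover stalls; a lenient one returns True and risks split-brain
        # if "A" is in fact still running.
        return assume_dead_if_unreachable
    return False
```

With the strict policy, a full power loss on "A" leaves "B" unable to take
over until someone intervenes; with the lenient one, "B" takes over on the
strength of a guess.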

- Matt



