[Pacemaker] DRBD and fencing

Martin Aspeli optilude+lists at gmail.com
Thu Mar 11 04:26:19 EST 2010


Matthew Palmer wrote:
> On Thu, Mar 11, 2010 at 03:34:50PM +0800, Martin Aspeli wrote:
>> I was wondering, though, if fencing at the DRBD level would get around
>> the possible problem with a full power outage taking the fencing device
>> down.
>>
>> In my poor understanding of things, it'd work like this:
>>
>>   - Pacemaker runs on master and slave
>>   - Master loses all power
>>   - Pacemaker on slave notices something is wrong, and prepares to start
>> up postgres on slave, which will now also be the one writing to the DRBD
>> disk
>>   - Before it can do that, it wants to fence off DRBD
>>   - It does that by saying to the local DRBD, "even if the other node
>> tries to send you stuff, ignore it". This would avoid the risk of data
>> corruption on the slave. Before the master could come back up, it'd need
>> to wipe its local partition and re-sync from the slave (which is now the
>> new primary).
>
> The old master shouldn't need to "wipe" anything, as it should have no data
> that the new master didn't have at the time of the power failure.

I was just thinking that if the failure was, e.g., the connection between 
the master and the rest of the cluster, postgres on the old master could 
stay up and merrily keep writing to the filesystem on the DRBD device.

In the case of a power failure, that wouldn't happen, of course. But in a 
total power failure, the fencing device (a Dell DRAC, reached via IPMI) 
would be inaccessible too, so the cluster would never fail postgres over.
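
As far as I can tell, what I'm describing is what DRBD calls 
resource-level fencing. A rough sketch of the drbd.conf side as I 
understand it -- handler paths and exact syntax will vary with the DRBD 
version, and I haven't tested this here:

    resource r0 {
      disk {
        # if the peer vanishes while this side is (or is about to
        # become) Primary, call the fence-peer handler rather than
        # just carrying on
        fencing resource-only;
      }
      handlers {
        # handlers shipped with recent DRBD releases; they talk to
        # Pacemaker by manipulating constraints in the CIB
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
    }

The resource name "r0" is just a placeholder, of course.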

> The piece of the puzzle I think you're missing is that DRBD will never be
> ready for service on a node unless one of the following conditions is true:
>
> * Both nodes have talked to each other and agreed that they're ready to
>    exchange data (either because of a clean start on both sides, because
>    you've manually prodded a rebooted node into operation again, or because a
>    split-brain handler dealt with any issues); or
>
> * A failed node has been successfully fenced and the cluster manager has
>    notified DRBD of this fact.

Right.
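
If I understand the crm-fence-peer.sh mechanism correctly, that 
"notification" ends up as a temporary location constraint in the CIB 
which forbids the Master role anywhere other than the surviving node 
until the peer has resynced -- roughly the following, with made-up 
resource and node names:

    location drbd-fence-by-handler-ms_drbd_pg ms_drbd_pg \
        rule $role="Master" -inf: #uname ne node-b

with crm-unfence-peer.sh removing the constraint again after the resync. 
I'm going from the docs and list archives here, not from having watched 
it happen.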

> In the case you suggest, where the whole of node "A" disappears, you may
> well have a fencing problem: because node "B" can't positively confirm that
> "A" is, in fact, dead (because the DRAC went away too), it may refuse to
> confirm the fencing operation (this is why using DRAC/IPMI as a STONITH
> device isn't such a win).

From what I'm reading, the only fencing device that's truly reliable is a 
UPS or power switch that can cut power to an individual node. 
Unfortunately, we don't have such a device and can't get one. We do have a 
UPS with a backup generator, and dual PSUs, so a total power outage is 
unlikely. But someone could still just pull the (two) power cables out of 
the UPS, and Pacemaker would be none the wiser.
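
For concreteness, what we have at the moment is roughly the following, 
with placeholder addresses and credentials -- the parameter names differ 
between the DRAC-specific and generic IPMI STONITH plugins, so don't take 
them literally:

    primitive st-node-a stonith:external/ipmi \
        params hostname="node-a" ipaddr="192.0.2.10" \
               userid="root" passwd="secret" interface="lan"
    location l-st-node-a st-node-a -inf: node-a
    property stonith-enabled="true"

The location constraint is just the usual "never run a node's fencing 
device on that node itself" rule.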

> On the other hand, the DRAC STONITH handler may
> assume that if it can't talk to a DRAC unit, that the machine is fenced (I
> don't know which way it goes, I haven't looked).

The docs say it will assume the node is not fenced, and keep trying to 
fence it "forever", hence never actually failing over.

What I don't get is: if this happens, why can't the slave just say, "I'm 
going to assume the master is gone and take over postgres, and I'm not 
going to let anyone else write anything to my disk"? In my mind, this is 
similar to having a shared SAN and making the fencing operation "the 
master node is no longer allowed to mount or write to the SAN disk, even 
if it tries".

Martin

-- 
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book




