[Pacemaker] DRBD and fencing

Thu Mar 11 02:34:50 EST 2010

Serge Dubrouski wrote:
> On Wed, Mar 10, 2010 at 6:59 PM, Martin Aspeli <optilude+lists at gmail.com> wrote:
>> Serge Dubrouski wrote:
>>> On Wed, Mar 10, 2010 at 5:30 PM, Martin Aspeli <optilude+lists at gmail.com> wrote:
>>>> Martin Aspeli wrote:
>>>>> Hi folks,
>>>>>
>>>>> Let's say we have a two-node cluster with DRBD and OCFS2, with a database
>>>>> server that's supposed to be active on one node at a time, using the
>>>>> OCFS2 partition for its data store.
>>>>>
>>>>> If we detect a failure on the active node and fail the database over to
>>>>> the other node, we need to fence off the shared storage in case the
>>>>> active node is still writing to it.
>>>>>
>>>>> Can this be done in such a way that the local DRBD/OCFS2 refuses to
>>>>> accept writes from the now-presumed-dead node? I guess this would be
>>>>> similar to putting an access rule on a SAN to block off the previously
>>>>> active node from attempting to read or write any data.
>>>>>
>>>>> Is this feasible?
>>>> We went off on a side-track, I think, but I'd still like to know the
>>>> answer:
>>>> Can one "fence" at the DRBD level?
>>>>
>>>> From the thread, it sounds like we'll not use OCFS2 for the Postgres data
>>>> store, but would still use DRBD, e.g. with ext4 or whatever. The fencing
>>>> problem would then be equally, if not more, acute.
>>>>
>>>> It's basically a choice between doing something at the DRBD level, if
>>>> that's feasible, and using the DRAC IPMI device on our server to shoot it.
>>> But if you implement fencing at the Pacemaker level and include your
>>> DRBD/Filesystem resources in the Pacemaker configuration, you'll be fine.
>> Sorry, I don't quite understand what you mean.
>>
>> What would "fencing on the Pacemaker level" look like? Certainly, DRBD would
>> be managed by the cluster.
>>
>
> That means that you have to implement STONITH through DRAC or any
> other device that will provide fencing capability. In this case if
> Pacemaker detects a split-brain situation it'll kill one of the nodes.

Right, that makes sense.

I was wondering, though, if fencing at the DRBD level would get around 
the possible problem with a full power outage taking the fencing device 
down.
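To make the worry concrete, here is a small Python sketch of the decision the surviving node faces, assuming (as with STONITH generally) that it only takes over once the fencing device confirms the peer is off. All names are hypothetical and this is not Pacemaker's API or logic; `device_has_power` stands in for a DRAC that shares the master's power supply.

```python
# Toy model: why a fencing device that shares the node's power is a problem.
# Hypothetical names throughout (Fence, shoot, next_step); not Pacemaker's API.
# Assumption: the survivor only takes over after fencing reports success.
from enum import Enum


class Fence(Enum):
    CONFIRMED = "confirmed"      # device reported the peer is powered off
    UNREACHABLE = "unreachable"  # device did not answer; peer state unknown


def shoot(device_has_power: bool) -> Fence:
    """Ask the peer's fencing device (e.g. its DRAC) to power the peer off."""
    return Fence.CONFIRMED if device_has_power else Fence.UNREACHABLE


def next_step(peer_heartbeat: bool, device_has_power: bool) -> str:
    """What the surviving node does when it checks on its peer."""
    if peer_heartbeat:
        return "keep running as slave"
    if shoot(device_has_power) is Fence.CONFIRMED:
        return "promote and start postgres"
    # The peer may be dead or merely cut off; taking over now risks two writers.
    return "retry fencing, do not promote"
```

With a full power outage the last case is the one that bites: the master really is dead, but the slave cannot prove it, so it keeps waiting.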

In my poor understanding of things, it'd work like this:

  - Pacemaker runs on master and slave
  - Master loses all power
  - Pacemaker on slave notices something is wrong and prepares to start
up Postgres on slave, which will now also be the one writing to the DRBD
disk
  - Before it can do that, it wants to fence off DRBD
  - It does that by saying to the local DRBD, "even if the other node
tries to send you stuff, ignore it". This would avoid the risk of data
corruption on slave. Before master could come back up, it'd need to wipe
its local partition and re-sync from slave (which is now the new primary).
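A toy Python model of the sequence above may make the proposal concrete. Everything in it (`Node`, `write`, `fail_over`, `rejoin`) is hypothetical and only mimics the idea of a survivor that ignores a fenced peer and forces it to resync before rejoining; it is not DRBD's API or replication protocol, and it does not reproduce DRBD's own fencing/fence-peer configuration.

```python
# Toy model of DRBD-level "fencing" as described above.
# Hypothetical names throughout (Node, write, fail_over, rejoin); an
# illustration of the idea, not DRBD's API or replication protocol.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    primary: bool = False
    blocks: dict[int, str] = field(default_factory=dict)
    # Peers whose replicated writes this node refuses ("ignore it").
    fenced_peers: set[str] = field(default_factory=set)

    def receive(self, sender: Node, block: int, data: str) -> bool:
        """Apply a replicated write unless the sender has been fenced off."""
        if sender.name in self.fenced_peers:
            return False
        self.blocks[block] = data
        return True


def write(primary: Node, peer: Node | None, block: int, data: str) -> bool:
    """Write on the primary, then replicate; True only if a peer accepted it."""
    if not primary.primary:
        raise PermissionError(f"{primary.name} is not primary")
    primary.blocks[block] = data
    return peer.receive(primary, block, data) if peer is not None else False


def fail_over(survivor: Node, failed: Node) -> None:
    """Fence the failed peer first, then promote the survivor."""
    # The survivor cannot reach the dead node; it can only decide locally
    # that anything arriving from it from now on is ignored.
    survivor.fenced_peers.add(failed.name)
    survivor.primary = True


def rejoin(returning: Node, new_primary: Node) -> None:
    """The old master drops its local copy and resyncs from the new primary."""
    returning.primary = False
    returning.blocks = dict(new_primary.blocks)  # full resync, local changes lost
    new_primary.fenced_peers.discard(returning.name)


if __name__ == "__main__":
    master, slave = Node("master", primary=True), Node("slave")
    print(write(master, slave, 1, "a"))       # True: normal replication
    fail_over(slave, master)                  # master loses all power
    write(slave, None, 2, "b")                # slave is primary, no peer yet
    print(write(master, slave, 1, "stale"))   # False: old master is ignored
    rejoin(master, slave)
    print(master.blocks)                      # {1: 'a', 2: 'b'}
```

Note that the model only helps once the survivor has decided the master is gone; it says nothing about whether the master is really dead, which is the question STONITH answers.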

Martin