[Pacemaker] DRBD Recovery Policies

Darren.Mansell at opengi.co.uk Darren.Mansell at opengi.co.uk
Fri Mar 12 05:47:38 EST 2010


Sorry, as I thought, I'm being stupid :O

Thanks for the prod.

-----Original Message-----
From: Menno Luiten [mailto:mluiten at artifix.net] 
Sent: 12 March 2010 10:40
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] DRBD Recovery Policies

On 12-03-10 11:26, Darren.Mansell at opengi.co.uk wrote:
> Fairly standard, but I don't really want it to be fenced, as I want to
> keep the data that has been updated on the single remaining NodeB
> while NodeA was being repaired:

That is exactly what fencing is all about: preventing any node from 
taking over the primary/master role with outdated data. So I'm not sure 
what you mean by not wanting it to be fenced.

Anyway, resource-level fencing can be enabled by adding the following 
lines to your drbd.conf (the script paths depend on where DRBD is 
installed). Try it out and see if it fits your needs.

>
> global {
>    dialog-refresh       1;
>    minor-count  5;
> }
> common {
>    syncer { rate 10M; }
> }
> resource cluster_disk {
>    protocol  C;
>    disk {
>       on-io-error       pass_on;
>    }
>    syncer {
>    }
>    handlers {

         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";

>       split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>    }
>    net {
>       after-sb-1pri discard-secondary;
>    }
>    startup {
>       wait-after-sb;
>    }
>    on cluster1 {
>       device    /dev/drbd0;
>       address   12.0.0.1:7789;
>       meta-disk internal;
>       disk      /dev/sdb1;
>    }
>    on cluster2 {
>       device    /dev/drbd0;
>       address   12.0.0.2:7789;
>       meta-disk internal;
>       disk      /dev/sdb1;
>    }
> }
>
>
>
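A note on the config above: the fence-peer handler only comes into play 
if resource-level fencing is also switched on in the disk section (this 
is covered in the users guide link further down the thread). Assuming 
the same script paths, the combined stanza would look roughly like:

   disk {
      on-io-error       pass_on;
      fencing           resource-only;
   }
   handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      split-brain "/usr/lib/drbd/notify-split-brain.sh root";
   }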
> -----Original Message-----
> From: Menno Luiten [mailto:mluiten at artifix.net]
> Sent: 12 March 2010 10:05
> To: pacemaker at oss.clusterlabs.org
> Subject: Re: [Pacemaker] DRBD Recovery Policies
>
> Are you absolutely sure you set the resource-fencing parameters
> correctly in your drbd.conf (you can post your drbd.conf if unsure)
> and reloaded the configuration?
>
> On 12-03-10 10:48, Darren.Mansell at opengi.co.uk wrote:
>> The odd thing is - it didn't. From my test, it failed back,
>> re-promoted NodeA to be the DRBD master and failed all grouped
>> resources back too.
>>
>> Everything was working with the ~7GB of data I had put onto NodeB
>> while NodeA was down, now available on NodeA...
>>
>> /proc/drbd on the slave said Secondary/Primary UpToDate/Inconsistent
>> while it was syncing data back - so it was able to mount the
>> inconsistent data on the primary node and access the files that
>> hadn't yet sync'd over?! I mounted a 4GB ISO that shouldn't have been
>> able to be there yet and was able to access data inside it..
>>
>> Is my understanding of DRBD limited and it's actually able to provide
>> access to not fully sync'd files over the network link or something?
>>
>> If so - wow.
>>
>> I'm confused ;)
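If I understand DRBD's resync correctly, this is expected: while the 
background resync is running, reads of blocks that have not yet arrived 
on the SyncTarget are fetched transparently from the peer over the 
replication link, so an Inconsistent device can be promoted and mounted 
as long as it is connected to an UpToDate peer. A rough way to watch 
this while it happens (resource name taken from the drbd.conf quoted 
above):

   cat /proc/drbd                # overall state and sync progress
   drbdadm role cluster_disk     # e.g. Primary/Secondary
   drbdadm cstate cluster_disk   # e.g. SyncTarget while pulling data back
   drbdadm dstate cluster_disk   # e.g. Inconsistent/UpToDate until the resync finishes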
>>
>>
>> -----Original Message-----
>> From: Menno Luiten [mailto:mluiten at artifix.net]
>> Sent: 11 March 2010 19:35
>> To: pacemaker at oss.clusterlabs.org
>> Subject: Re: [Pacemaker] DRBD Recovery Policies
>>
>> Hi Darren,
>>
>> I believe that this is handled by DRBD by fencing the Master/Slave
>> resource during resync using Pacemaker. See
>> http://www.drbd.org/users-guide/s-pacemaker-fencing.html. This would
>> prevent Node A from promoting or starting services with outdated data
>> (fence-peer), and force it to wait for the resync to complete before
>> taking over (after-resync-target).
>>
>> Regards,
>> Menno
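For reference, crm-fence-peer.sh works by writing a temporary location 
constraint into the CIB that forbids the Master role everywhere except 
on the node that still holds up-to-date data; crm-unfence-peer.sh 
removes it again once the resync has finished. In crm shell terms the 
constraint is roughly of this shape (the master/slave resource id and 
node name are only illustrative):

   location drbd-fence-by-handler-ms_drbd0 ms_drbd0 \
       rule $role=Master -inf: #uname ne cluster2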
>>
>> On 11-3-2010 15:52, Darren.Mansell at opengi.co.uk wrote:
>>> I've been reading the DRBD Pacemaker guide on the DRBD.org site and
>>> I'm not sure I can find the answer to my question.
>>>
>>> Imagine a scenario:
>>>
>>> (NodeA, NodeB
>>>
>>> Order and group:
>>>    M/S DRBD Promote/Demote
>>>    FS Mount
>>>    Other resource that depends on the F/S mount
>>>
>>> DRBD master location score of 100 on NodeA)
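For illustration, a layout like the one sketched above might look 
roughly as follows in crm shell syntax (the resource ids, mount point 
and filesystem type are assumptions; the node and DRBD resource names 
are borrowed from the drbd.conf earlier in the thread):

   primitive drbd0 ocf:linbit:drbd \
       params drbd_resource=cluster_disk \
       op monitor interval=29s role=Master \
       op monitor interval=31s role=Slave
   ms ms_drbd0 drbd0 \
       meta master-max=1 master-node-max=1 clone-max=2 notify=true
   primitive fs0 ocf:heartbeat:Filesystem \
       params device=/dev/drbd0 directory=/mnt/data fstype=ext3
   group services fs0                 # plus whatever depends on the mount
   colocation services_on_master inf: services ms_drbd0:Master
   order services_after_promote inf: ms_drbd0:promote services:start
   location prefer_nodea ms_drbd0 rule $role=Master 100: #uname eq cluster1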
>>>
>>> NodeA is down, resources fail over to NodeB and everything happily
>>> runs for days. When NodeA is brought back online it isn't treated as
>>> split-brain, as a normal demote/promote would happen. But the data on
>>> NodeA would be very old and possibly take a long time to sync from
>>> NodeB.
>>>
>>> What would happen in this scenario? Would the RA defer the promote
>>> until the sync is completed? Would the inability to promote cause the
>>> failback to not happen, requiring a resource cleanup once the sync
>>> has completed?
>>>
>>> I guess this is really down to how advanced the Linbit DRBD RA is?
>>>
>>> Thanks
>>>
>>> Darren
>>>
>>>
>>>

_______________________________________________
Pacemaker mailing list
Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker



