[Pacemaker] drbd on heartbeat links
Pavlos Parissis
pavlos.parissis at gmail.com
Tue Nov 2 17:07:17 EDT 2010
On 2 November 2010 16:15, Dan Frincu <dfrincu at streamwide.ro> wrote:
> Hi,
>
> Pavlos Parissis wrote:
>>
>> Hi,
>>
>> I am trying to figure out how I can resolve the following scenario
>>
>> Facts
>> 3 nodes
>> 2 DRBD ms resources
>> 2 group resources
>> by default drbd1/group1 runs on node-01 and drbd2/group2 runs on node-02
>> drbd1/group1 can only run on node-01 and node-03
>> drbd2/group2 can only run on node-02 and node-03
>> DRBD fencing_policy is resource-only [1]
>> 2 heartbeat links and one of them used by DRBD communication
>>
>> Scenario
>> 1) node-01 loses both heartbeat links
>> 2) The DRBD monitor first detects the absence of DRBD communication
>> and does resource fencing by adding a location constraint which prevents
>> drbd1 from running on node-03
>> 3) pacemaker fencing kicks in and kills node-01
>>
>> due to the location constraint created at step 2, drbd1/group1 can't run in
>> the cluster
>>
>>
>
> I don't understand exactly what you mean by this. Resource-only fencing
> would create a -inf score on node1 when the node loses the drbd
> communication channel (the only one drbd uses),
Because node-01 is the primary at the moment of the failure,
resource fencing will create a -inf score for node-03.
> however you could still have
> heartbeat communication available via the secondary link, then you shouldn't
As I wrote, none of the heartbeat links is available.
After I sent the mail, I realized that node-03 will not see the
location constraint created by node-01, because there is no heartbeat
communication!
Thus I think my scenario has a flaw, since none of the heartbeat links
is available on node-01.
Resource fencing from DRBD will be triggered but without any effect,
and node-03 or node-02 will fence node-01, and node-03 will become
the primary for drbd1.
> fence the entire node, the resource-only fencing does that for you, the only
> thing you need to do is to add the drbd fence handlers in /etc/drbd.conf.
> handlers {
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> }
>
> Is this what you meant?
No.
Dan, thanks for your mail.
Since there is a flaw in the scenario, let's define a similar one.
Status:
node-01 is primary for drbd1 and group1 runs on it
node-02 is primary for drbd2 and group2 runs on it
node-03 is secondary for drbd1 and drbd2
2 heartbeat links, one of them also being used for DRBD communication
Here is the scenario:
1) on node-01, the heartbeat link which also carries DRBD communication is lost
2) node-01 does resource fencing and places a -inf score for drbd1 on node-03
3) on node-01, the second heartbeat link is lost
4) node-01 will be fenced by one of the other cluster members
5) drbd1 can't run on node-03 due to the location constraint created at step 2
The problem here is that the location constraint will remain active even
after node-01 is fenced.
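To make the five steps concrete, here is a minimal toy model in Python. It is
only a sketch of the scenario as I describe it, not Pacemaker or DRBD code:
the class ToyCluster and its methods fence_peer, unfence_peer, fence_node and
candidates are hypothetical names, and the constraint is modelled simply as a
-inf score for drbd1 on node-03, which is my assumption about what the fence
handler does.

```python
from dataclasses import dataclass, field

NEG_INF = float("-inf")


@dataclass
class ToyCluster:
    """Hypothetical toy model of the scenario; not Pacemaker or DRBD code."""
    online: set      # nodes currently members of the cluster
    allowed: dict    # resource -> set of nodes it may run on
    scores: dict = field(default_factory=dict)  # (resource, node) -> score

    def fence_peer(self, resource, peer):
        # Assumed effect of the fence-peer handler: a -inf location
        # score for the resource on the peer node.
        self.scores[(resource, peer)] = NEG_INF

    def unfence_peer(self, resource, peer):
        # Assumed effect of the unfence handler: the constraint is removed.
        self.scores.pop((resource, peer), None)

    def fence_node(self, node):
        # Node-level fencing: the node leaves the cluster.
        self.online.discard(node)

    def candidates(self, resource):
        # Nodes that are allowed, online, and not banned by a -inf score.
        return sorted(
            n for n in self.allowed[resource]
            if n in self.online
            and self.scores.get((resource, n), 0) != NEG_INF
        )


cluster = ToyCluster(
    online={"node-01", "node-02", "node-03"},
    allowed={"drbd1": {"node-01", "node-03"},
             "drbd2": {"node-02", "node-03"}},
)

before = cluster.candidates("drbd1")          # both nodes are eligible
cluster.fence_peer("drbd1", "node-03")        # steps 1-2
cluster.fence_node("node-01")                 # steps 3-4
after_fencing = cluster.candidates("drbd1")   # step 5: nothing is left
cluster.unfence_peer("drbd1", "node-03")      # constraint removed
after_unfence = cluster.candidates("drbd1")

print(before, after_fencing, after_unfence)
```

In this model drbd1 has no eligible node after step 4, and node-03 becomes
eligible again only once the constraint is removed, which is exactly the
situation I am asking about.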
Any ideas?
Pavlos
drbd.conf
global {
    usage-count yes;
}
common {
    protocol C;
    syncer {
        csums-alg sha1;
        verify-alg sha1;
        rate 10M;
    }
    net {
        data-integrity-alg sha1;
        max-buffers 20480;
        max-epoch-size 16384;
    }
    disk {
        on-io-error detach;
        ### Only when DRBD is under cluster ###
        fencing resource-only;
        ### --- ###
    }
    startup {
        wfc-timeout 60;
        degr-wfc-timeout 30;
        outdated-wfc-timeout 15;
    }
    ### Only when DRBD is under cluster ###
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    ### --- ###
}
resource drbd_resource_01 {
    on node-01 {
        device /dev/drbd1;
        disk /dev/sdb1;
        address 10.10.10.129:7789;
        meta-disk internal;
    }
    on node-03 {
        device /dev/drbd1;
        disk /dev/sdb1;
        address 10.10.10.131:7789;
        meta-disk internal;
    }
    syncer {
        cpu-mask 2;
    }
}
resource drbd_resource_02 {
    on node-02 {
        device /dev/drbd2;
        disk /dev/sdb1;
        address 10.10.10.130:7790;
        meta-disk internal;
    }
    on node-03 {
        device /dev/drbd2;
        disk /dev/sdc1;
        address 10.10.10.131:7790;
        meta-disk internal;
    }
    syncer {
        cpu-mask 1;
    }
}