[Pacemaker] Is IPMI reliable to avoid DRBD SplitBrain?

Xiaomin Zhang zhangxiaomin at gmail.com
Mon Sep 2 12:20:50 EDT 2013


Hi, Digimer:
Below is the output of drbdadm dump:
# /etc/drbd.conf
common {
    protocol               C;
    net {
        after-sb-0pri    discard-zero-changes;
        after-sb-1pri    consensus;
        after-sb-2pri    disconnect;
        cram-hmac-alg    sha512;
        shared-secret    acde;
    }
    disk {
        on-io-error      detach;
        fencing          resource-and-stonith;
    }
    syncer {
        rate             33M;
    }
    startup {
        wfc-timeout      120;
    }
    handlers {
        fence-peer       /usr/lib/drbd/crm-fence-peer.sh;
        after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
    }
}

# resource r0 on suse4: not ignored, not stacked
resource r0 {
    on suse2 {
        device           /dev/drbd0 minor 0;
        disk             /dev/sdc1;
        address          ipv4 XXX:7789;
        meta-disk        internal;
    }
    on suse4 {
        device           /dev/drbd0 minor 0;
        disk             /dev/sdc1;
        address          ipv4 YYY:7789;
        meta-disk        internal;
    }
}
And here is the output of 'crm configure show':
primitive drbd1 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="15s"
primitive fs1 ocf:heartbeat:Filesystem \
        op monitor interval="15s" \
        params device="/dev/drbd0" directory="/opt/drbd" fstype="ext3" \
        meta target-role="Started"
primitive suse2-stonith stonith:external/ipmi \
        params hostname="suse2" ipaddr="XXX" userid="admin" passwd="xxx" interface="lan"
primitive suse4-stonith stonith:external/ipmi \
        params hostname="suse4" ipaddr="YYY" userid="admin" passwd="yyy" interface="lan"
ms ms_drbd1 drbd1 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
location drbd-fence-by-handler-ms_drbd1 ms_drbd1 \
        rule $id="drbd-fence-by-handler-rule-ms_drbd1" $role="Master" -inf: #uname ne suse4
location st-suse2 suse2-stonith -inf: suse2
location st-suse4 suse4-stonith -inf: suse4
colocation fs_on_drbd inf: fs1 ms_drbd1:Master
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="3" \
        stonith-enabled="true" \
        last-lrm-refresh="1378051434"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
I think the drbd-fence-by-handler-ms_drbd1 constraint (with its rule
drbd-fence-by-handler-rule-ms_drbd1) was generated by crm-fence-peer.sh.
It is still there because crm-unfence-peer.sh has not been called since
the last failover.
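As a rough way to confirm that the fence constraint is still present
after a failover, the configuration text can be scanned for location
constraints whose id starts with drbd-fence-by-handler-. The following is
a minimal sketch in Python, not part of DRBD or Pacemaker: the function
name leftover_fence_constraints and the sample string are hypothetical,
and the prefix is assumed from the constraint id shown in the
configuration above.

```python
import re

# Assumption: ids of constraints written by crm-fence-peer.sh start with
# this prefix, as seen in the configuration above.
FENCE_PREFIX = "drbd-fence-by-handler-"


def leftover_fence_constraints(crm_config_text):
    """Return ids of location constraints that look like DRBD fence constraints."""
    ids = []
    for line in crm_config_text.splitlines():
        # A location constraint line starts with: location <id> <resource>
        match = re.match(r"\s*location\s+(\S+)\s+(\S+)", line)
        if match and match.group(1).startswith(FENCE_PREFIX):
            ids.append(match.group(1))
    return ids


# Hypothetical sample, copied from the constraints in this message.
sample = """\
location drbd-fence-by-handler-ms_drbd1 ms_drbd1 \\
        rule $id="drbd-fence-by-handler-rule-ms_drbd1" $role="Master" -inf: #uname ne suse4
location st-suse2 suse2-stonith -inf: suse2
location st-suse4 suse4-stonith -inf: suse4
"""

print(leftover_fence_constraints(sample))
```

In practice the text would come from 'crm configure show'. An empty list
would suggest that the constraint has been removed again.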
What's wrong with my configuration?
Thanks.


On Mon, Sep 2, 2013 at 9:42 PM, Digimer <lists at alteeve.ca> wrote:

> On 02/09/13 08:55, Xiaomin Zhang wrote:
>
>> Hi, guys:
>> I followed the standard way to enable IPMI-based STONITH for a
>> service which relies on DRBD primary-secondary replication.
>> Below is the relevant Pacemaker configuration (of course, STONITH is
>> enabled in Pacemaker):
>>
>> primitive suse2-stonith stonith:external/ipmi \
>>          params hostname="suse2" ipaddr="XXX" userid="admin" passwd="xxx" interface="lan"
>> primitive suse4-stonith stonith:external/ipmi \
>>          params hostname="suse4" ipaddr="YYY" userid="admin" passwd="yyy" interface="lan"
>> location st-suse2 suse2-stonith -inf: suse2
>> location st-suse4 suse4-stonith -inf: suse4
>>
>> I also set fencing to 'resource-and-stonith' in the DRBD common
>> section. This configuration has worked many times with the following
>> failure tests:
>> 1.  iptables -A INPUT -j DROP
>> 2.  echo c > /proc/sysrq-trigger
>> 3.  /etc/init.d/network stop
>> 4.  reboot
>> The failed node is power cycled by its counterpart via an IPMI command.
>> However, I still get DRBD split-brain from time to time. Does that mean
>> IPMI is still not reliable enough for data integrity?
>>
>> I am also confused that, many times, crm-unfence-peer.sh is not
>> called after crm-fence-peer.sh. Does this imply that I have
>> something misconfigured?
>> Your advice is really appreciated.
>> Thanks in advance.
>>
>
> I don't think that using the firewall to block traffic is a good way to
> test. That said, if the failure triggers a reboot, then it's working.
>
> Did you set up the fence-handler in DRBD to use 'crm-fence-peer.sh'?
>
> Please share your 'crm configure show' and 'drbdadm dump'.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
>