[Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

Dejan Muhamedagic dejanmm at fastmail.fm
Mon Oct 3 13:08:57 EDT 2011


Hi,

On Thu, Sep 29, 2011 at 10:47:33AM -0400, Nick Khamis wrote:
> Hello Dejan,
> 
> Sorry to hijack; I am also working on the same type of setup as a prototype.
> What is the best way to get stonith included for VM setups? Maybe an
> SSH stonith?

external/libvirt, though somebody said that it won't do for
VMware. For VMware there is external/vcenter, or external/vmware,
though people keep complaining that the latter doesn't work. I
haven't used it myself.
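
If it's KVM/libvirt, a minimal sketch could look something like the
following (untested here; the resource id, host names, and hypervisor
URI are placeholders you would have to adapt, and it assumes the
libvirt client tools are installed on the cluster nodes):

primitive stonith-libvirt stonith:external/libvirt \
        params hostlist="vm-node-1,vm-node-2" \
               hypervisor_uri="qemu+ssh://your-vm-host/system" \
        op monitor interval="60s"
property stonith-enabled="true"

Whatever device you end up with, stonith-enabled has to be set to
true. Otherwise the cluster never actually fences a failed node, and
the OCFS2/DLM layer blocks waiting for that to happen, which is
pretty much the hang described below.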

Thanks,

Dejan

> Again, this is just for the prototype.
> 
> Cheers,
> 
> Nick.
> 
> On Thu, Sep 29, 2011 at 9:28 AM, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> > Hi Darren,
> >
> > On Thu, Sep 29, 2011 at 02:15:34PM +0100, Darren.Mansell at opengi.co.uk wrote:
> >> (Originally sent to DRBD-user, reposted here as it may be more relevant)
> >>
> >>
> >>
> >>
> >> Hello all.
> >>
> >>
> >>
> >> I'm implementing a 2-node cluster using Corosync/Pacemaker/DRBD/OCFS2
> >> for dual-primary shared FS.
> >>
> >>
> >>
> >> I've followed the instructions on the DRBD applications site and it
> >> works really well.
> >>
> >>
> >>
> >> However, if I 'pull the plug' on a node, the other node continues to
> >> operate the clones, but the filesystem is locked and inaccessible (the
> >> monitor op works for the filesystem, but fails for the OCFS2 resource).
> >>
> >>
> >>
> >> If I cleanly reboot one node instead, there are no problems and I can
> >> continue to access the OCFS2 FS.
> >>
> >>
> >>
> >> After I pull the plug:
> >>
> >>
> >>
> >> Online: [ test-odp-02 ]
> >>
> >> OFFLINE: [ test-odp-01 ]
> >>
> >>
> >>
> >> Resource Group: Load-Balancing
> >>
> >>      Virtual-IP-ODP     (ocf::heartbeat:IPaddr2):       Started test-odp-02
> >>
> >>      Virtual-IP-ODPWS   (ocf::heartbeat:IPaddr2):       Started test-odp-02
> >>
> >>      ldirectord (ocf::heartbeat:ldirectord):    Started test-odp-02
> >>
> >> Master/Slave Set: ms_drbd_ocfs2 [p_drbd_ocfs2]
> >>
> >>      Masters: [ test-odp-02 ]
> >>
> >>      Stopped: [ p_drbd_ocfs2:1 ]
> >>
> >> Clone Set: cl-odp [odp]
> >>
> >>      Started: [ test-odp-02 ]
> >>
> >>      Stopped: [ odp:1 ]
> >>
> >> Clone Set: cl-odpws [odpws]
> >>
> >>      Started: [ test-odp-02 ]
> >>
> >>      Stopped: [ odpws:1 ]
> >>
> >> Clone Set: cl_fs_ocfs2 [p_fs_ocfs2]
> >>
> >>      Started: [ test-odp-02 ]
> >>
> >>      Stopped: [ p_fs_ocfs2:1 ]
> >>
> >> Clone Set: cl_ocfs2mgmt [g_ocfs2mgmt]
> >>
> >>      Started: [ test-odp-02 ]
> >>
> >>      Stopped: [ g_ocfs2mgmt:1 ]
> >>
> >>
> >>
> >> Failed actions:
> >>
> >>     p_o2cb:0_monitor_10000 (node=test-odp-02, call=19, rc=-2, status=Timed Out): unknown exec error
> >>
> >>
> >>
> >>
> >>
> >> test-odp-02:~ # mount
> >>
> >> /dev/drbd0 on /opt/odp type ocfs2 (rw,_netdev,noatime,cluster_stack=pcmk)
> >>
> >>
> >>
> >> test-odp-02:~ # ls /opt/odp
> >>
> >> ...just hangs forever...
> >>
> >>
> >>
> >> If I then power test-odp-01 back on, everything fails back fine and the
> >> ls command suddenly completes.
> >>
> >>
> >>
> >> It seems to me that OCFS2 is trying to talk to the node that has
> >> disappeared and doesn't time out. Does anyone have any ideas? (attached
> >> CRM and DRBD configs)
> >
> > With stonith disabled, I doubt that your cluster can behave as
> > it should.
> >
> > Thanks,
> >
> > Dejan
> >
> >>
> >>
> >> Many thanks.
> >>
> >>
> >>
> >> Darren Mansell
> >>
> >>
> >>
> >
> >
> > Content-Description: crm.txt
> >> node test-odp-01
> >> node test-odp-02 \
> >>         attributes standby="off"
> >> primitive Virtual-IP-ODP ocf:heartbeat:IPaddr2 \
> >>         params lvs_support="true" ip="2.21.15.100" cidr_netmask="8" broadcast="2.255.255.255" \
> >>         op monitor interval="1m" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive Virtual-IP-ODPWS ocf:heartbeat:IPaddr2 \
> >>         params lvs_support="true" ip="2.21.15.103" cidr_netmask="8" broadcast="2.255.255.255" \
> >>         op monitor interval="1m" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive ldirectord ocf:heartbeat:ldirectord \
> >>         params configfile="/etc/ha.d/ldirectord.cf" \
> >>         op monitor interval="2m" timeout="20s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive odp lsb:odp \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive odpws lsb:odpws \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive p_controld ocf:pacemaker:controld \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive p_drbd_ocfs2 ocf:linbit:drbd \
> >>         params drbd_resource="r0" \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive p_fs_ocfs2 ocf:heartbeat:Filesystem \
> >>         params device="/dev/drbd/by-res/r0" directory="/opt/odp" fstype="ocfs2" options="rw,noatime" \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> primitive p_o2cb ocf:ocfs2:o2cb \
> >>         op monitor interval="10s" enabled="true" timeout="10s" \
> >>         meta migration-threshold="10" failure-timeout="600"
> >> group Load-Balancing Virtual-IP-ODP Virtual-IP-ODPWS ldirectord
> >> group g_ocfs2mgmt p_controld p_o2cb
> >> ms ms_drbd_ocfs2 p_drbd_ocfs2 \
> >>         meta master-max="2" clone-max="2" notify="true"
> >> clone cl-odp odp
> >> clone cl-odpws odpws
> >> clone cl_fs_ocfs2 p_fs_ocfs2 \
> >>         meta target-role="Started"
> >> clone cl_ocfs2mgmt g_ocfs2mgmt \
> >>         meta interleave="true"
> >> location Prefer-Node1 ldirectord \
> >>         rule $id="prefer-node1-rule" 100: #uname eq test-odp-01
> >> order o_ocfs2 inf: ms_drbd_ocfs2:promote cl_ocfs2mgmt:start cl_fs_ocfs2:start
> >> order tomcatlast1 inf: cl_fs_ocfs2 cl-odp
> >> order tomcatlast2 inf: cl_fs_ocfs2 cl-odpws
> >> property $id="cib-bootstrap-options" \
> >>         dc-version="1.1.5-5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60" \
> >>         cluster-infrastructure="openais" \
> >>         expected-quorum-votes="2" \
> >>         no-quorum-policy="ignore" \
> >>         start-failure-is-fatal="false" \
> >>         stonith-action="reboot" \
> >>         stonith-enabled="false" \
> >>         last-lrm-refresh="1317207361"
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



