[Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)

Lars Ellenberg lars.ellenberg at linbit.com
Mon Apr 4 16:55:00 EDT 2011


On Mon, Apr 04, 2011 at 01:34:48PM -0600, Mike Reid wrote:
> All,
> 
> I am running a two-node web cluster on OCFS2 (v1.5.0) via DRBD
> Primary/Primary (v8.3.8) and Pacemaker. Everything  seems to be working

If you want to stay with 8.3.8, make sure you are using 8.3.8.1 (note
the trailing .1), or you can run into stalled resyncs.
Or upgrade to "most recent".
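
A quick way to check which version you are actually running (just a
sketch; the exact output format varies a bit between builds):

    head -1 /proc/drbd    # e.g. "version: 8.3.8 (api:88/proto:86-94)"
    drbdadm --version     # userland / module version details, if your build has it

If that first line does not show the trailing .1, you are on plain 8.3.8.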

> great, except during testing of hard-boot scenarios.
> 
> Whenever I hard-boot one of the nodes, the other node is successfully fenced
> and marked "Outdated"
> 
> * <resource minor="0" cs="WFConnection" ro1="Primary"
> ro2="Unknown" ds1="UpToDate" ds2="Outdated" />

 Why do people keep using this pseudo-XML output?
 Where does that come from? We should un-document it.
 It is meant to be consumed by other programs (like the LINBIT DRBD-MC),
 not by humans.

Anyway: DRBD is just fine.
The _other_ node was successfully fenced, which enabled _this_ node
(which is still UpToDate, as can be seen above)
to mark the _other_ node as Outdated.
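
If you want to double-check those states by hand, the plain status
commands are easier to read than the XML dump (a sketch, using the
drbdadm subcommands as they exist in 8.3):

    cat /proc/drbd             # full status of all resources
    drbdadm cstate repdata     # connection state, e.g. WFConnection
    drbdadm dstate repdata     # disk states, e.g. UpToDate/Outdated
    drbdadm role repdata       # roles, e.g. Primary/Unknown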

> However, this locks up I/O on the still active node and prevents any
> operations within the cluster :( I have even forced DRBD into StandAlone
> mode while in this state, but that does not resolve the I/O lock
> either....does anyone know if this is possible using OCFS2 (maintaining an
> active cluster in Primary/Unknown once the other node has a failure? E.g. Be
> it forced, controlled, etc)
> 
> I have been focusing on DRBD config, but I am starting to wonder if perhaps
> it's something with my Pacemaker or OCFS2 setup that is forcing this I/O
> lock during a failure.  Any thoughts?

No, I don't think that DRBD is still blocking I/O here.
You should really look elsewhere as well.
 
> > version: 8.3.8 (api:88/proto:86-94)
> > 0:repdata  Connected  Primary/Primary  UpToDate/UpToDate  C  /data    ocfs2

Just do "cat /proc/drbd", please.

> -----------------------------
> DRBD Conf:
> > 
> > global {
> >   usage-count no;

Why not yes?

 ;-)

> > }
> > common {
> >   syncer { rate 10M; }
> > }
> > resource repdata {
> >   protocol C;
> > 
> >   meta-disk internal;
> >   device /dev/drbd0;
> >   disk /dev/sda3;
> > 
> >   handlers {

Did you just copy-paste this from somewhere, or do you actually
understand in which scenarios you want a hard poweroff here, and why?

> >     pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
> >     local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
> >     split-brain "/usr/lib/drbd/notify-split-brain.sh root";
> >     fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> >     after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> >   }
> >   startup {
> >     degr-wfc-timeout 120;       # 120 = 2 minutes.
> >     wfc-timeout 30;
> >     become-primary-on both;
> >   }
> >   disk {
> >     fencing resource-only;

No. DRBD definitely is NOT blocking any I/O.
It would only do so while the fence-peer handler runs,
and only IF you had "fencing resource-and-stonith;".
(Which is probably what you should have in this scenario. But still...)
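
For reference, the relevant pieces of a resource-and-stonith setup would
look roughly like this (a sketch only, reusing the handlers you already
configured; with this setting DRBD really does freeze I/O until the
fence-peer handler returns):

    disk {
      fencing resource-and-stonith;
    }
    handlers {
      # I/O is suspended until this handler returns;
      # crm-fence-peer.sh adds a constraint that keeps the outdated peer
      # from being promoted, crm-unfence-peer.sh removes it after resync.
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }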

> >   }
> >   syncer {
> >     rate 10M;
> >     al-extents 257;
> >   }
> >   net {
> >     cram-hmac-alg "sha1";
> >     shared-secret "XXXXXXX";
> >     allow-two-primaries;
> >     after-sb-0pri discard-zero-changes;
> >     after-sb-1pri discard-secondary;

You do realize that just because something is in Secondary
*at the moment of the next DRBD handshake*
does not mean it has bad data, or fewer changes?

You are configuring automatic data loss here.
Just make sure you realize this,
and that you actually want that to happen,
always.
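
If you would rather resolve split brain by hand than have DRBD discard
one side automatically, a more conservative policy (a sketch, not a
universal recommendation) is to only auto-resolve the trivial case and
otherwise just disconnect:

    net {
      after-sb-0pri discard-zero-changes;  # auto-resolve only if one side has no changes
      after-sb-1pri disconnect;            # never auto-discard the current Secondary
      after-sb-2pri disconnect;            # two Primaries: leave it to the admin
    }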

> >     after-sb-2pri disconnect;
> >   }
> >   on ubu10a {
> >     address 192.168.0.66:7788;
> >   }
> >   on ubu10b {
> >     address 192.168.0.67:7788;
> >   }
> > }

I did not look at the CIB below.

> -----------------------------
> CIB.xml
> > 
> > node ubu10a \
> >         attributes standby="off"
> > node ubu10b \
> >         attributes standby="off"
> > primitive resDLM ocf:pacemaker:controld \
> >         op monitor interval="120s"
> > primitive resDRBD ocf:linbit:drbd \
> >         params drbd_resource="repdata" \
> >         operations $id="resDRBD-operations" \
> >         op monitor interval="20s" role="Master" timeout="120s" \
> >         op monitor interval="30s" role="Slave" timeout="120s"
> > primitive resFS ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd/by-res/repdata" directory="/data"
> > fstype="ocfs2" \
> >         op monitor interval="120s"
> > primitive resO2CB ocf:pacemaker:o2cb \
> >         op monitor interval="120s"
> > ms msDRBD resDRBD \
> >         meta resource-stickines="100" notify="true" master-max="2"
> > interleave="true"
> > clone cloneDLM resDLM \
> >         meta globally-unique="false" interleave="true"
> > clone cloneFS resFS \
> >         meta interleave="true" ordered="true"
> > clone cloneO2CB resO2CB \
> >         meta globally-unique="false" interleave="true"
> > colocation colDLMDRBD inf: cloneDLM msDRBD:Master
> > colocation colFSO2CB inf: cloneFS cloneO2CB
> > colocation colO2CBDLM inf: cloneO2CB cloneDLM
> > order ordDLMO2CB 0: cloneDLM cloneO2CB
> > order ordDRBDDLM 0: msDRBD:promote cloneDLM
> > order ordO2CBFS 0: cloneO2CB cloneFS
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.0.9-unknown" \
> >         cluster-infrastructure="openais" \
> >         stonith-enabled="false" \
> >         no-quorum-policy="ignore" \
> >         expected-quorum-votes="2"
> > 
> > 


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



