[Pacemaker] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)

Mike Reid mbreid at thepei.com
Thu Sep 15 16:24:07 EDT 2011


Hello all,

We have a two-node cluster still in development that has been running fine
for weeks (little to no traffic). I made some updates to our CIB recently,
and everything seemed just fine.

Yesterday I attempted to untar ~1.5GB to the OCFS2/DRBD volume, and once it
was complete, one of the nodes had become completely disconnected. I haven't
been able to reconnect it since.

DRBD is working fine, everything is UpToDate and I can get both nodes in
Primary/Primary, but when it comes down to starting OCFS2 and mounting the
volume, I'm left with:

> resFS:0_start_0 (node=node1, call=21, rc=1, status=complete): unknown error

I am using "pcmk" as the cluster_stack, and letting Pacemaker control
everything...

The last time this happened, the only way I was able to resolve it was to
reformat the device (via mkfs.ocfs2 -F). I don't think I should have to do
this: the underlying blocks seem fine, and one of the nodes is running just
fine. The (currently) unmounted node is staying in sync as far as DRBD is
concerned.

Here's some detail that will hopefully help. Please let me know if there's
anything else I can provide to help determine the best way to get this node
back "online":


Ubuntu 10.10 / Kernel 2.6.35

Pacemaker 1.0.9.1
Corosync 1.2.1
Cluster Agents 1.0.3 (Heartbeat)
Cluster Glue 1.0.6
OpenAIS 1.1.2

DRBD 8.3.10
OCFS2 1.5.0

cat /sys/fs/ocfs2/cluster_stack = pcmk
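
The same check can be wrapped in a small function so it can be run on each
node in turn. This is only a sketch: check_stack is a hypothetical helper
name, and it assumes the file holds a single line with the stack name, as
shown above.

```shell
#!/bin/sh
# check_stack FILE EXPECTED (hypothetical helper)
# Succeeds if FILE contains exactly EXPECTED; returns 1 on a mismatch and
# 2 if FILE cannot be read (e.g. the OCFS2 module is not loaded).
check_stack() {
    stack_file=$1
    expected=$2
    if [ ! -r "$stack_file" ]; then
        echo "cannot read $stack_file" >&2
        return 2
    fi
    actual=$(cat "$stack_file")
    if [ "$actual" = "$expected" ]; then
        echo "cluster_stack is $actual"
    else
        echo "cluster_stack is $actual, expected $expected" >&2
        return 1
    fi
}

# Usage on a node:
#   check_stack /sys/fs/ocfs2/cluster_stack pcmk
```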

node1: mounted.ocfs2 -d

Device                FS     UUID                                  Label
/dev/sda3             ocfs2  fe4273e1-f866-4541-bbcf-66c5dfd496d6

node2: mounted.ocfs2 -d

Device                FS     UUID                                  Label
/dev/sda3             ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
/dev/drbd0            ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
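
The two listings above can also be compared mechanically. The sketch below
assumes the column layout shown (Device, FS, UUID, Label) and takes the
output pasted in as shell variables; ocfs2_uuids and uuid_for are
hypothetical helper names, not part of ocfs2-tools.

```shell
#!/bin/sh
# Print "device uuid" pairs for every ocfs2 row read on stdin.
# The header row is skipped because its second column is "FS", not "ocfs2".
ocfs2_uuids() {
    awk '$2 == "ocfs2" { print $1, $3 }'
}

# Print the UUID listed for device $1, or nothing if the device is absent.
uuid_for() {
    ocfs2_uuids | awk -v dev="$1" '$1 == dev { print $2 }'
}

# Output of `mounted.ocfs2 -d` as posted above.
node1_out='Device                FS     UUID                                  Label
/dev/sda3             ocfs2  fe4273e1-f866-4541-bbcf-66c5dfd496d6'

node2_out='Device                FS     UUID                                  Label
/dev/sda3             ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
/dev/drbd0            ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef'

for dev in /dev/sda3 /dev/drbd0; do
    u1=$(printf '%s\n' "$node1_out" | uuid_for "$dev")
    u2=$(printf '%s\n' "$node2_out" | uuid_for "$dev")
    if [ -z "$u1" ] || [ -z "$u2" ]; then
        echo "$dev: not listed on both nodes"
    elif [ "$u1" = "$u2" ]; then
        echo "$dev: UUIDs match"
    else
        echo "$dev: UUIDs differ"
    fi
done
```

With the listings above this reports that the /dev/sda3 UUIDs differ and
that /dev/drbd0 is not listed on both nodes. It only surfaces the
difference; it does not say why the mount fails.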

* NOTES:
- Both nodes are identical; in fact, one node is a direct mirror (HDD clone)
  of the other.
- I have attached the CIB (crm configure edit contents) and a mount trace.



-------------- next part --------------
A non-text attachment was scrubbed...
Name: crm_configure.txt
Type: application/octet-stream
Size: 2852 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20110915/58350be9/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mount_trace.txt
Type: application/octet-stream
Size: 4747 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20110915/58350be9/attachment-0005.obj>

