[Pacemaker] kernel BUG at fs/dlm/lowcomms.c:861! on Fedora 12

Daniel Qian daniel at bestningning.com
Sun Jan 3 14:30:20 EST 2010


I have come a long way setting up this two-node cluster of pacemaker +
openais/corosync + ocfs2 + DLM + drbd on Fedora 12. I resolved one issue
after another until I hit this last hurdle, which I cannot get past on my
own. All other components are working fine.

[root@ilo150 ~]# crm_mon -1


============
Last updated: Sun Jan  3 12:17:17 2010
Stack: openais
Current DC: ilo143 - partition with quorum
Version: 1.0.5-ee19d8e83c2a5d45988f1cee36d334a631d84fc7
2 Nodes configured, 2 expected votes
5 Resources configured.
============

Online: [ ilo143 ilo150 ]

 Master/Slave Set: drbd_clone0
     Masters: [ ilo143 ilo150 ]
 Clone Set: dlm-clone
     Started: [ ilo143 ilo150 ]
 Clone Set: o2cb-clone
     Started: [ ilo143 ilo150 ]
 Clone Set: ip-clone (unique)
     ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started ilo143
     ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started ilo143


However, the problem appears when I try to mount the ocfs2 file system
by running "crm resource start fs0-clone". Here is a snippet from
/var/log/messages:

Jan  2 17:46:13 ilo150 kernel: ------------[ cut here ]------------
Jan  2 17:46:13 ilo150 kernel: kernel BUG at fs/dlm/lowcomms.c:861!
Jan  2 17:46:13 ilo150 kernel: invalid opcode: 0000 [#1] SMP
Jan  2 17:46:13 ilo150 kernel: last sysfs file: 
/sys/kernel/dlm/5316FDFD93BB4F7E97B296FC513FA149/event_done
Jan  2 17:46:13 ilo150 kernel: CPU 1
Jan  2 17:46:13 ilo150 kernel: Modules linked in: sctp libcrc32c ocfs2 
ocfs2_nodemanager ocfs2_stack_user ocfs2_stackglue dlm drbd configfs ipv6 
bnx2 ipmi_si serio_raw ipmi_msghandler hpwdt iTCO_wdt iTCO_vendor_support 
cciss radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: 
scsi_wait_scan]
Jan  2 17:46:13 ilo150 kernel: Pid: 2918, comm: dlm_send Not tainted 
2.6.31.9-174.fc12.x86_64 #1 ProLiant DL360 G6
Jan  2 17:46:13 ilo150 kernel: RIP: 0010:[<ffffffffa01d75c9>] 
[<ffffffffa01d75c9>] sctp_init_assoc+0x13e/0x2c1 [dlm]
Jan  2 17:46:13 ilo150 kernel: RSP: 0018:ffff8808e9bdbc20  EFLAGS: 00010246
Jan  2 17:46:13 ilo150 kernel: RAX: ffff8808e9920038 RBX: ffff8808e9920000 
RCX: 0000000000000000
Jan  2 17:46:13 ilo150 kernel: RDX: 0000000000000000 RSI: 0000000000524852 
RDI: ffff8808e9920048
Jan  2 17:46:13 ilo150 kernel: RBP: ffff8808e9bdbe00 R08: 0000000000000000 
R09: ffff88091f804200
Jan  2 17:46:13 ilo150 kernel: R10: ffff88091f804200 R11: 0000000000000000 
R12: ffff8808e9920038
Jan  2 17:46:13 ilo150 kernel: R13: ffff8808e9920048 R14: ffff8808eed9a000 
R15: ffff8808e9bdbe80
Jan  2 17:46:13 ilo150 kernel: FS:  0000000000000000(0000) 
GS:ffff880028053000(0000) knlGS:0000000000000000
Jan  2 17:46:13 ilo150 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 
000000008005003b
Jan  2 17:46:13 ilo150 kernel: CR2: 00007fc4485c9000 CR3: 0000000001001000 
CR4: 00000000000006e0
Jan  2 17:46:13 ilo150 kernel: DR0: 0000000000000000 DR1: 0000000000000000 
DR2: 0000000000000000
Jan  2 17:46:13 ilo150 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 
DR7: 0000000000000400
Jan  2 17:46:13 ilo150 kernel: Process dlm_send (pid: 2918, threadinfo 
ffff8808e9bda000, task ffff8808e9be0000)
Jan  2 17:46:13 ilo150 kernel: Stack:
Jan  2 17:46:13 ilo150 kernel: 0000000000000000 0000000000000000 
ffff8808e9bdbd10 0000000000000010
Jan  2 17:46:13 ilo150 kernel: <0> 0000000000000000 0000000000000000 
ffff8808e9bdbd90 0000000000000030
Jan  2 17:46:13 ilo150 kernel: <0> 0000000000000080 0000000000000000 
0000000000000000 0000000000000000
Jan  2 17:46:13 ilo150 kernel: Call Trace:
Jan  2 17:46:13 ilo150 kernel: [<ffffffff810106c5>] ? 
__switch_to+0x18b/0x217
Jan  2 17:46:13 ilo150 kernel: [<ffffffffa01d727c>] ? 
process_send_sockets+0x0/0x17c [dlm]
Jan  2 17:46:13 ilo150 kernel: [<ffffffffa01d72b0>] 
process_send_sockets+0x34/0x17c [dlm]
Jan  2 17:46:13 ilo150 kernel: [<ffffffff810b272d>] ? 
probe_workqueue_execution+0xb1/0xcd
Jan  2 17:46:13 ilo150 kernel: [<ffffffffa01d727c>] ? 
process_send_sockets+0x0/0x17c [dlm]
Jan  2 17:46:13 ilo150 kernel: [<ffffffff810635a0>] 
worker_thread+0x18a/0x224
Jan  2 17:46:13 ilo150 kernel: [<ffffffff81067b37>] ? 
autoremove_wake_function+0x0/0x39
Jan  2 17:46:13 ilo150 kernel: [<ffffffff81063416>] ? 
worker_thread+0x0/0x224
Jan  2 17:46:13 ilo150 kernel: [<ffffffff810677b5>] kthread+0x91/0x99
Jan  2 17:46:13 ilo150 kernel: [<ffffffff81012daa>] child_rip+0xa/0x20
Jan  2 17:46:13 ilo150 kernel: [<ffffffff81067724>] ? kthread+0x0/0x99
Jan  2 17:46:13 ilo150 kernel: [<ffffffff81012da0>] ? child_rip+0x0/0x20
Jan  2 17:46:13 ilo150 kernel: Code: 60 fe ff ff 80 00 00 00 89 85 38 fe ff 
ff 48 8d 45 90 48 89 85 50 fe ff ff e8 88 5f 24 e1 4c 8b 63 38 48 8d 43 38 
49 39 c4 75 04 <0f> 0b eb fe 4d 63 44 24 1c 41 8b 54 24 18 66 ff 43 48 45 31 
ff
Jan  2 17:46:13 ilo150 kernel: RIP  [<ffffffffa01d75c9>] 
sctp_init_assoc+0x13e/0x2c1 [dlm]
Jan  2 17:46:13 ilo150 kernel: RSP <ffff8808e9bdbc20>
Jan  2 17:46:13 ilo150 kernel: ---[ end trace d3844af31bca174b ]---
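
In case it helps anyone searching the full logs for the same crash, here
is a minimal sketch of a filter that pulls the BUG location and the
faulting symbol out of an oops dump like the one above. It assumes
syslog-style text in the format shown; the function name summarize_oops
and the regular expressions are my own illustration, not part of any
kernel or cluster tool.

```python
import re

# Matches e.g. "kernel BUG at fs/dlm/lowcomms.c:861!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>[^:\s]+):(?P<line>\d+)!")
# Matches e.g. "sctp_init_assoc+0x13e/0x2c1 [dlm]"
SYM_RE = re.compile(r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]")


def summarize_oops(text):
    """Return the BUG source location and the first module symbol after "RIP".

    Hypothetical helper: fields are None when the pattern is not found.
    """
    bug = BUG_RE.search(text)
    # Start the symbol search at the first "RIP" marker so that earlier
    # lines cannot match by accident.
    rip_pos = text.find("RIP")
    sym = SYM_RE.search(text, rip_pos) if rip_pos != -1 else None
    return {
        "file": bug.group("file") if bug else None,
        "line": int(bug.group("line")) if bug else None,
        "function": sym.group("func") if sym else None,
        "module": sym.group("module") if sym else None,
    }
```

Calling summarize_oops() on the contents of /var/log/messages should
report the location fs/dlm/lowcomms.c:861 and the symbol sctp_init_assoc
in the dlm module for the trace above. Because it searches the whole
text, it tolerates the line wrapping introduced by mail clients.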

I am wondering whether this is a Fedora-specific bug. I have the full
messages logs from both nodes if anyone is interested. Here is my config:

[root@ilo150 ~]# crm configure show
node ilo143
node ilo150
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="xx.xx.xx.xx" cidr_netmask="32" \
        op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="20" role="Master" timeout="20" \
        op monitor interval="30" role="Slave" timeout="20"
primitive fs0 ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/mnt" fstype="ocfs2" \
        meta target-role="Stopped"
primitive o2cb ocf:ocfs2:o2cb \
        op monitor interval="120s"
ms drbd_clone0 drbd_r0 \
        meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone dlm-clone dlm \
        meta interleave="true"
clone fs0-clone fs0
clone ip-clone ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"
clone o2cb-clone o2cb \
        meta interleave="true"
colocation fs0-with-o2cb inf: fs0-clone o2cb-clone
colocation fs0_on_drbd inf: fs0-clone drbd_clone0:Master
colocation o2cb-with-dlm inf: o2cb-clone dlm-clone
order fs0-after-drbd inf: drbd_clone0:promote fs0-clone:start
order fs0-after-o2cb inf: o2cb-clone fs0-clone
order o2cb-after-dlm inf: dlm-clone o2cb-clone
property $id="cib-bootstrap-options" \
        dc-version="1.0.5-ee19d8e83c2a5d45988f1cee36d334a631d84fc7" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        last-lrm-refresh="1262472066"




Thanks,
Daniel






