[Pacemaker] kernel BUG at fs/dlm/lowcomms.c:861! on Fedora 12

Andrew Beekhof andrew at beekhof.net
Mon Jan 4 04:39:21 EST 2010


Hi,

The people who maintain the DLM are on the linux-cluster at redhat.com
mailing list, so it is best to direct this issue there.
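
When reporting there, it helps to send only the oops itself rather than the
whole log. Below is a minimal sketch, not a definitive tool, that pulls each
block from the "cut here" marker through the "end trace" marker out of a
syslog file. It assumes the log format shown in the quoted message below;
the names extract_oops and print_oops are hypothetical helpers invented for
this example.

```python
import re

# Markers that bracket a kernel oops/BUG report in the quoted log.
START = "------------[ cut here ]------------"
END_RE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def extract_oops(lines):
    """Return a list of reports.

    Each report is the list of log lines from a 'cut here' marker
    through the next 'end trace' marker, inclusive. A report that
    never reaches an 'end trace' line is dropped.
    """
    reports = []
    current = None  # lines of the report being collected, if any
    for line in lines:
        line = line.rstrip("\n")
        if START in line:
            current = [line]
        elif current is not None:
            current.append(line)
            if END_RE.search(line):
                reports.append(current)
                current = None
    return reports


def print_oops(path="/var/log/messages"):
    """Print every complete report found in the given log file."""
    with open(path, errors="replace") as fh:
        for report in extract_oops(fh):
            print("\n".join(report))
            print()
```

Calling print_oops() on each node (as root, since /var/log/messages is
usually not world-readable) should give output you can paste into the
report as-is.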

-- Andrew

On Sun, Jan 3, 2010 at 8:30 PM, Daniel Qian <daniel at bestningning.com> wrote:
> I have come a long way setting up this two-node cluster of pacemaker +
> openais/corosync + ocfs2 + DLM + drbd on Fedora 12. I resolved one issue
> after another until I hit this last hurdle, which is beyond my ability to
> fix. All other components are working fine.
>
> [root at ilo150 ~]# crm_mon -1
>
>
> ============
> Last updated: Sun Jan  3 12:17:17 2010
> Stack: openais
> Current DC: ilo143 - partition with quorum
> Version: 1.0.5-ee19d8e83c2a5d45988f1cee36d334a631d84fc7
> 2 Nodes configured, 2 expected votes
> 5 Resources configured.
> ============
>
> Online: [ ilo143 ilo150 ]
>
> Master/Slave Set: drbd_clone0
>    Masters: [ ilo143 ilo150 ]
> Clone Set: dlm-clone
>    Started: [ ilo143 ilo150 ]
> Clone Set: o2cb-clone
>    Started: [ ilo143 ilo150 ]
> Clone Set: ip-clone (unique)
>    ClusterIP:0        (ocf::heartbeat:IPaddr2):       Started ilo143
>    ClusterIP:1        (ocf::heartbeat:IPaddr2):       Started ilo143
>
>
> However, I hit this problem when I try to mount the ocfs2 file system by
> running "crm resource start fs0-clone". Here is a snippet from
> /var/log/messages:
>
> Jan  2 17:46:13 ilo150 kernel: ------------[ cut here ]------------
> Jan  2 17:46:13 ilo150 kernel: kernel BUG at fs/dlm/lowcomms.c:861!
> Jan  2 17:46:13 ilo150 kernel: invalid opcode: 0000 [#1] SMP
> Jan  2 17:46:13 ilo150 kernel: last sysfs file:
> /sys/kernel/dlm/5316FDFD93BB4F7E97B296FC513FA149/event_done
> Jan  2 17:46:13 ilo150 kernel: CPU 1
> Jan  2 17:46:13 ilo150 kernel: Modules linked in: sctp libcrc32c ocfs2
> ocfs2_nodemanager ocfs2_stack_user ocfs2_stackglue dlm drbd configfs ipv6
> bnx2 ipmi_si serio_raw ipmi_msghandler hpwdt iTCO_wdt iTCO_vendor_support
> cciss radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded:
> scsi_wait_scan]
> Jan  2 17:46:13 ilo150 kernel: Pid: 2918, comm: dlm_send Not tainted
> 2.6.31.9-174.fc12.x86_64 #1 ProLiant DL360 G6
> Jan  2 17:46:13 ilo150 kernel: RIP: 0010:[<ffffffffa01d75c9>]
> [<ffffffffa01d75c9>] sctp_init_assoc+0x13e/0x2c1 [dlm]
> Jan  2 17:46:13 ilo150 kernel: RSP: 0018:ffff8808e9bdbc20  EFLAGS: 00010246
> Jan  2 17:46:13 ilo150 kernel: RAX: ffff8808e9920038 RBX: ffff8808e9920000
> RCX: 0000000000000000
> Jan  2 17:46:13 ilo150 kernel: RDX: 0000000000000000 RSI: 0000000000524852
> RDI: ffff8808e9920048
> Jan  2 17:46:13 ilo150 kernel: RBP: ffff8808e9bdbe00 R08: 0000000000000000
> R09: ffff88091f804200
> Jan  2 17:46:13 ilo150 kernel: R10: ffff88091f804200 R11: 0000000000000000
> R12: ffff8808e9920038
> Jan  2 17:46:13 ilo150 kernel: R13: ffff8808e9920048 R14: ffff8808eed9a000
> R15: ffff8808e9bdbe80
> Jan  2 17:46:13 ilo150 kernel: FS:  0000000000000000(0000)
> GS:ffff880028053000(0000) knlGS:0000000000000000
> Jan  2 17:46:13 ilo150 kernel: CS:  0010 DS: 0018 ES: 0018 CR0:
> 000000008005003b
> Jan  2 17:46:13 ilo150 kernel: CR2: 00007fc4485c9000 CR3: 0000000001001000
> CR4: 00000000000006e0
> Jan  2 17:46:13 ilo150 kernel: DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> Jan  2 17:46:13 ilo150 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0
> DR7: 0000000000000400
> Jan  2 17:46:13 ilo150 kernel: Process dlm_send (pid: 2918, threadinfo
> ffff8808e9bda000, task ffff8808e9be0000)
> Jan  2 17:46:13 ilo150 kernel: Stack:
> Jan  2 17:46:13 ilo150 kernel: 0000000000000000 0000000000000000
> ffff8808e9bdbd10 0000000000000010
> Jan  2 17:46:13 ilo150 kernel: <0> 0000000000000000 0000000000000000
> ffff8808e9bdbd90 0000000000000030
> Jan  2 17:46:13 ilo150 kernel: <0> 0000000000000080 0000000000000000
> 0000000000000000 0000000000000000
> Jan  2 17:46:13 ilo150 kernel: Call Trace:
> Jan  2 17:46:13 ilo150 kernel: [<ffffffff810106c5>] ?
> __switch_to+0x18b/0x217
> Jan  2 17:46:13 ilo150 kernel: [<ffffffffa01d727c>] ?
> process_send_sockets+0x0/0x17c [dlm]
> Jan  2 17:46:13 ilo150 kernel: [<ffffffffa01d72b0>]
> process_send_sockets+0x34/0x17c [dlm]
> Jan  2 17:46:13 ilo150 kernel: [<ffffffff810b272d>] ?
> probe_workqueue_execution+0xb1/0xcd
> Jan  2 17:46:13 ilo150 kernel: [<ffffffffa01d727c>] ?
> process_send_sockets+0x0/0x17c [dlm]
> Jan  2 17:46:13 ilo150 kernel: [<ffffffff810635a0>]
> worker_thread+0x18a/0x224
> Jan  2 17:46:13 ilo150 kernel: [<ffffffff81067b37>] ?
> autoremove_wake_function+0x0/0x39
> Jan  2 17:46:13 ilo150 kernel: [<ffffffff81063416>] ?
> worker_thread+0x0/0x224
> Jan  2 17:46:13 ilo150 kernel: [<ffffffff810677b5>] kthread+0x91/0x99
> Jan  2 17:46:13 ilo150 kernel: [<ffffffff81012daa>] child_rip+0xa/0x20
> Jan  2 17:46:13 ilo150 kernel: [<ffffffff81067724>] ? kthread+0x0/0x99
> Jan  2 17:46:13 ilo150 kernel: [<ffffffff81012da0>] ? child_rip+0x0/0x20
> Jan  2 17:46:13 ilo150 kernel: Code: 60 fe ff ff 80 00 00 00 89 85 38 fe ff
> ff 48 8d 45 90 48 89 85 50 fe ff ff e8 88 5f 24 e1 4c 8b 63 38 48 8d 43 38
> 49 39 c4 75 04 <0f> 0b eb fe 4d 63 44 24 1c 41 8b 54 24 18 66 ff 43 48 45 31
> ff
> Jan  2 17:46:13 ilo150 kernel: RIP  [<ffffffffa01d75c9>]
> sctp_init_assoc+0x13e/0x2c1 [dlm]
> Jan  2 17:46:13 ilo150 kernel: RSP <ffff8808e9bdbc20>
> Jan  2 17:46:13 ilo150 kernel: ---[ end trace d3844af31bca174b ]---
>
> I am wondering whether this is a Fedora-specific bug. I have the full
> /var/log/messages logs from both nodes if anyone is interested. Here is my
> configuration:
>
> [root at ilo150 ~]# crm configure show
> node ilo143
> node ilo150
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>       params ip="xx.xx.xx.xx" cidr_netmask="32" \
>       op monitor interval="30s"
> primitive dlm ocf:pacemaker:controld \
>       op monitor interval="120s"
> primitive drbd_r0 ocf:linbit:drbd \
>       params drbd_resource="r0" \
>       op monitor interval="20" role="Master" timeout="20" \
>       op monitor interval="30" role="Slave" timeout="20"
> primitive fs0 ocf:heartbeat:Filesystem \
>       params device="/dev/drbd0" directory="/mnt" fstype="ocfs2" \
>       meta target-role="Stopped"
> primitive o2cb ocf:ocfs2:o2cb \
>       op monitor interval="120s"
> ms drbd_clone0 drbd_r0 \
>       meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone dlm-clone dlm \
>       meta interleave="true"
> clone fs0-clone fs0
> clone ip-clone ClusterIP \
>       meta globally-unique="true" clone-max="2" clone-node-max="2"
> clone o2cb-clone o2cb \
>       meta interleave="true"
> colocation fs0-with-o2cb inf: fs0-clone o2cb-clone
> colocation fs0_on_drbd inf: fs0-clone drbd_clone0:Master
> colocation o2cb-with-dlm inf: o2cb-clone dlm-clone
> order fs0-after-drbd inf: drbd_clone0:promote fs0-clone:start
> order fs0-after-o2cb inf: o2cb-clone fs0-clone
> order o2cb-after-dlm inf: dlm-clone o2cb-clone
> property $id="cib-bootstrap-options" \
>       dc-version="1.0.5-ee19d8e83c2a5d45988f1cee36d334a631d84fc7" \
>       cluster-infrastructure="openais" \
>       expected-quorum-votes="2" \
>       no-quorum-policy="ignore" \
>       stonith-enabled="false" \
>       last-lrm-refresh="1262472066"
>
>
>
>
> Thanks,
> Daniel
>
>
>
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>



