[ClusterLabs] CLVM won't start

Schaefer, Diane E diane.schaefer at unisys.com
Thu Apr 2 09:31:19 EDT 2015


Hi,
I'm running SLES 11 SP3 with:
pacemaker-1.1.11-0.7.53
lvm2-clvm-2.02.98-0.25.4

I've created a group (base_group) containing dlm and clvm and cloned it (base_clone). Both nodes start dlm and appear to connect, but on the second node the clvm start always times out, after which the resource is stopped and restarted. After the restart, the monitor finds the pid and reports the resource as running.
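A monitor can report "running" even though the start never completed if its check only asks whether the daemon's pid exists. The sketch below shows such a pid-only liveness check. It assumes the monitor is pid-based (the actual ocf:lvm2:clvmd agent may check more than this), and `pid_alive` is a hypothetical helper, not part of any cluster tool.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid exists (hypothetical helper).

    Signal 0 performs only the existence/permission check; nothing is
    delivered to the process. This says nothing about whether the daemon
    finished initializing, which is why a pid-only monitor can succeed
    while the start operation itself is still hung.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # No such process.
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```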

I've traced the problem to the agent's start code: the call "start_daemon /usr/sbin/clvmd" never returns. Any idea what could be wrong?
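One way to check whether the start call itself hangs, independent of Pacemaker, is to run it under a timeout, roughly as lrmd does with the 90 s start timeout configured for p_clvm. This is a minimal sketch. `returns_within` is a hypothetical helper, and it assumes the command would be run as root on the affected node.

```python
import subprocess


def returns_within(cmd: list[str], timeout_s: float) -> bool:
    """Run cmd and report whether it exits within timeout_s seconds.

    On timeout, subprocess.run kills the child and raises TimeoutExpired,
    similar to how lrmd kills a start operation that exceeds its timeout.
    Output is discarded so a forked daemon cannot hold a pipe open.
    """
    try:
        subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except subprocess.TimeoutExpired:
        return False
```

For example, `returns_within(["/usr/sbin/clvmd"], 90)` returning False would suggest the daemon's foreground process blocks outside the agent too. Whether a direct invocation behaves exactly like the agent's `start_daemon` call is an assumption to verify.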

My cluster defs:

primitive p_clvm ocf:lvm2:clvmd \
        op monitor timeout="20" interval="20" \
        op start timeout="90" interval="0" \
        op stop timeout="100" interval="0" \
        meta target-role="Started"
primitive p_dlm ocf:pacemaker:controld \
        op monitor timeout="20" interval="10" start-delay="0" \
        op start timeout="90" interval="0" \
        op stop timeout="100" interval="0" \
        meta target-role="Started"
primitive stonith-sbd stonith:external/sbd \
        op start start-delay="10s" interval="0"
group base_group p_dlm p_clvm
clone base_clone base_group \
        meta interleave="true"

My syslog:

Apr  2 13:20:05 usrv-fsm2 clvmd(p_clvm)[16336]: INFO: calling start_daemon
Apr  2 13:20:05 usrv-fsm2 crmd[15791]:   notice: process_lrm_event: LRM operation p_dlm_monitor_10000 (call=16, rc=0, cib-update=50, confirmed=false) ok
Apr  2 13:20:05 usrv-fsm2 clvmd[16357]: CLVMD started
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: process_uevent: uevent: add@/kernel/dlm/clvmd
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: process_uevent: kernel: add@ clvmd
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: process_uevent: uevent: online@/kernel/dlm/clvmd
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: process_uevent: kernel: online@ clvmd
Apr  2 13:20:05 usrv-fsm2 kernel: [ 3532.121548] dlm: Using TCP for communications
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: log_config: dlm:ls:clvmd conf 2 1 0 memb 16850860 16916396 join 16916396 left
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: add_change: clvmd add_change cg 1 joined nodeid 16916396
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: add_change: clvmd add_change cg 1 we joined
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: add_change: clvmd add_change cg 1 counts member 2 joined 1 remove 0 failed 0
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: check_fencing_done: clvmd check_fencing done
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: check_quorum_done: clvmd check_quorum disabled
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: check_fs_done: clvmd check_fs none registered
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: send_info: clvmd send_start cg 1 flags 1 data2 0 counts 0 2 1 0 0
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: receive_start: clvmd receive_start 16916396:1 len 80
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: match_change: clvmd match_change 16916396:1 matches cg 1
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: wait_messages_done: clvmd wait_messages cg 1 need 1 of 2
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: receive_start: clvmd receive_start 16850860:2 len 80
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: match_change: clvmd match_change 16850860:2 matches cg 1
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: wait_messages_done: clvmd wait_messages cg 1 got all 2
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: start_kernel: clvmd start_kernel cg 1 member_count 2
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: do_sysfs: write "1090842362" to "/sys/kernel/dlm/clvmd/id"
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: set_configfs_members: set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/16850860"
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: set_configfs_members: set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/16916396"
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: do_sysfs: write "1" to "/sys/kernel/dlm/clvmd/control"
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: do_sysfs: write "0" to "/sys/kernel/dlm/clvmd/event_done"
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: set_plock_ckpt_node: clvmd set_plock_ckpt_node from 0 to 16850860
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: receive_plocks_stored: clvmd receive_plocks_stored 16850860:2 flags a sig 0 need_plocks 1
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: match_change: clvmd match_change 16850860:2 matches cg 1
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: retrieve_plocks: clvmd retrieve_plocks
Apr  2 13:20:05 usrv-fsm2 kernel: [ 3532.125482] dlm: connecting to 16850860
Apr  2 13:20:05 usrv-fsm2 cluster-dlm[16315]: retrieve_plocks: clvmd retrieve_plocks first 0 last 0 r_count 0 p_count 0 sig 0
Apr  2 13:21:35 usrv-fsm2 lrmd[15788]:  warning: child_timeout_callback: p_clvm_start_0 process (PID 16336) timed out
Apr  2 13:21:35 usrv-fsm2 lrmd[15788]:  warning: operation_finished: p_clvm_start_0:16336 - timed out after 90000ms
Apr  2 13:21:35 usrv-fsm2 lrmd[15788]:   notice: operation_finished: p_clvm_start_0:16336:stderr [   local socket: connect failed: Connection refused ]
Apr  2 13:21:35 usrv-fsm2 crmd[15791]:    error: process_lrm_event: LRM operation p_clvm_start_0 (17) Timed Out (timeout=90000ms)
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_cs_dispatch: Update relayed from usrv-fsm1
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_clvm (1)
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_perform_update: Sent update 109: fail-count-p_clvm=1
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_cs_dispatch: Update relayed from usrv-fsm1
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_clvm (1427980972)
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_perform_update: Sent update 111: last-failure-p_clvm=1427980972
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_cs_dispatch: Update relayed from usrv-fsm1
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_clvm (2)
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_perform_update: Sent update 113: fail-count-p_clvm=2
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_cs_dispatch: Update relayed from usrv-fsm1
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_clvm (1427980972)
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_perform_update: Sent update 115: last-failure-p_clvm=1427980972
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_cs_dispatch: Update relayed from usrv-fsm1
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_clvm (3)
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_perform_update: Sent update 117: fail-count-p_clvm=3
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_cs_dispatch: Update relayed from usrv-fsm1
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_clvm (1427980972)
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_perform_update: Sent update 119: last-failure-p_clvm=1427980972
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_cs_dispatch: Update relayed from usrv-fsm1
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-p_clvm (4)
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_perform_update: Sent update 121: fail-count-p_clvm=4
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_cs_dispatch: Update relayed from usrv-fsm1
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-p_clvm (1427980972)
Apr  2 13:21:35 usrv-fsm2 attrd[15789]:   notice: attrd_perform_update: Sent update 123: last-failure-p_clvm=1427980972
Apr  2 13:21:35 usrv-fsm2 clvmd(p_clvm)[16464]: INFO: Stopping p_clvm
Apr  2 13:21:35 usrv-fsm2 clvmd(p_clvm)[16464]: INFO: Stopping clvmd
Apr  2 13:21:36 usrv-fsm2 crmd[15791]:   notice: process_lrm_event: LRM operation p_clvm_stop_0 (call=18, rc=0, cib-update=52, confirmed=true) ok
Apr  2 13:21:36 usrv-fsm2 clvmd(p_clvm)[16493]: INFO: Starting p_clvm
Apr  2 13:21:36 usrv-fsm2 clvmd(p_clvm)[16493]: INFO: clvmd is started, checking cmirrord
Apr  2 13:21:37 usrv-fsm2 cmirrord[16507]: Starting cmirrord:
Apr  2 13:21:37 usrv-fsm2 cmirrord[16507]:  Built: May 29 2013 15:04:35
Apr  2 13:21:39 usrv-fsm2 clvmd(p_clvm)[16493]: INFO: cmirrord started...rpm -
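In the excerpt, the last clvmd/dlm activity (retrieve_plocks, 13:20:05) is followed by nothing until lrmd reports the timeout at 13:21:35. The full 90 s start timeout elapses with no further progress logged. The sketch below measures such gaps from syslog lines. The helper names are hypothetical, and because syslog omits the year, 2015 (the post date) is assumed.

```python
from datetime import datetime

# Syslog prefix used in the excerpt, e.g. "Apr  2 13:20:05" (15 characters).
# The year is not logged, so one is prepended; 2015 is an assumption.
SYSLOG_TS_FORMAT = "%Y %b %d %H:%M:%S"


def parse_ts(line: str, year: int = 2015) -> datetime:
    """Parse the leading syslog timestamp of a log line."""
    return datetime.strptime(f"{year} {line[:15]}", SYSLOG_TS_FORMAT)


def seconds_between(lines: list[str], start_marker: str, end_marker: str) -> float:
    """Seconds from the first line containing start_marker to the first
    later line containing end_marker."""
    for i, line in enumerate(lines):
        if start_marker in line:
            start = parse_ts(line)
            for later in lines[i + 1:]:
                if end_marker in later:
                    return (parse_ts(later) - start).total_seconds()
            break
    raise ValueError("markers not found in order")
```

Fed the log above with "retrieve_plocks" and "timed out" as markers, this gives 90 seconds, matching `op start timeout="90"`.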

Thanks for any help on this.
Diane Schaefer