[ClusterLabs] single node fails to start the ocfs2 resource

Muhammad Sharfuddin M.Sharfuddin at nds.com.pk
Fri Mar 9 11:55:18 EST 2018


Hi,

This two-node cluster starts all resources when both nodes are online, but it does not start the ocfs2 resources when one node is offline. For example, if I gracefully stop the cluster resources, then stop the pacemaker service on either node, and then try to start the ocfs2 resource on the node that is still online, the start fails.

logs:

pipci001 pengine[17732]:   notice: Start   dlm:0 (pipci001)
pengine[17732]:   notice: Start   p-fssapmnt:0 (pipci001)
pengine[17732]:   notice: Start   p-fsusrsap:0 (pipci001)
pipci001 pengine[17732]:   notice: Calculated transition 2, saving inputs in /var/lib/pacemaker/pengine/pe-input-339.bz2
pipci001 crmd[17733]:   notice: Processing graph 2 (ref=pe_calc-dc-1520613202-31) derived from /var/lib/pacemaker/pengine/pe-input-339.bz2
crmd[17733]:   notice: Initiating start operation dlm_start_0 locally on pipci001
lrmd[17730]:   notice: executing - rsc:dlm action:start call_id:69
dlm_controld[19019]: 4575 dlm_controld 4.0.7 started
lrmd[17730]:   notice: finished - rsc:dlm action:start call_id:69 pid:18999 exit-code:0 exec-time:1082ms queue-time:1ms
crmd[17733]:   notice: Result of start operation for dlm on pipci001: 0 (ok)
crmd[17733]:   notice: Initiating monitor operation dlm_monitor_60000 locally on pipci001
crmd[17733]:   notice: Initiating start operation p-fssapmnt_start_0 locally on pipci001
lrmd[17730]:   notice: executing - rsc:p-fssapmnt action:start call_id:71
Filesystem(p-fssapmnt)[19052]: INFO: Running start for /dev/mapper/sapmnt on /sapmnt
kernel: [ 4576.529938] dlm: Using TCP for communications
kernel: [ 4576.530233] dlm: BFA9FF042AA045F4822C2A6A06020EE9: joining the lockspace group.
dlm_controld[19019]: 4629 fence work wait for quorum
dlm_controld[19019]: 4634 BFA9FF042AA045F4822C2A6A06020EE9 wait for quorum
lrmd[17730]:  warning: p-fssapmnt_start_0 process (PID 19052) timed out
kernel: [ 4636.418223] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group event done -512 0
kernel: [ 4636.418227] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group join failed -512 0
lrmd[17730]:  warning: p-fssapmnt_start_0:19052 - timed out after 60000ms
lrmd[17730]:   notice: finished - rsc:p-fssapmnt action:start call_id:71 pid:19052 exit-code:1 exec-time:60002ms queue-time:0ms
kernel: [ 4636.420628] ocfs2: Unmounting device (254,1) on (node 0)
crmd[17733]:    error: Result of start operation for p-fssapmnt on pipci001: Timed Out
crmd[17733]:  warning: Action 11 (p-fssapmnt_start_0) on pipci001 failed (target: 0 vs. rc: 1): Error
crmd[17733]:   notice: Transition aborted by operation p-fssapmnt_start_0 'modify' on pipci001: Event failed
crmd[17733]:  warning: Action 11 (p-fssapmnt_start_0) on pipci001 failed (target: 0 vs. rc: 1): Error
crmd[17733]:   notice: Transition 2 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-339.bz2): Complete
pengine[17732]:   notice: Watchdog will be used via SBD if fencing is required
pengine[17732]:   notice: On loss of CCM Quorum: Ignore
pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
pengine[17732]:  warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
pengine[17732]:  warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
pengine[17732]:   notice: Stop    dlm:0 (pipci001)
pengine[17732]:   notice: Stop    p-fssapmnt:0 (pipci001)
pengine[17732]:   notice: Calculated transition 3, saving inputs in /var/lib/pacemaker/pengine/pe-input-340.bz2
pengine[17732]:   notice: Watchdog will be used via SBD if fencing is required
pengine[17732]:   notice: On loss of CCM Quorum: Ignore
pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
pengine[17732]:  warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
pipci001 pengine[17732]:  warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
pengine[17732]:   notice: Stop    dlm:0 (pipci001)
pengine[17732]:   notice: Stop    p-fssapmnt:0 (pipci001)
pengine[17732]:   notice: Calculated transition 4, saving inputs in /var/lib/pacemaker/pengine/pe-input-341.bz2
crmd[17733]:   notice: Processing graph 4 (ref=pe_calc-dc-1520613263-36) derived from /var/lib/pacemaker/pengine/pe-input-341.bz2
crmd[17733]:   notice: Initiating stop operation p-fssapmnt_stop_0 locally on pipci001
lrmd[17730]:   notice: executing - rsc:p-fssapmnt action:stop call_id:72
Filesystem(p-fssapmnt)[19189]: INFO: Running stop for /dev/mapper/sapmnt on /sapmnt
pipci001 lrmd[17730]:   notice: finished - rsc:p-fssapmnt action:stop call_id:72 pid:19189 exit-code:0 exec-time:83ms queue-time:0ms
pipci001 crmd[17733]:   notice: Result of stop operation for p-fssapmnt on pipci001: 0 (ok)
crmd[17733]:   notice: Initiating stop operation dlm_stop_0 locally on pipci001
pipci001 lrmd[17730]:   notice: executing - rsc:dlm action:stop call_id:74
pipci001 dlm_controld[19019]: 4636 shutdown ignored, active lockspaces
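
Comparing the kernel timestamps above, the DLM lockspace join started at 4576.53 s and failed at 4636.42 s, so it was blocked for almost the entire 60 s start timeout configured on p-fssapmnt (op start timeout=60) while dlm_controld was logging "wait for quorum". The -512 in the failure lines is the kernel-internal ERESTARTSYS code, which is consistent with the mount being interrupted when lrmd timed out the start operation. A minimal Python sketch of that arithmetic, using only the two kernel lines copied from the log (the helper names are hypothetical, for illustration only):

```python
import re
from typing import Sequence

# The two kernel lines copied verbatim from the log excerpt.
KERNEL_LINES = [
    "kernel: [ 4576.530233] dlm: BFA9FF042AA045F4822C2A6A06020EE9: joining the lockspace group.",
    "kernel: [ 4636.418227] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group join failed -512 0",
]

# Value of "op start timeout=60" on p-fssapmnt, in seconds.
START_TIMEOUT_S = 60

# "[ 4576.530233]" is the printk timestamp, in seconds since boot.
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def kernel_seconds(line: str) -> float:
    """Return the printk timestamp of one kernel log line, in seconds."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no kernel timestamp in: {line!r}")
    return float(match.group(1))


def join_duration(lines: Sequence[str]) -> float:
    """Seconds between 'joining the lockspace group' and 'group join failed'.

    Raises StopIteration if either message is missing from `lines`.
    """
    start = next(kernel_seconds(line) for line in lines
                 if "joining the lockspace group" in line)
    end = next(kernel_seconds(line) for line in lines
               if "group join failed" in line)
    return end - start


if __name__ == "__main__":
    waited = join_duration(KERNEL_LINES)
    print(f"lockspace join blocked for {waited:.3f} s "
          f"(start timeout {START_TIMEOUT_S} s)")
```

Run as a script, it reports a wait of 59.888 s, just under the 60 s timeout after which lrmd gave up on the start.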


resource configuration:

primitive p-fssapmnt Filesystem \
         params device="/dev/mapper/sapmnt" directory="/sapmnt" fstype=ocfs2 \
         op monitor interval=20 timeout=40 \
         op start timeout=60 interval=0 \
         op stop timeout=60 interval=0
primitive dlm ocf:pacemaker:controld \
         op monitor interval=60 timeout=60 \
         op start interval=0 timeout=90 \
         op stop interval=0 timeout=100
clone base-clone base-group \
         meta interleave=true target-role=Started

cluster properties:
property cib-bootstrap-options: \
         have-watchdog=true \
         stonith-enabled=true \
         stonith-timeout=80 \
         startup-fencing=true \

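The properties above apply to Pacemaker only. The log line "On loss of CCM Quorum: Ignore" shows Pacemaker carrying on without quorum, but dlm_controld's "wait for quorum" messages refer to the quorum state reported by corosync, which that Pacemaker setting does not override. The sketch below models corosync votequorum's default rule, a simple majority of the expected votes. It assumes one vote per node and a corosync.conf without the two_node option; corosync.conf is not included in this post, so that part is an assumption. Under those assumptions, a lone node of a two-node cluster is inquorate. The function names are hypothetical, and the sketch deliberately ignores wait_for_all, which votequorum enables by default when two_node is set and which keeps the cluster inquorate after a full start until both nodes have been seen together once.

```python
def votequorum_threshold(expected_votes: int, two_node: bool = False) -> int:
    """Votes needed for quorum under votequorum's default majority rule.

    two_node=True models the corosync.conf 'quorum { two_node: 1 }' special
    case, where a cluster of exactly two nodes stays quorate with one vote.
    wait_for_all is NOT modelled here.
    """
    if two_node and expected_votes == 2:
        return 1
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int,
               two_node: bool = False) -> bool:
    """True if the votes currently present reach the quorum threshold."""
    return votes_present >= votequorum_threshold(expected_votes, two_node)


if __name__ == "__main__":
    # Two nodes with one vote each, one node stopped.
    print(is_quorate(1, 2))                 # False: a majority of 2 votes is 2
    print(is_quorate(1, 2, two_node=True))  # True: the two_node special case
```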

Software versions:

kernel version: 4.4.114-94.11-default
pacemaker-1.1.16-4.8.x86_64
corosync-2.3.6-9.5.1.x86_64
ocfs2-kmp-default-4.4.114-94.11.3.x86_64
ocfs2-tools-1.8.5-1.35.x86_64
dlm-kmp-default-4.4.114-94.11.3.x86_64
libdlm3-4.0.7-1.28.x86_64
libdlm-4.0.7-1.28.x86_64


-- 
Regards,
Muhammad Sharfuddin

