[ClusterLabs] [Linux-HA] file system resource becomes inaccessible when any of the nodes goes down

Muhammad Sharfuddin M.Sharfuddin at nds.com.pk
Tue Jul 7 04:52:24 UTC 2015


On 07/07/2015 08:59 AM, Andrei Borzenkov wrote:
> On Tue, 07 Jul 2015 01:02:49 +0500,
> Muhammad Sharfuddin <M.Sharfuddin at nds.com.pk> wrote:
>
>> On 07/06/2015 07:04 PM, Dejan Muhamedagic wrote:
>>> On Mon, Jul 06, 2015 at 03:14:34PM +0500, Muhammad Sharfuddin wrote:
>>>> On 07/06/2015 02:50 PM, Dejan Muhamedagic wrote:
>>>>> Hi,
>>>>>
>>>>> On Sun, Jul 05, 2015 at 09:13:56PM +0500, Muhammad Sharfuddin wrote:
>>>>>> SLES 11 SP 3 + online updates (pacemaker-1.1.11-0.8.11.70,
>>>>>> openais-1.1.4-5.22.1.7)
>>>>>>
>>>>>> It's a dual-primary drbd cluster which mounts a file system resource
>>>>>> on both cluster nodes simultaneously (the file system type is ocfs2).
>>>>>>
>>>>>> Whenever one of the nodes goes down, the file system (/sharedata)
>>>>>> becomes inaccessible for exactly 35 seconds on the other
>>>>>> (surviving/online) node, and then becomes available again on the
>>>>>> online node.
>>>>>>
>>>>>> Please help me understand why the node which survives or remains
>>>>>> online is unable to access the file system resource (/sharedata) for 35
>>>>>> seconds, and how I can fix the cluster so that the file system remains
>>>>>> accessible on the surviving node without any interruption/delay (in
>>>>>> my case about 35 seconds).
>>>>>>
>>>>>> By inaccessible, I mean that running "ls -l /sharedata" and
>>>>>> "df /sharedata" returns no output and does not give the prompt
>>>>>> back on the online node for exactly 35 seconds once the other
>>>>>> node goes offline.
>>>>>>
>>>>>> E.g. "node1" went offline somewhere around 01:37:15, and the
>>>>>> /sharedata file system was then inaccessible between 01:37:35 and
>>>>>> 01:38:18 on the online node, i.e. "node2".
>>>>> Before the failing node gets fenced you won't be able to use the
>>>>> ocfs2 filesystem. In this case, the fencing operation takes 40
>>>>> seconds:
>>>> So it's expected.
>>>>>> [...]
>>>>>> Jul  5 01:37:35 node2 sbd: [6197]: info: Writing reset to node slot node1
>>>>>> Jul  5 01:37:35 node2 sbd: [6197]: info: Messaging delay: 40
>>>>>> Jul  5 01:38:15 node2 sbd: [6197]: info: reset successfully
>>>>>> delivered to node1
>>>>>> Jul  5 01:38:15 node2 sbd: [6196]: info: Message successfully delivered.
>>>>>> [...]
>>>>> You may want to reduce that sbd timeout.
>>>> Ok, so would reducing the sbd timeout (or msgwait) provide
>>>> uninterrupted access to the ocfs2 file system on the
>>>> surviving/online node, or would it just minimize the downtime?
>>> Only the latter. But note that it is important that once sbd
>>> reports success, the target node is really down. sbd is
>>> timeout-based, i.e. it doesn't test whether the node actually
>>> left. Hence this timeout shouldn't be too short.
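
For reference, the timeouts stored on the sbd device can be inspected and,
if needed, rewritten roughly as follows; the device path is a placeholder
and re-creating the header wipes the message slots, so treat this as a
sketch only:

    # show the watchdog/msgwait timeouts currently stored on the sbd device
    sbd -d /dev/disk/by-id/<sbd-device> dump

    # re-initialize the header with watchdog=5s and msgwait=10s
    sbd -d /dev/disk/by-id/<sbd-device> -1 5 -4 10 create

Whatever values are chosen, the cluster's stonith-timeout should stay
comfortably above msgwait.
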
>>>
>>> Thanks,
>>>
>>> Dejan
>>>
>>>>> Thanks,
>>>>>
>>>>> Dejan
>>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Muhammad Sharfuddin
>>>>
>> Now I have made two changes:
>> 1 - removed the drbd; there is now a shared LUN
>> 2 - reduced the watchdog and msgwait timeouts to 5 and 10 seconds respectively.
>>
>> Still, when either node goes down, a delay of exactly 15 seconds
>> occurs. My concern/question: when either node goes down/offline, is
>> there no way to get uninterrupted access to the ocfs2 file system resource?
>>
> For all practical purposes 15 seconds *is* uninterrupted access.
What would happen to the resources (DB, apps, etc.) that were running
from the ocfs2 file system on the surviving/online node while the file
system is inaccessible for those 15 seconds?

I don't see a real practical use for ocfs2 here.
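
In practice, processes on the surviving node that touch the frozen mount
will typically just block in I/O until fencing and DLM/ocfs2 recovery
complete, so the main requirement is that the monitor/start/stop timeouts
of anything layered on top of the ocfs2 mount exceed the fencing window.
A hypothetical example only (the resource name and start script are made up):

    primitive p-app ocf:heartbeat:anything \
            params binfile="/sapdata/app/start.sh" \
            op monitor interval="30s" timeout="60s" \
            op start interval="0" timeout="120s" \
            op stop interval="0" timeout="120s"

With msgwait at 10s plus a few seconds of DLM/ocfs2 recovery, a 60s monitor
timeout should ride out the blackout instead of reporting a failure.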

>> At 00:43:44 node1 went offline; the ocfs2 file system resource remained
>> inaccessible between exactly 00:44:00 and 00:44:14. /var/log/messages
>> from node2:
> Well, stonith took 10 seconds to finish; I do not know if it is
> possible to speed up DLM convergence after node kill was confirmed. You
> could look at making stonith faster.
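
One way to see how much of the remaining gap is the fence itself rather
than DLM/ocfs2 recovery is to time a manual fence on a test cluster; note
that this really does reboot the target node:

    # fence node1 by hand and time the operation (destructive!)
    time stonith_admin --reboot node1

    # show the recorded fencing history for node1
    stonith_admin --history node1
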
>
> Hmm ... you said you eliminated SBD but apparently you are still using
> it together with associated delay
>
> Operation 'reboot' [6027] (call 2 from crmd.4371) for host 'node1' with
> device 'sbd_stonith' returned: 0 (OK)
>
> Maybe try a different stonith plugin?
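
Purely as an illustration (plugin choice, addresses and credentials are
placeholders that depend entirely on the hardware), a power-based fencing
device such as IPMI is usually configured per node along these lines:

    primitive stonith-node1 stonith:external/ipmi \
            params hostname="node1" ipaddr="192.168.241.201" \
                   userid="ipmiuser" passwd="secret" interface="lanplus" \
            op monitor interval="3600" timeout="20"
    location l-stonith-node1 stonith-node1 -inf: node1

The location constraint keeps node1 from being the node that runs the
device meant to fence it.
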
>
>> Jul  7 00:43:55 node2 corosync[4361]:  [TOTEM ] A processor failed,
>> forming new configuration.
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ] CLM CONFIGURATION CHANGE
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ] New Configuration:
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ]     r(0) ip(172.16.241.132)
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ] Members Left:
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ]     r(0) ip(172.16.241.131)
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ] Members Joined:
>> Jul  7 00:44:01 node2 corosync[4361]:  [pcmk  ] notice:
>> pcmk_peer_update: Transitional membership event on ring 348: memb=1,
>> new=0, lost=1
>> Jul  7 00:44:01 node2 corosync[4361]:  [pcmk  ] info: pcmk_peer_update:
>> memb: node2 739307908
>> Jul  7 00:44:01 node2 corosync[4361]:  [pcmk  ] info: pcmk_peer_update:
>> lost: node1 739307907
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ] CLM CONFIGURATION CHANGE
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ] New Configuration:
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ]     r(0) ip(172.16.241.132)
>> Jul  7 00:44:01 node2 cluster-dlm[4866]:   notice:
>> plugin_handle_membership: Membership 348: quorum lost
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]:   notice:
>> plugin_handle_membership: Membership 348: quorum lost
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ] Members Left:
>> Jul  7 00:44:01 node2 cluster-dlm[4866]:   notice:
>> crm_update_peer_state: plugin_handle_membership: Node node1[739307907] -
>> state is now lost (was member)
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: update_cluster: Processing
>> membership 348
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: del_configfs_node:
>> del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/739307907"
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: dlm_process_node: Removed
>> inactive node 739307907: born-on=344, last-seen=344, this-event=348,
>> last-event=344
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]:   notice:
>> crm_update_peer_state: plugin_handle_membership: Node node1[739307907] -
>> state is now lost (was member)
>> Jul  7 00:44:01 node2 corosync[4361]:  [CLM   ] Members Joined:
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: dlm_process_node: Skipped
>> active node 739307908: born-on=336, last-seen=348, this-event=348,
>> last-event=344
>> Jul  7 00:44:01 node2 stonith-ng[4367]:   notice:
>> plugin_handle_membership: Membership 348: quorum lost
>> Jul  7 00:44:01 node2 cib[4366]:   notice: plugin_handle_membership:
>> Membership 348: quorum lost
>> Jul  7 00:44:01 node2 crmd[4371]:   notice: plugin_handle_membership:
>> Membership 348: quorum lost
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: confchg called
>> Jul  7 00:44:01 node2 corosync[4361]:  [pcmk  ] notice:
>> pcmk_peer_update: Stable membership event on ring 348: memb=1, new=0, lost=0
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: log_config: dlm:controld conf 1
>> 0 1 memb 739307908 join left 739307907
>> Jul  7 00:44:01 node2 stonith-ng[4367]:   notice: crm_update_peer_state:
>> plugin_handle_membership: Node node1[739307907] - state is now lost (was
>> member)
>> Jul  7 00:44:01 node2 cib[4366]:   notice: crm_update_peer_state:
>> plugin_handle_membership: Node node1[739307907] - state is now lost (was
>> member)
>> Jul  7 00:44:01 node2 crmd[4371]:  warning: match_down_event: No match
>> for shutdown action on node1
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: ocfs2_controld (group
>> "ocfs2:controld") confchg: members 1, left 1, joined 0
>> Jul  7 00:44:01 node2 corosync[4361]:  [pcmk  ] info: pcmk_peer_update:
>> MEMB: node2 739307908
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: log_config:
>> dlm:ls:E4C27EC365B44D03B3629077F1B31722 conf 1 0 1 memb 739307908 join
>> left 739307907
>> Jul  7 00:44:01 node2 crmd[4371]:   notice: peer_update_callback:
>> Stonith/shutdown of node1 not matched
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: node daemon left 739307907
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: node down 739307907
>> Jul  7 00:44:01 node2 corosync[4361]:  [pcmk  ] info:
>> ais_mark_unseen_peer_dead: Node node1 was not seen in the previous
>> transition
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: node_history_fail:
>> E4C27EC365B44D03B3629077F1B31722 check_fs nodeid 739307907 set
>> Jul  7 00:44:01 node2 crmd[4371]:   notice: crm_update_peer_state:
>> plugin_handle_membership: Node node1[739307907] - state is now lost (was
>> member)
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: Node 739307907 has left
>> mountgroup E4C27EC365B44D03B3629077F1B31722
>> Jul  7 00:44:01 node2 corosync[4361]:  [pcmk  ] info: update_member:
>> Node 739307907/node1 is now: lost
>> Jul  7 00:44:01 node2 corosync[4361]:  [pcmk  ] info:
>> send_member_notification: Sending membership update 348 to 5 children
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: add_change:
>> E4C27EC365B44D03B3629077F1B31722 add_change cg 2 remove nodeid 739307907
>> reason 3
>> Jul  7 00:44:01 node2 crmd[4371]:  warning: match_down_event: No match
>> for shutdown action on node1
>> Jul  7 00:44:01 node2 corosync[4361]:  [TOTEM ] A processor joined or
>> left the membership and a new membership was formed.
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: add_change:
>> E4C27EC365B44D03B3629077F1B31722 add_change cg 2 counts member 1 joined
>> 0 remove 1 failed 1
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: stop_kernel:
>> E4C27EC365B44D03B3629077F1B31722 stop_kernel cg 2
>> Jul  7 00:44:01 node2 crmd[4371]:   notice: peer_update_callback:
>> Stonith/shutdown of node1 not matched
>> Jul  7 00:44:01 node2 corosync[4361]:  [CPG   ] chosen downlist: sender
>> r(0) ip(172.16.241.132) ; members(old:2 left:1)
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: do_sysfs: write "0" to
>> "/sys/kernel/dlm/E4C27EC365B44D03B3629077F1B31722/control"
>> Jul  7 00:44:01 node2 sbd: [4349]: WARN: CIB: We do NOT have quorum!
>> Jul  7 00:44:01 node2 sbd: [4347]: WARN: Pacemaker health check: UNHEALTHY
>> Jul  7 00:44:01 node2 kernel: [ 2362.087877] dlm: closing connection to
>> node 739307907
>> Jul  7 00:44:01 node2 corosync[4361]:  [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: fence_node_time: Node
>> 739307907/node1 has not been shot yet
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: Sending notification of node
>> 739307907 for "E4C27EC365B44D03B3629077F1B31722"
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: check_fencing_done:
>> E4C27EC365B44D03B3629077F1B31722 check_fencing 739307907 wait add
>> 1436211736 fail 1436211841 last 0
>> Jul  7 00:44:01 node2 cluster-dlm[4866]: set_fs_notified:
>> E4C27EC365B44D03B3629077F1B31722 set_fs_notified nodeid 739307907
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: confchg called
>> Jul  7 00:44:01 node2 crmd[4371]:   notice: crm_update_quorum: Updating
>> quorum status to false (call=645)
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: group
>> "ocfs2:E4C27EC365B44D03B3629077F1B31722" confchg: members 1, left 1,
>> joined 0
>> Jul  7 00:44:01 node2 crmd[4371]:   notice: do_state_transition: State
>> transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
>> cause=C_FSA_INTERNAL origin=check_join_state ]
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: message from dlmcontrol
>> Jul  7 00:44:01 node2 crmd[4371]:   notice: crm_update_quorum: Updating
>> quorum status to false (call=650)
>> Jul  7 00:44:01 node2 attrd[4369]:   notice: attrd_local_callback:
>> Sending full refresh (origin=crmd)
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: Notified for
>> "E4C27EC365B44D03B3629077F1B31722", node 739307907, status 0
>> Jul  7 00:44:01 node2 pengine[4370]:   notice: unpack_config: On loss of
>> CCM Quorum: Ignore
>> Jul  7 00:44:01 node2 attrd[4369]:   notice: attrd_trigger_update:
>> Sending flush op to all hosts for: last-failure-p-sapfs (1436211473)
>> Jul  7 00:44:01 node2 ocfs2_controld[4917]: Completing notification on
>> "E4C27EC365B44D03B3629077F1B31722" for node 739307907
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: pe_fence_node: Node node1
>> will be fenced because the node is no longer part of the cluster
>> Jul  7 00:44:01 node2 attrd[4369]:   notice: attrd_trigger_update:
>> Sending flush op to all hosts for: shutdown (0)
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: determine_online_status:
>> Node node1 is unclean
>> Jul  7 00:44:01 node2 attrd[4369]:   notice: attrd_trigger_update:
>> Sending flush op to all hosts for: probe_complete (true)
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: custom_action: Action
>> p-dlm:0_stop_0 on node1 is unrunnable (offline)
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: custom_action: Action
>> p-dlm:0_stop_0 on node1 is unrunnable (offline)
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: custom_action: Action
>> p-o2cb:0_stop_0 on node1 is unrunnable (offline)
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: custom_action: Action
>> p-o2cb:0_stop_0 on node1 is unrunnable (offline)
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: custom_action: Action
>> p-vgsap:0_stop_0 on node1 is unrunnable (offline)
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: custom_action: Action
>> p-vgsap:0_stop_0 on node1 is unrunnable (offline)
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: custom_action: Action
>> p-sapfs:0_stop_0 on node1 is unrunnable (offline)
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: custom_action: Action
>> p-sapfs:0_stop_0 on node1 is unrunnable (offline)
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: stage6: Scheduling Node
>> node1 for STONITH
>> Jul  7 00:44:01 node2 pengine[4370]:   notice: LogActions: Stop
>> p-dlm:0    (node1)
>> Jul  7 00:44:01 node2 pengine[4370]:   notice: LogActions: Stop
>> p-o2cb:0    (node1)
>> Jul  7 00:44:01 node2 pengine[4370]:   notice: LogActions: Stop
>> p-vgsap:0    (node1)
>> Jul  7 00:44:01 node2 pengine[4370]:   notice: LogActions: Stop
>> p-sapfs:0    (node1)
>> Jul  7 00:44:01 node2 crmd[4371]:   notice: do_te_invoke: Processing
>> graph 37 (ref=pe_calc-dc-1436211841-296) derived from
>> /var/lib/pacemaker/pengine/pe-warn-15.bz2
>> Jul  7 00:44:01 node2 pengine[4370]:  warning: process_pe_message:
>> Calculated Transition 37: /var/lib/pacemaker/pengine/pe-warn-15.bz2
>> Jul  7 00:44:01 node2 crmd[4371]:   notice: te_fence_node: Executing
>> reboot fencing operation (48) on node1 (timeout=90000)
>> Jul  7 00:44:01 node2 stonith-ng[4367]:   notice: handle_request: Client
>> crmd.4371.43345a93 wants to fence (reboot) 'node1' with device '(any)'
>> Jul  7 00:44:01 node2 stonith-ng[4367]:   notice:
>> initiate_remote_stonith_op: Initiating remote operation reboot for
>> node1: b140f322-d273-4de3-b07f-37b8f5939aff (0)
>> Jul  7 00:44:01 node2 stonith-ng[4367]:   notice:
>> can_fence_host_with_device: sbd_stonith can fence node1: dynamic-list
>> Jul  7 00:44:01 node2 sbd: [6039]: info: Delivery process handling
>> /dev/disk/by-id/scsi-1494554000000000035c71550e83586b70dfc77c3b382dfbd
>> Jul  7 00:44:01 node2 sbd: [6039]: info: Device UUID:
>> 5ed06971-5e18-4ab5-9968-620c61a71269
>> Jul  7 00:44:01 node2 sbd: [6039]: info: Writing reset to node slot node1
>> Jul  7 00:44:01 node2 sbd: [6039]: info: Messaging delay: 10
>> Jul  7 00:44:11 node2 sbd: [6039]: info: reset successfully delivered to
>> node1
>> Jul  7 00:44:11 node2 sbd: [6038]: info: Message successfully delivered.
>> Jul  7 00:44:12 node2 stonith-ng[4367]:   notice: log_operation:
>> Operation 'reboot' [6027] (call 2 from crmd.4371) for host 'node1' with
>> device 'sbd_stonith' returned: 0 (OK)
>> Jul  7 00:44:12 node2 stonith-ng[4367]:   notice: remote_op_done:
>> Operation reboot of node1 by node2 for crmd.4371 at node2.b140f322: OK
>> Jul  7 00:44:12 node2 crmd[4371]:   notice: tengine_stonith_callback:
>> Stonith operation 2/48:37:0:f3061adb-af45-4c93-9365-bae0cf01a847: OK (0)
>> Jul  7 00:44:12 node2 crmd[4371]:   notice: tengine_stonith_notify: Peer
>> node1 was terminated (reboot) by node2 for node2: OK
>> (ref=b140f322-d273-4de3-b07f-37b8f5939aff) by client crmd.4371
>> Jul  7 00:44:12 node2 crmd[4371]:   notice: run_graph: Transition 37
>> (Complete=15, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-warn-15.bz2): Complete
>> Jul  7 00:44:12 node2 crmd[4371]:   notice: do_state_transition: State
>> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>> Jul  7 00:44:12 node2 crmd[4371]:   notice: do_state_transition: State
>> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
>> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>> Jul  7 00:44:12 node2 pengine[4370]:   notice: unpack_config: On loss of
>> CCM Quorum: Ignore
>> Jul  7 00:44:12 node2 pengine[4370]:   notice: process_pe_message:
>> Calculated Transition 38: /var/lib/pacemaker/pengine/pe-input-154.bz2
>> Jul  7 00:44:12 node2 crmd[4371]:   notice: do_te_invoke: Processing
>> graph 38 (ref=pe_calc-dc-1436211852-297) derived from
>> /var/lib/pacemaker/pengine/pe-input-154.bz2
>> Jul  7 00:44:12 node2 crmd[4371]:   notice: run_graph: Transition 38
>> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-154.bz2): Complete
>> Jul  7 00:44:12 node2 crmd[4371]:   notice: do_state_transition: State
>> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: fence_node_time: Node
>> 739307907/node1 was last shot 'now'
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: check_fencing_done:
>> E4C27EC365B44D03B3629077F1B31722 check_fencing 739307907 done add
>> 1436211736 fail 1436211841 last 1436211853
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: check_fencing_done:
>> E4C27EC365B44D03B3629077F1B31722 check_fencing done
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: check_quorum_done:
>> E4C27EC365B44D03B3629077F1B31722 check_quorum disabled
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: check_fs_done:
>> E4C27EC365B44D03B3629077F1B31722 check_fs nodeid 739307907 clear
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: check_fs_done:
>> E4C27EC365B44D03B3629077F1B31722 check_fs done
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: send_info:
>> E4C27EC365B44D03B3629077F1B31722 send_start cg 2 flags 2 data2 0 counts
>> 1 1 0 1 1
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: receive_start:
>> E4C27EC365B44D03B3629077F1B31722 receive_start 739307908:2 len 76
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: match_change:
>> E4C27EC365B44D03B3629077F1B31722 match_change 739307908:2 matches cg 2
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: wait_messages_done:
>> E4C27EC365B44D03B3629077F1B31722 wait_messages cg 2 got all 1
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: start_kernel:
>> E4C27EC365B44D03B3629077F1B31722 start_kernel cg 2 member_count 1
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: update_dir_members: dir_member
>> 739307908
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: update_dir_members: dir_member
>> 739307907
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: set_configfs_members:
>> set_members rmdir
>> "/sys/kernel/config/dlm/cluster/spaces/E4C27EC365B44D03B3629077F1B31722/nodes/739307907"
>> Jul  7 00:44:13 node2 cluster-dlm[4866]: do_sysfs: write "1" to
>> "/sys/kernel/dlm/E4C27EC365B44D03B3629077F1B31722/control"
>> Jul  7 00:44:13 node2 kernel: [ 2374.125296]
>> (ocfs2rec,6017,0):ocfs2_replay_journal:1549 Recovering node 739307907
>> from slot 0 on device (253,0)
>> Jul  7 00:44:13 node2 kernel: [ 2374.429695]
>> (ocfs2rec,6017,0):ocfs2_begin_quota_recovery:407 Beginning quota
>> recovery in slot 0
>> Jul  7 00:44:13 node2 kernel: [ 2374.444718]
>> (kworker/u:28,139,0):ocfs2_finish_quota_recovery:599 Finishing quota
>> recovery in slot 0
>>
>> cib:
>> primitive p-dlm ocf:pacemaker:controld \
>>           op monitor interval="120" timeout="30" \
>>           op start interval="0" timeout="90" \
>>           op stop interval="0" timeout="100"
>> primitive p-o2cb ocf:ocfs2:o2cb \
>>           op monitor interval="120" timeout="30" \
>>           op start interval="0" timeout="90" \
>>           op stop interval="0" timeout="100"
>> primitive p-sapfs ocf:heartbeat:Filesystem \
>>           params device="/dev/vgsap/lvsap" directory="/sapdata" fstype="ocfs2" \
>>           op monitor interval="60s" timeout="60s" \
>>           op start interval="0" timeout="90s" \
>>           op stop interval="0" timeout="90s"
>> primitive p-vgsap ocf:heartbeat:LVM \
>>           params volgrpname="vgsap" \
>>           op monitor interval="60s" timeout="40" \
>>           op start interval="0" timeout="40" \
>>           op stop interval="0" timeout="40"
>> primitive sbd_stonith stonith:external/sbd \
>>           meta target-role="Started" \
>>           op monitor interval="3000" timeout="20" \
>>           op start interval="0" timeout="20" \
>>           op stop interval="0" timeout="20" \
>>           params sbd_device="/dev/disk/by-id/scsi-1494554000000000035c71550e83586b70dfc77c3b382dfbd"
>> group g-lock p-dlm p-o2cb
>> group g-sapdata p-vgsap p-sapfs
>> clone cl-lock g-lock \
>>           meta globally-unique="false" interleave="true"
>> clone cl-sapdata g-sapdata \
>>           meta globally-unique="false" interleave="true" target-role="Started"
>> colocation col-lock-sapdata inf: cl-sapdata cl-lock
>> order ord-lock-sapdata inf: cl-lock cl-sapdata
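
The property section is not part of the configuration posted above; purely
as a sketch of the timing relationship discussed earlier (values are
illustrative, not the poster's actual settings), stonith-timeout is
normally kept well above the sbd msgwait:

    property $id="cib-bootstrap-options" \
            stonith-enabled="true" \
            stonith-timeout="40s" \
            no-quorum-policy="ignore"
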
>>
>>
>
--
Regards,

Muhammad Sharfuddin