[ClusterLabs] Antw: Re: [Cluster-devel] DLM connection channel switch take too long time (> 5mins)
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Thu Mar 8 06:24:34 EST 2018
Hi!
What surprises me most is that a connect(...O_NONBLOCK) actually blocks. For this case connect(2) documents:
EINPROGRESS
    The socket is non-blocking and the connection cannot be completed immediately.
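
For comparison, here is a minimal user-space sketch of the behaviour I would expect. It is only an illustration under stated assumptions: it uses plain POSIX sockets with TCP over loopback, not SCTP and not the in-kernel sock->ops->connect() path from lowcomms.c, and the helper names (start_nonblocking_connect, finish_connect, demo_loopback_connect) are made up for this example. The connect() call returns at once, either with success or with EINPROGRESS, and the caller then bounds the wait itself with poll().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode and start a connect.
 * Returns 0 if connect() completed at once, EINPROGRESS if it is pending,
 * or another errno value if it failed immediately. */
int start_nonblocking_connect(int fd, const struct sockaddr_in *addr)
{
    int flags;

    assert(addr != NULL);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return errno;
    if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
        return 0;
    return errno;
}

/* Hypothetical helper: wait at most timeout_ms for a pending connect.
 * Returns 0 on success, ETIMEDOUT if poll() expired, otherwise the
 * error reported by SO_ERROR (or by poll()/getsockopt() themselves). */
int finish_connect(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int err = 0;
    socklen_t len = sizeof(err);
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc == 0)
        return ETIMEDOUT;
    if (rc < 0)
        return errno;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    return err;
}

/* Demo against a listener on 127.0.0.1 (assumption: loopback is usable).
 * Returns 0 if the connect started without blocking and then completed. */
int demo_loopback_connect(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    int result = -1;
    int started;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0 || cfd < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    started = start_nonblocking_connect(cfd, &addr);
    if (started != 0 && started != EINPROGRESS)
        goto out;

    /* The wait is bounded by the caller (2 s here), not by connect(). */
    result = finish_connect(cfd, 2000);
out:
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```

Calling demo_loopback_connect() from a small main() should return 0 on a machine with a working loopback interface. Against an unreachable peer, I would expect finish_connect() to report ETIMEDOUT after the chosen timeout rather than connect() itself hanging.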
Regards,
Ulrich
>>> "Gang He" <ghe at suse.com> wrote on 08.03.2018 at 10:48 in message
<5AA17765020000F9000ADCF2 at prv-mh.provo.novell.com>:
> Hi Feldhost,
>
> I set rrp_mode to active in corosync.conf and rebooted the cluster so that
> the configuration took effect.
> However, the roughly 5-minute hang in the new_lockspace() function is still there.
>
> Thanks
> Gang
>
>
>>>>
>> Hi, so try to use active mode.
>>
>> https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_installation_terms.html
>>
>> I saw those fixes in 4.14.*
>>
>>> On 8 Mar 2018, at 09:12, Gang He <ghe at suse.com> wrote:
>>>
>>> Hi Feldhost,
>>>
>>>
>>>>>>
>>>> Hello Gang He,
>>>>
>>>> Which corosync rrp_mode do you use? Passive or active?
>>> clvm1:/etc/corosync # cat corosync.conf | grep rrp_mode
>>> rrp_mode: passive
>>>
>>>> Did you try testing both?
>>> No, only this mode.
>>>> Also, which kernel version do you use? I see some SCTP fixes in the latest kernels.
>>> clvm1:/etc/corosync # uname -r
>>> 4.4.114-94.11-default
>>> It looks like the sock->ops->connect() function blocks for too long before
>>> returning when the network is broken.
>>> On a healthy network, sock->ops->connect() returns very quickly.
>>>
>>> Thanks
>>> Gang
>>>
>>>>
>>>>> On 8 Mar 2018, at 08:52, Gang He <ghe at suse.com> wrote:
>>>>>
>>>>> Hello list and David Teigland,
>>>>>
>>>>> I have a problem on a two-ring cluster; it can be reproduced with the
>>>>> steps below.
>>>>> 1) set up a two-ring cluster with two nodes.
>>>>> e.g.
>>>>> clvm1(nodeid 172204569) addr_list eth0 10.67.162.25 eth1 192.168.152.240
>>>>> clvm2(nodeid 172204570) addr_list eth0 10.67.162.26 eth1 192.168.152.103
>>>>>
>>>>> 2) the whole cluster works well; then I bring eth0 down on node clvm2 and
>>>>> restart the pacemaker service on that node.
>>>>> ifconfig eth0 down
>>>>> rcpacemaker restart
>>>>>
>>>>> 3) the whole cluster still works well (that is, corosync switches to the
>>>>> other ring smoothly).
>>>>> Then I can mount the ocfs2 file system on node clvm2 quickly with the command
>>>>> mount /dev/sda /mnt/ocfs2
>>>>>
>>>>> 4) Next, I do the same mount on node clvm1. The mount command hangs for
>>>>> about 5 mins before it finally completes.
>>>>> However, if we set up an ocfs2 file system resource in pacemaker, the
>>>>> pacemaker resource agent considers the ocfs2 file system resource to have
>>>>> failed to start before this command returns, and pacemaker fences node clvm1.
>>>>> This problem affects our customer's evaluation, since they expect the two
>>>>> rings to switch over smoothly.
>>>>>
>>>>> For this problem, I can see that the mount command hangs with the back
>>>>> trace below:
>>>>> clvm1:/ # cat /proc/6688/stack
>>>>> [<ffffffffa04b8f2d>] new_lockspace+0x92d/0xa70 [dlm]
>>>>> [<ffffffffa04b92d9>] dlm_new_lockspace+0x69/0x160 [dlm]
>>>>> [<ffffffffa04db758>] user_cluster_connect+0xc8/0x350 [ocfs2_stack_user]
>>>>> [<ffffffffa0483872>] ocfs2_cluster_connect+0x192/0x240 [ocfs2_stackglue]
>>>>> [<ffffffffa0577efc>] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
>>>>> [<ffffffffa05c2983>] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
>>>>> [<ffffffff8120e130>] mount_bdev+0x1a0/0x1e0
>>>>> [<ffffffff8120ea1a>] mount_fs+0x3a/0x170
>>>>> [<ffffffff81228bf2>] vfs_kern_mount+0x62/0x110
>>>>> [<ffffffff8122b123>] do_mount+0x213/0xcd0
>>>>> [<ffffffff8122bed5>] SyS_mount+0x85/0xd0
>>>>> [<ffffffff81614b0a>] entry_SYSCALL_64_fastpath+0x1e/0xb6
>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>
>>>>> The root cause is in the sctp_connect_to_sock() function in lowcomms.c:
>>>>>
>>>>>     log_print("connecting to %d", con->nodeid);
>>>>>
>>>>>     /* Turn off Nagle's algorithm */
>>>>>     kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY, (char *)&one,
>>>>>                       sizeof(one));
>>>>>
>>>>>     /* Here: this call takes > 5 mins before it returns ETIMEDOUT (-110). */
>>>>>     result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
>>>>>                                 O_NONBLOCK);
>>>>>     printk(KERN_ERR "sctp_connect_to_sock connect: %d\n", result);
>>>>>
>>>>>     if (result == -EINPROGRESS)
>>>>>         result = 0;
>>>>>     if (result == 0)
>>>>>         goto out;
>>>>>
>>>>> So I would like to know whether this problem has been found or fixed before.
>>>>> It looks like DLM cannot switch to the second ring quickly, which affects
>>>>> the applications on top of it (e.g. CLVM, ocfs2) that must create a new
>>>>> lockspace before they start up.
>>>>>
>>>>> Thanks
>>>>> Gang
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list: Users at clusterlabs.org
>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org