[ClusterLabs] [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

Thu Mar 8 09:48:21 UTC 2018

Hi Feldhost,

I use active rrp_mode in corosync.conf and reboot the cluster to let the configuration effective.
But, the about 5 mins hang in new_lockspace() function is still here.

Thanks
Gang

>>> 
> Hi, so try to use active mode.
> 
> https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_installatio 
> n_terms.html
> 
> That fixes I saw in 4.14.*
> 
>> On 8 Mar 2018, at 09:12, Gang He <ghe at suse.com> wrote:
>> 
>> Hi Feldhost,
>> 
>> 
>>>>> 
>>> Hello Gang He,
>>> 
>>> which type of corosync rrp_mode you use? Passive or Active? 
>> clvm1:/etc/corosync # cat corosync.conf  | grep rrp_mode
>>        rrp_mode:       passive
>> 
>> Did you try test both?
>> No, only this mode. 
>> Also, what kernel version you use? I see some SCTP fixes in latest kernels.
>> clvm1:/etc/corosync # uname -r
>> 4.4.114-94.11-default
>> It looks that sock->ops->connect() function is blocked for too long time before 
> return, under broken network situation. 
>> In normal network, sock->ops->connect() function returns very quickly.
>> 
>> Thanks
>> Gang
>> 
>>> 
>>>> On 8 Mar 2018, at 08:52, Gang He <ghe at suse.com> wrote:
>>>> 
>>>> Hello list and David Teigland,
>>>> 
>>>> I got a problem under a two rings cluster, the problem can be reproduced 
>>> with the below steps.
>>>> 1) setup a two rings cluster with two nodes.
>>>> e.g. 
>>>> clvm1(nodeid 172204569)  addr_list eth0 10.67.162.25 eth1 192.168.152.240
>>>> clvm2(nodeid 172204570)  addr_list eth0 10.67.162.26 eth1 192.168.152.103
>>>> 
>>>> 2) the whole cluster works well, then I put eth0 down on node clvm2, and 
>>> restart pacemaker service on that node.
>>>> ifconfig eth0 down
>>>> rcpacemaker restart
>>>> 
>>>> 3) the whole cluster still work well (that means corosync is very smooth to 
>>> switch to the other ring).
>>>> Then, I can mount ocfs2 file system on node clvm2 quickly with the command 
>>>> mount /dev/sda /mnt/ocfs2 
>>>> 
>>>> 4) Next, I do the same mount on node clvm1, the mount command will be hanged 
> 
>>> for about 5 mins, and finally the mount command is done.
>>>> But, if we setup a ocfs2 file system resource in pacemaker,
>>>> the pacemaker resource agent will consider ocfs2 file system resource 
>>> startup failure before this command returns,
>>>> the pacemaker will fence node clvm1. 
>>>> This problem is impacting our customer's estimate, since they think the two 
>>> rings can be switched smoothly.
>>>> 
>>>> According to this problem, I can see the mount command is hanged with the 
>>> below back trace,
>>>> clvm1:/ # cat /proc/6688/stack
>>>> [<ffffffffa04b8f2d>] new_lockspace+0x92d/0xa70 [dlm]
>>>> [<ffffffffa04b92d9>] dlm_new_lockspace+0x69/0x160 [dlm]
>>>> [<ffffffffa04db758>] user_cluster_connect+0xc8/0x350 [ocfs2_stack_user]
>>>> [<ffffffffa0483872>] ocfs2_cluster_connect+0x192/0x240 [ocfs2_stackglue]
>>>> [<ffffffffa0577efc>] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
>>>> [<ffffffffa05c2983>] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
>>>> [<ffffffff8120e130>] mount_bdev+0x1a0/0x1e0
>>>> [<ffffffff8120ea1a>] mount_fs+0x3a/0x170
>>>> [<ffffffff81228bf2>] vfs_kern_mount+0x62/0x110
>>>> [<ffffffff8122b123>] do_mount+0x213/0xcd0
>>>> [<ffffffff8122bed5>] SyS_mount+0x85/0xd0
>>>> [<ffffffff81614b0a>] entry_SYSCALL_64_fastpath+0x1e/0xb6
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>> 
>>>> The root cause is in sctp_connect_to_sock() function in lowcomms.c,
>>>> 1075
>>>> 1076         log_print("connecting to %d", con->nodeid);
>>>> 1077
>>>> 1078         /* Turn off Nagle's algorithm */
>>>> 1079         kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY, (char *)&one,
>>>> 1080                           sizeof(one));
>>>> 1081
>>>> 1082         result = sock->ops->connect(sock, (struct sockaddr *)&daddr, 
>>> addr_len,
>>>> 1083                                    O_NONBLOCK);  <<= here, this invoking 
>>> will cost > 5 mins before return ETIMEDOUT(-110).
>>>> 1084         printk(KERN_ERR "sctp_connect_to_sock connect: %d\n", result);
>>>> 1085
>>>> 1086         if (result == -EINPROGRESS)
>>>> 1087                 result = 0;
>>>> 1088         if (result == 0)
>>>> 1089                 goto out;
>>>> 
>>>> Then, I want to know if this problem was found/fixed before? 
>>>> it looks DLM can not switch the second ring very quickly, this will impact 
>>> the above application (e.g. CLVM, ocfs2) to create a new lock space before 
>>> it's startup.
>>>> 
>>>> Thanks
>>>> Gang
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Users mailing list: Users at clusterlabs.org 
>>>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>>> 
>>>> Project Home: http://www.clusterlabs.org 
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>>> Bugs: http://bugs.clusterlabs.org