[ClusterLabs] [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

Fri Mar 9 01:32:51 EST 2018

Hello Digimer,

> On 2018-03-08 12:10 PM, David Teigland wrote:
>>> I use active rrp_mode in corosync.conf and reboot the cluster to let the
>>> configuration effective.
>>> But, the about 5 mins hang in new_lockspace() function is still here.
>> 
>> The last time I tested connection failures with sctp was several years
>> ago, but I recall seeing similar problems.  I had hoped that some of the
>> sctp changes might have helped, but perhaps they didn't.
>> Dave
> 
> To add to this; We found serious issues with DLM over sctp/rrp. Our
> solution was to remove RRP and rely on active/passive (mode=1) bonding.
> I do not believe you can make anything using DLM reliable on RRP in
> either active or passive mode.
Do you have detailed steps describing this workaround?
What I mean is: how do I remove RRP and rely on active/passive (mode=1) bonding instead?
From the code, we have to use the SCTP protocol in DLM on a two-ring cluster.
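
A minimal sketch of the selection rule described above, assuming that automatic protocol detection picks SCTP whenever more than one ring is configured and TCP otherwise. The function and parameter names are hypothetical and are not taken from the DLM source:

```python
def choose_dlm_protocol(ring_count: int) -> str:
    """Return the transport implied by the number of corosync rings.

    Assumed rule, for illustration only: one ring -> TCP,
    more than one ring (RRP) -> SCTP.
    """
    if ring_count < 1:
        raise ValueError("a cluster needs at least one ring")
    # Assumption: a TCP connection binds to a single address per node,
    # so a multi-ring cluster has to use SCTP, which supports multi-homing.
    return "sctp" if ring_count > 1 else "tcp"


if __name__ == "__main__":
    for rings in (1, 2):
        print(rings, choose_dlm_protocol(rings))
```

Under this assumption, a bonded pair of NICs would be presented to corosync as one interface, so the cluster would have a single ring and TCP would become an option.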

Thanks
Gang

> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.com/w/ 
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould