[ClusterLabs] [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

Fri Mar 9 10:41:38 EST 2018

On 2018-03-09 01:32 AM, Gang He wrote:
> Hello Digimer,
> 
> 
> 
>>>>
>> On 2018-03-08 12:10 PM, David Teigland wrote:
>>>> I use active rrp_mode in corosync.conf and rebooted the cluster so that
>>>> the configuration takes effect.
>>>> But the roughly 5-minute hang in the new_lockspace() function is still there.
>>>
>>> The last time I tested connection failures with sctp was several years
>>> ago, but I recall seeing similar problems.  I had hoped that some of the
>>> sctp changes might have helped, but perhaps they didn't.
>>> Dave
>>
>> To add to this: we found serious issues with DLM over sctp/rrp. Our
>> solution was to remove RRP and rely on active/passive (mode=1) bonding.
>> I do not believe you can make anything using DLM reliable on RRP in
>> either active or passive mode.
> Do you have detailed steps for this workaround?
> I mean, how do I remove RRP and rely on active/passive (mode=1) bonding?
> From the code, we have to use the sctp protocol in DLM on a two-ring cluster.
> 
> Thanks
> Gang

I'm using RHEL 6, so for me, disabling RRP was simply a matter of removing
the rrp attribute and the <altname> child elements from cluster.conf. As
for bonding, here's how I did it:

https://www.alteeve.com/w/AN!Cluster_Tutorial_2#Configuring_our_Bridge.2C_Bonds_and_Interfaces
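
For a rough picture of the cluster.conf side of this, here is a minimal
sketch. The node names are made up for illustration, fencing and other
elements are left out, and it assumes the rrp attribute in question is
rrp_mode on the <totem> element, so check it against your own file rather
than copying it.

```xml
<!-- Before: two rings (hypothetical names; fragment only) -->
<totem rrp_mode="active"/>
<clusternodes>
  <clusternode name="node1-ring0" nodeid="1">
    <altname name="node1-ring1"/>
  </clusternode>
  <clusternode name="node2-ring0" nodeid="2">
    <altname name="node2-ring1"/>
  </clusternode>
</clusternodes>

<!-- After: single ring. The rrp_mode attribute and the <altname>
     children are gone. The <totem> element is dropped here only
     because it carried nothing else in this sketch; keep it if
     yours has other attributes. -->
<clusternodes>
  <clusternode name="node1-ring0" nodeid="1"/>
  <clusternode name="node2-ring0" nodeid="2"/>
</clusternodes>
```

The remaining node names should then resolve to addresses on the mode=1
bond, so link redundancy is handled below corosync instead of by a second
ring.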

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould