[ClusterLabs] Linux 8.2 - high totem token requires manual setting of ping_interval and ping_timeout

Christine Caulfield ccaulfie at redhat.com
Fri Jun 26 03:16:06 EDT 2020


On 26/06/2020 07:56, Jan Friesse wrote:
> Robert,
> thank you for the info/report. More comments inside.
> 
>> All,
>> Hello.  Hope all is well.   I have been researching Oracle Linux 8.2
>> and ran across a situation that is not well documented.   I decided to
>> provide some details to the community in case I am missing something.
>>
>> Basically, if you increase the totem token above approximately 33000
>> with the knet transport, then a two node cluster will not properly
>> form.   The exact threshold value will slightly fluctuate, depending
>> on hardware type and debugging, but will consistently fail above 40000.
> 
> At least corosync with a 40 sec timeout works just fine for me.
> 


I just tried a 41 second token timeout on a 2-node and a 4-node cluster
(pcs/corosync/pacemaker) and it started up just fine. I think we'd need
to see the logs.


> # corosync-cmapctl  | grep token
> runtime.config.totem.token (u32) = 40650
> 
> # corosync-quorumtool
> Quorum information
> ------------------
> Date:             Fri Jun 26 08:45:12 2020
> Quorum provider:  corosync_votequorum
> Nodes:            2
> Node ID:          1
> Ring ID:          1.11be1
> Quorate:          Yes
> 
> Votequorum information
> ----------------------
> Expected votes:   3
> Highest expected: 3
> Total votes:      2
> Quorum:           2
> Flags:            Quorate
> 
> Membership information
> ----------------------
>     Nodeid      Votes Name
>          1          1 vmvlan-vmcos8-n05 (local)
>          6          1 vmvlan-vmcos8-n06
> 
> 
> It is indeed true that forming took a bit more time (30 sec, to be more
> precise).
> 
>>
>> The failure to form a cluster would occur when running the "pcs
>> cluster start --all" command or when I started one node, let it
>> stabilize, then started the second.  When it fails to form a cluster,
>> each side says it is ONLINE, but that the other side is
>> UNCLEAN(offline) (cluster state: partition WITHOUT quorum).   If I
>> define proper stonith resources, then they will not fence since the
>> cluster never makes it to an initial quorum state.  So, the cluster
>> will stay in this split state indefinitely.
> 
> Maybe some timeout in pcs?
> 
>>
>> Changing the transport back to udpu or udp, the higher totem tokens
>> worked as expected.
> 
> Yup. You've correctly found out that the knet_* timeouts help. Basically,
> knet keeps a link marked as not working until it gets enough pongs.
> UDP/UDPU doesn't have this concept, so it will create the cluster faster.
> 
>>
>> From the debug logging, I suspect that the Election Trigger (20
>> seconds) fires before all nodes are properly identified by the knet
>> transport.  I noticed that with a totem token passing 32 seconds, the
>> knet_ping* defaults were pushing up against that 20 second mark.  The
>> output of "corosync-cfgtool -s" will show each node's link as enabled,
>> but each side will state the other side's link is not connected.  
>> Since each side thinks the other node is not active, they fail to
>> properly send a join message to the other node during the election.  
>> They will essentially form a singleton cluster(??).  
> 
> So far your analysis is correct. Corosync really is unable to send the
> join message and forms a single-node cluster.
> 
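The timing argument above can be sketched numerically. This is only a rough model, not corosync's or knet's actual logic: it assumes a link needs pong_count (default 2) pongs, one per ping interval, before it is marked up, and that the default ping_interval is token / (pong_count * 2) as described in corosync.conf(5). The 20 second window is the Election Trigger value from the report, and the function names are illustrative.

```python
# Rough model (assumption): a knet link is marked up after `pong_count`
# pongs, arriving one per ping interval. Real startup involves more steps.

ELECTION_WINDOW_MS = 20_000  # "Election Trigger (20 seconds)" from the report


def estimated_link_up_ms(ping_interval_ms: int, pong_count: int = 2) -> int:
    """Estimate the time until a link has collected enough pongs."""
    return ping_interval_ms * pong_count


def link_up_before_election(ping_interval_ms: int,
                            pong_count: int = 2,
                            window_ms: int = ELECTION_WINDOW_MS) -> bool:
    """True if the estimated link-up time fits inside the election window."""
    return estimated_link_up_ms(ping_interval_ms, pong_count) < window_ms


if __name__ == "__main__":
    # Assumed default interval for token=41000: token / (pong_count * 2).
    default_interval = 41000 // (2 * 2)
    print(link_up_before_election(default_interval))  # False (20500 ms)
    # Manually configured ping_interval=1250 from the report.
    print(link_up_before_election(1250))              # True (2500 ms)
```

Under these assumptions the estimate crosses the 20 second mark at a token of about 40000, which is in the same range as the threshold described in the report. It is not a substitute for the logs.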
>> It is more puzzling when you start one node at a time, waiting for the
>> node to stabilize before starting the other.   It is like the first
>> node will never see the remote knet interfaces become active,
>> regardless of how long you wait.
> 
> This shouldn't happen. Knet will eventually receive enough pongs, so
> corosync broadcasts a message to the other nodes, which find out that a
> new membership should be formed.
> 
>>
>> The solution is to manually set the knet ping_timeout and
>> ping_interval to lower values than the default values derived from the
>> totem token.  This seems to allow for the knet transport to determine
>> link status of all nodes before the election timer pops.
> 
> These timeouts are indeed not the best ones. I have had a few ideas on
> how to improve them, because currently they favor clusters with multiple
> links. Single-link clusters may work better with slightly different
> defaults.
> 
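For reference, the defaults "derived from the totem token" can be worked out with a few lines of Python. This is a minimal sketch that assumes the formulas given in corosync.conf(5): knet_ping_interval defaults to token / (knet_pong_count * 2), knet_ping_timeout defaults to token / knet_pong_count, and knet_pong_count defaults to 2. Check the man page for your installed version. The function name is made up for illustration.

```python
# Sketch: knet ping defaults derived from the totem token.
# The formulas are an assumption taken from corosync.conf(5); verify them
# against the man page shipped with your corosync version.

def knet_ping_defaults(token_ms: int, pong_count: int = 2) -> dict:
    """Return the derived knet ping settings in milliseconds."""
    return {
        "ping_interval": token_ms // (pong_count * 2),
        "ping_timeout": token_ms // pong_count,
    }


if __name__ == "__main__":
    # Token values mentioned in this thread.
    for token in (33000, 41000, 61000):
        print(token, knet_ping_defaults(token))
```

With token=61000 the derived values would be ping_interval=15250 and ping_timeout=30500, far above the manually set ping_interval=1250 and ping_timeout=2500 from the working example.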
>>
>> I tested this on both physical hardware and with VMs.  Both react
>> similarly.
>>
>> Bare bones test case to reproduce:
>> yum install pcs pacemaker fence-agents-all
>> firewall-cmd --permanent --add-service=high-availability
>> firewall-cmd --add-service=high-availability
>> systemctl start pcsd.service
>> systemctl enable pcsd.service
>> systemctl disable corosync
>> systemctl disable pacemaker
>> passwd hacluster
>> pcs host auth node1 node2
>> pcs cluster setup rhcs_test node1 node2 totem token=41000
>> pcs cluster start --all
>>
>> Example command to create a cluster that will properly form and get quorum:
>> pcs cluster setup rhcs_test node1 node2 totem token=61000 transport
>> knet link ping_interval=1250 ping_timeout=2500
>>
>> Hope this helps someone in the future.
> 
> Yup. It is an interesting finding, and thanks for that.
> 
> Regards,
>   Honza
> 
>>
>> Thanks
>> Robert
>>
>>
>> Robert Hayden | Lead Technology Architect | Cerner Corporation
>>
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> 
