[ClusterLabs] maximum token value (knet)

Mon Mar 15 09:05:52 EDT 2021

On 3/13/21 12:55 AM, Strahil Nikolov wrote:
> I will try to get into the details on Monday, when I have access to 
> the cluster again.
> I guess the /var/log/cluster/corosync.log and 
> /etc/corosync/corosync.conf are the most interesting.
>
> So far, I have a 6-node cluster with separate VLANs for HANA 
> replication, prod and backup.
> Initially, I used pcs to create the corosync.conf with 2 IPs per node, 
> token 40000, consensus 48000 and wait_for_all=1.
> Later I expanded the cluster to 3 links and added qnet to the 
> setup (only after I got it running with token 29000), so I'm ruling it out.
qdevice isn't using knet - right?
And VOTEQUORUM_QDEVICE_DEFAULT_SYNC_TIMEOUT is 30s. Unrelated coincidence?

Klaus
> I updated the cluster nodes from RHEL 8.1 to 8.2, removed the 
> consensus setting and enabled debug.
>
> knet uses UDP by default, and the problem hits me with both UDP 
> (default settings) and SCTP, so the problem is not in the 
> protocol.
>
> I've also enabled pacemaker blackbox, although I doubt that has any 
> effect on corosync.
>
> How can I enable trace logs for corosync only?
>
> Best Regards,
> Strahil Nikolov
>
>
>
>     On Fri, Mar 12, 2021 at 17:01, Jan Friesse
>     <jfriesse at redhat.com> wrote:
>     Strahil,
>
>     > Interesting...
>     > Yet, this doesn't explain why a token of 30000 causes the nodes to
>     > never assemble a cluster (waiting for half an hour, using
>     > wait_for_all=1), while setting it to 29000 works like a charm.
>
>     Definitely.
>
>     Could you please provide a bit more info about your setup
>     (config/logs/how many nodes the cluster has/...)? I ask because I've
>     just briefly tested a two-node setup with a 30 sec token timeout and
>     it was working perfectly fine.
>
>     >
>     > Thankfully we have an RH subscription, so RH devs will provide more
>     > detailed output on the issue.
>
>     As Jehan correctly noted, if it really got to RH devs it would
>     probably get to me ;) But before that, GSS will take care of checking
>     configs/hw/logs/... and they are really good at finding problems with
>     setup/hw/...
>
>     >
>     > I was hoping that I had missed something in the documentation
>     > about the maximum token size...
>
>     Nope.
>
>     No matter what, if you can send config/logs/... we may try to find out
>     what the root of the problem is here on the ML, or you can really try
>     GSS, but as Jehan said, it would be nice if you could post the result
>     so other people (me included) know what the main problem was.
>
>     Thanks and regards,
>       Honza
>
>
>     >
>     > Best Regards,
>     > Strahil Nikolov
>     >
>     >
>     >
>     >
>     >
>     >
>     > On Thursday, 11 March 2021, 19:12:58 GMT+2, Jan Friesse
>     > <jfriesse at redhat.com> wrote:
>     >
>     >
>     >
>     >
>     >
>     > Strahil,
>     >> Hello all,
>     >> I'm building a test cluster on RHEL 8.2 and I have noticed that
>     >> the cluster fails to assemble (nodes stay inquorate as if the
>     >> network is not working) if I set the token at 30000 or more (30s+).
>     >
>     > Knet waits for enough pong replies from other nodes before it marks
>     > them as alive and starts sending/receiving packets from them. By
>     > default it needs to receive 2 pongs, and a ping is sent 4 times per
>     > token timeout, so with a 30 sec token timeout it takes 15 sec until
>     > a node is considered up.
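
A minimal sketch of the arithmetic in the paragraph above, assuming only
the defaults as described there (2 pongs required, 4 pings sent per token
timeout). The function and parameter names are hypothetical and are not
corosync or knet identifiers; the model ignores network latency and any
other startup delay, and it does not by itself explain why 30000 fails
while 29000 works.

```python
def knet_link_up_delay_ms(token_ms, pings_per_token=4, pongs_required=2):
    """Estimate the time before knet considers a peer up.

    Assumption (taken from the description above, not from the corosync
    source): pings are spread evenly over the token timeout, and the
    peer is marked alive once `pongs_required` pongs have arrived.
    """
    ping_interval_ms = token_ms / pings_per_token
    return ping_interval_ms * pongs_required


# Token values mentioned in this thread.
for token_ms in (29000, 30000, 40000):
    delay_s = knet_link_up_delay_ms(token_ms) / 1000
    print(f"token={token_ms} ms -> peer considered up after ~{delay_s:g} s")
```

With a 30000 ms token this gives the 15 sec figure quoted above; with
29000 ms it gives 14.5 sec.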
>     >
>     >> What is the maximum token value with knet? On SLES12 (I think
>     >> it was corosync 1), I used to set the token/consensus to far
>     >> greater values on some of our clusters.
>     >
>     > I'm really not aware of any arbitrary limits.
>     >
>     >
>     >> Best Regards,
>     >> Strahil Nikolov
>     >>
>     >
>     > Regards,
>     >
>     >    Honza
>     >
>     >>
>     >>
>     >> _______________________________________________
>     >> Manage your subscription:
>     >> https://lists.clusterlabs.org/mailman/listinfo/users
>     >>
>     >> ClusterLabs home: https://www.clusterlabs.org/
>     >
>     >>
>     >
>
>