[ClusterLabs] Linux 8.2 - high totem token requires manual setting of ping_interval and ping_timeout

Hayden,Robert RHAYDEN at CERNER.COM
Fri Jun 26 11:25:42 EDT 2020


Robert Hayden | Lead Technology Architect | Cerner Corporation
> -----Original Message-----
> From: Hayden,Robert
> Sent: Friday, June 26, 2020 8:50 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>
> Subject: RE: [ClusterLabs] Linux 8.2 - high totem token requires manual
> setting of ping_interval and ping_timeout
>
> Attached is a pcs cluster report after I destroyed the existing cluster and
> rebooted the nodes.  The totem token here was set to 41000.  Hopefully, email
> scanners will not remove the attachment.
>

In case the email scanners stripped off the file, please see the tarball at
https://github.com/roberthayden/public-ftp/raw/master/pcs_report_totem41.tar.bz2


> Thanks!
> Robert
>
>
> Robert Hayden | Lead Technology Architect | Cerner Corporation
>
> > -----Original Message-----
> > From: Users <users-bounces at clusterlabs.org> On Behalf Of Christine
> > Caulfield
> > Sent: Friday, June 26, 2020 2:16 AM
> > To: users at clusterlabs.org
> > Subject: Re: [ClusterLabs] Linux 8.2 - high totem token requires manual
> > setting of ping_interval and ping_timeout
> >
> > On 26/06/2020 07:56, Jan Friesse wrote:
> > > Robert,
> > > thank you for the info/report. More comments inside.
> > >
> > >> All,
> > >> Hello.  Hope all is well.   I have been researching Oracle Linux 8.2
> > >> and ran across a situation that is not well documented.   I decided to
> > >> provide some details to the community in case I am missing something.
> > >>
> > >> Basically, if you increase the totem token above approximately 33000
> > >> with the knet transport, then a two node cluster will not properly
> > >> form.   The exact threshold value will slightly fluctuate, depending
> > >> on hardware type and debugging, but will consistently fail above 40000.
> > >
> > > At least corosync with a 40 sec timeout works just fine for me.
> > >
> >
> >
> > I just tried 41 second token timeout on a 2-node and a 4-node cluster
> > (pcs/corosync/pacemaker) and it started up just fine. I think we'd need
> > to see the logs.
> >
> >
> > > # corosync-cmapctl  | grep token
> > > runtime.config.totem.token (u32) = 40650
> > >
> > > # corosync-quorumtool
> > > Quorum information
> > > ------------------
> > > Date:             Fri Jun 26 08:45:12 2020
> > > Quorum provider:  corosync_votequorum
> > > Nodes:            2
> > > Node ID:          1
> > > Ring ID:          1.11be1
> > > Quorate:          Yes
> > >
> > > Votequorum information
> > > ----------------------
> > > Expected votes:   3
> > > Highest expected: 3
> > > Total votes:      2
> > > Quorum:           2
> > > Flags:            Quorate
> > >
> > > Membership information
> > > ----------------------
> > >     Nodeid      Votes Name
> > >          1          1 vmvlan-vmcos8-n05 (local)
> > >          6          1 vmvlan-vmcos8-n06
> > >
> > >
> > > It is indeed true that forming took a bit more time (30 sec to be more
> > > precise).
> > >
> > >>
> > >> The failure to form a cluster would occur when running the "pcs
> > >> cluster start --all" command or if I started one node, let it
> > >> stabilize, then started the second.  When it fails to form a cluster,
> > >> each side says it is ONLINE, but the other side is
> > >> UNCLEAN(offline) (cluster state: partition WITHOUT quorum).   Even if I
> > >> define proper stonith resources, they will not fence since the
> > >> cluster never makes it to an initial quorum state.  So, the cluster
> > >> will stay in this split state indefinitely.
> > >
> > > Maybe some timeout in pcs?
> > >
> > >>
> > >> Changing the transport back to udpu or udp, the higher totem tokens
> > >> worked as expected.
> > >
> > > Yup. You've correctly found out that the knet_* timeouts help. Basically,
> > > knet keeps a link down until it gets enough pongs. UDP/UDPU doesn't
> > > have this concept, so it will create the cluster faster.
> > >
> > >>
> > >>  From the debug logging, I suspect that the Election Trigger (20
> > >> seconds) fires before all nodes are properly identified by the knet
> > >> transport.  I noticed that with a totem token passing 32 seconds, the
> > >> knet_ping* defaults were pushing up against that 20 second mark.  The
> > >> output of "corosync-cfgtool -s" will show each node's link as enabled,
> > >> but each side will state the other side's link is not connected.
> > >> Since each side thinks the other node is not active, they fail to
> > >> properly send a join message to the other node during the election.
> > >> They will essentially form a singleton cluster(??).
> > >
> > > Up to this point your analysis is correct. Corosync really is unable to
> > > send the join message and forms a single-node cluster.
> > >
> > >> It is more puzzling when you start one node at a time, waiting for the
> > >> node to stabilize before starting the other.   It is like the first
> > >> node will never see the remote knet interfaces become active,
> > >> regardless of how long you wait.
> > >
> > > This shouldn't happen. Knet will eventually receive enough pongs, so
> > > corosync broadcasts a message to the other nodes, which then find out
> > > that a new membership should be formed.
> > >
> > >>
> > >> The solution is to manually set the knet ping_timeout and
> > >> ping_interval to lower values than the default values derived from the
> > >> totem token.  This seems to allow for the knet transport to determine
> > >> link status of all nodes before the election timer pops.
> > >
> > > These timeouts are indeed not the best ones. I had a few ideas on how to
> > > improve them, because currently they favor multi-link clusters.
> > > Single-link clusters may work better with slightly different
> > > defaults.
> > >
> > >>
> > >> I tested this on both physical hardware and with VMs.  Both react
> > >> similarly.
> > >>
> > >> Bare bones test case to reproduce:
> > >> yum install pcs pacemaker fence-agents-all
> > >> firewall-cmd --permanent --add-service=high-availability
> > >> firewall-cmd --add-service=high-availability
> > >> systemctl start pcsd.service
> > >> systemctl enable pcsd.service
> > >> systemctl disable corosync
> > >> systemctl disable pacemaker
> > >> passwd hacluster
> > >> pcs host auth node1 node2
> > >> pcs cluster setup rhcs_test node1 node2 totem token=41000
> > >> pcs cluster start --all
> > >>
> > >> Example command to create a cluster that will properly form and get
> > >> quorum:
> > >> pcs cluster setup rhcs_test node1 node2 totem token=61000 transport
> > >> knet link ping_interval=1250 ping_timeout=2500
> > >>
> > >> Hope this helps someone in the future.
> > >
> > > Yup. It is an interesting finding, and thanks for that.
> > >
> > > Regards,
> > >   Honza
> > >
> > >>
> > >> Thanks
> > >> Robert
> > >>
> > >>
> > >> Robert Hayden | Lead Technology Architect | Cerner Corporation
> > >>
> > >>
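
For reference, the arithmetic behind the "knet_ping* defaults were pushing up against that 20 second mark" observation can be sketched as below. This is only a rough illustration, not corosync's actual code. It assumes the defaults as described in corosync.conf(5): knet_ping_timeout = token / knet_pong_count, knet_ping_interval = token / (knet_pong_count * 2), and knet_pong_count = 2. The function name knet_defaults is hypothetical, and the sketch ignores any token_coefficient adjustment applied to larger clusters.

```shell
# Sketch: derive the default knet ping settings from a totem token (ms).
# ASSUMPTION: formulas taken from the corosync.conf(5) description:
#   knet_ping_timeout  = token / knet_pong_count
#   knet_ping_interval = token / (knet_pong_count * 2)
# with knet_pong_count defaulting to 2. knet_defaults is a made-up helper.
knet_defaults() {
    token_ms=$1
    pong_count=${2:-2}
    ping_timeout=$((token_ms / pong_count))
    ping_interval=$((token_ms / (pong_count * 2)))
    echo "token=${token_ms} ping_interval=${ping_interval} ping_timeout=${ping_timeout}"
}

# Token values discussed in this thread.
for t in 33000 41000 61000; do
    knet_defaults "$t"
done
```

Under those assumptions, a token of 41000 gives a derived ping_timeout of 20500 ms, just past the 20 second election trigger mentioned above, while the manually set ping_interval=1250 and ping_timeout=2500 stay well below it.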



