[ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?
reichert at numachi.com
Fri Mar 22 15:41:47 EDT 2019
On Fri, Mar 22, 2019 at 08:57:20AM +0100, Jan Friesse wrote:
> >- If I manually set 'totem.token' to a higher value, am I responsible
> > for tracking the number of nodes in the cluster, to keep in
> > alignment with what Red Hat's page says?
> Nope. I've tried to explain what is really happening in the manpage
> corosync.conf(5). totem.token and totem.token_coefficient are used in
> the following formula:
I do see this under token_coefficient, thanks.
> Corosync used runtime.config.token.
Cool; thanks. Bumping up totem.token to 2000 got me over this hump.
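For reference, a minimal sketch of the calculation described under
token_coefficient in corosync.conf(5): when the nodelist contains at
least 3 nodes, the effective timeout is
token + (number_of_nodes - 2) * token_coefficient. The function name is
hypothetical, and the defaults (token 1000 ms, token_coefficient 650 ms)
are assumed from the corosync 2.x manpage; check the manpage for the
installed version.

```python
def effective_token_ms(token_ms=1000, token_coefficient_ms=650, nodes=2):
    """Estimate the runtime token timeout in milliseconds.

    Hypothetical helper; defaults are assumed from corosync 2.x
    corosync.conf(5) and may differ in other versions.
    """
    # The coefficient only applies when the nodelist has at least 3 nodes.
    if nodes >= 3:
        return token_ms + (nodes - 2) * token_coefficient_ms
    return token_ms


if __name__ == "__main__":
    # Raising totem.token shifts the base; the per-node term is still
    # added on top, so the node count need not be tracked by hand.
    for n in (2, 3, 5):
        print(n, effective_token_ms(token_ms=2000, nodes=n))
```

The value corosync actually settled on can still be confirmed from the
runtime configuration, as noted in the quoted reply.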
> >- Under these conditions, when corosync exits, why does it do so
> > with a zero status? It seems to me that if it exited at all,
> That's a good question. How reproducible is the issue? Corosync
> shouldn't "exit" with zero status.
If I leave totem.token set to default, 100% in my case.
I stand corrected; yesterday, it was 100%. Today, I cannot reproduce
this at all, even after reverting to the defaults.
Here is a snippet of output from yesterday's experiments; this is
based on a typescript capture file, so I apologize for the ANSI
escape sequences.
- by default, systemd doesn't report full log lines.
- by default, CentOS's config of systemd doesn't persist journaled
logs, so I can't directly review yesterday's efforts.
- and, it looks like I misinterpreted the 'exited' message; corosync
was enabled and running, but the 'Process' line doesn't report
on the 'corosync' process itself; it reports on some systemd utility.
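The distinction in the last point can be sketched in a few lines: strip
the ANSI color codes from captured 'systemctl status' output, then read
the 'Process' and 'Main PID' fields separately. This is only an
illustration; the function name and the SAMPLE string are hypothetical,
with SAMPLE modeled on the capture in this message (truncated lines left
truncated).

```python
import re

# Matches SGR color sequences such as ESC[1;32m and ESC[0m.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

# Hypothetical sample, modeled on the capture in this message.
SAMPLE = (
    "\x1b[1;32m\u25cf\x1b[0m corosync.service - Corosync Cluster Engine\n"
    "   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor\n"
    "   Active: \x1b[1;32mactive (running)\x1b[0m since Thu 2019-03-21 14:26:56 UTC; 1min 35s ago\n"
    "  Process: 5474 ExecStart=/usr/share/corosync/corosync start (code=exited,\n"
    " Main PID: 5490 (corosync)\n"
)


def summarize_status(text):
    """Return the Active, Process, and Main PID fields as a dict."""
    info = {}
    for line in ANSI_RE.sub("", text).splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and key in ("Active", "Process", "Main PID"):
            info[key] = value
    return info


if __name__ == "__main__":
    for key, value in summarize_status(SAMPLE).items():
        print(f"{key}: {value}")
```

In the sample, 'Process' refers to the ExecStart command (PID 5474),
which has exited, while 'Main PID' (5490) is the corosync daemon that
'Active' reports as running.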
(Let me count the ways I'm coming to dislike systemd...)
I was able to recover logs from /var/log/messages, but other than
the 'Consider token timeout increase' message, it looks hunky-dory.
With what I've since learned:
- I cannot explain why I can't reproduce the symptoms, even after
reverting to the defaults.
- And without being able to reproduce, I can't pursue why 'pcs
status cluster' was actually failing for me. :/
So, I appreciate your attention to this message, and I guess I'm
off to further explore all of this.
[root@node1 ~]# systemctl status corosync.service
ESC[1;32m●ESC[0m corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor
   Active: ESC[1;32mactive (running)ESC[0m since Thu 2019-03-21 14:26:56 UTC; 1min 35s ago
  Process: 5474 ExecStart=/usr/share/corosync/corosync start (code=exited,
 Main PID: 5490 (corosync)
Brian Reichert <reichert at numachi.com>
BSD admin/developer at large