[ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?

Fri Mar 22 15:41:47 EDT 2019

On Fri, Mar 22, 2019 at 08:57:20AM +0100, Jan Friesse wrote:
> >- If I manually set 'totem.token' to a higher value, am I responsible
> >   for tracking the number of nodes in the cluster, to keep in
> >   alignment with what Red Hat's page says?
> 
> Nope. I've tried to explain what is really happening in the manpage 
> corosync.conf(5). totem.token and totem.token_coefficient are used in 
> the following formula:

I do see this under token_coefficient, thanks.

> Corosync uses runtime.config.token.

Cool; thanks. Bumping totem.token up to 2000 (ms) got me over this hump.
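For the arithmetic behind that: corosync.conf(5) for corosync 2.x (the version CentOS 7 ships) describes the real token timeout, when the nodelist has at least three nodes, as token + (number_of_nodes - 2) * token_coefficient. The sketch below just encodes that rule. The defaults (token 1000 ms, token_coefficient 650 ms) are assumptions taken from that man page, so check the one for the version you actually run; the result should line up with the runtime value corosync reports (for example via corosync-cmapctl).

```python
# Sketch of the effective token timeout rule described in corosync.conf(5)
# for corosync 2.x. The defaults below are ASSUMED from that man page;
# verify them against the corosync version you actually run.

DEFAULT_TOKEN_MS = 1000        # totem.token default (assumed, corosync 2.x)
DEFAULT_COEFFICIENT_MS = 650   # totem.token_coefficient default (assumed)


def effective_token_ms(node_count: int,
                       token_ms: int = DEFAULT_TOKEN_MS,
                       coefficient_ms: int = DEFAULT_COEFFICIENT_MS) -> int:
    """Return the token timeout corosync is expected to use at runtime.

    The coefficient only applies when the nodelist has 3 or more nodes;
    for 1- or 2-node clusters the configured token value is used as-is.
    """
    if node_count < 1:
        raise ValueError("node_count must be positive")
    if node_count >= 3:
        return token_ms + (node_count - 2) * coefficient_ms
    return token_ms


if __name__ == "__main__":
    # Compare the defaults with token=2000 for a few cluster sizes.
    for nodes in (2, 3, 5):
        print(nodes, "nodes:",
              effective_token_ms(nodes), "ms with defaults,",
              effective_token_ms(nodes, token_ms=2000), "ms with token=2000")
```

With these assumed defaults a 3-node cluster would run with 1650 ms, and setting token to 2000 raises that to 2650 ms; the node count in this thread isn't stated, so the loop just shows a few sizes.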

> >- Under these conditions, when corosync exits, why does it do so
> >   with a zero status? It seems to me that if it exited at all,
> 
> That's a good question. How reproducible is the issue? Corosync 
> shouldn't "exit" with zero status.

If I leave totem.token at its default: 100% of the time, in my case.

I stand corrected; yesterday, it was 100%. Today, I cannot reproduce
this at all, even after reverting to the defaults.

Here is a snippet of output from yesterday's experiments; this is
based on a typescript capture file, so I apologize for the ANSI
screen codes.
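Captures like this can usually be cleaned before pasting. A minimal sketch, assuming the capture came from script(1) and contains the raw escape bytes (colour codes like ESC[1;32m and xterm title updates like ESC]0;...BEL); the script name in the usage line is hypothetical:

```python
import re
import sys

# CSI sequences such as "\x1b[1;32m" (colours) and "\x1b[0m" (reset):
# ESC [ , parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
# then one final byte 0x40-0x7E.
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# OSC sequences such as "\x1b]0;title\x07" (xterm window-title updates),
# terminated either by BEL (\x07) or by ST ("\x1b\\").
OSC_RE = re.compile(r"\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)")


def strip_terminal_codes(text: str) -> str:
    """Remove common ANSI/xterm escape sequences from a typescript capture."""
    text = OSC_RE.sub("", text)
    text = CSI_RE.sub("", text)
    # script(1) records CRLF line endings; drop the carriage returns.
    return text.replace("\r", "")


def main() -> None:
    # Decode as UTF-8 so multibyte characters (e.g. systemd's status
    # bullet and tree glyphs) survive instead of turning into mojibake.
    raw = sys.stdin.buffer.read().decode("utf-8", errors="replace")
    sys.stdout.write(strip_terminal_codes(raw))


if __name__ == "__main__":
    main()
```

Usage would be something like "python3 strip_codes.py < typescript > typescript.txt". It only targets the CSI and OSC forms; other terminal control sequences would pass through unchanged.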

- by default, 'systemctl status' truncates long log lines.

- by default, CentOS's config of systemd doesn't persist journaled
  logs, so I can't directly review yesterday's efforts.

- and it looks like I misinterpreted the 'exited' message; corosync
  was enabled and running, but the 'Process' line reports on the
  ExecStart wrapper script (/usr/share/corosync/corosync start), not
  on the 'corosync' daemon itself.
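A less ambiguous way to check the daemon than reading the 'Process' line is to ask systemd for specific unit properties with 'systemctl show -p ...', which prints KEY=VALUE lines. A minimal sketch, assuming the standard ActiveState, SubState and MainPID properties and a Python 3.5+ interpreter; the helper names are hypothetical:

```python
import subprocess
from typing import Dict

PROPERTIES = ("ActiveState", "SubState", "MainPID")


def parse_show_output(text: str) -> Dict[str, str]:
    """Parse the KEY=VALUE lines printed by 'systemctl show'."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key] = value
    return props


def unit_status(unit: str) -> Dict[str, str]:
    """Query systemd for a few properties of the unit (needs systemctl)."""
    cmd = ["systemctl", "show"]
    for prop in PROPERTIES:
        cmd += ["-p", prop]
    cmd.append(unit)
    out = subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return parse_show_output(out)


def daemon_is_running(props: Dict[str, str]) -> bool:
    """True if the unit is active/running with a live main process."""
    return (props.get("ActiveState") == "active"
            and props.get("SubState") == "running"
            and props.get("MainPID", "0") != "0")


if __name__ == "__main__":
    status = unit_status("corosync.service")
    print(status, "running" if daemon_is_running(status) else "NOT running")
```

The point of the design is that MainPID tracks the long-lived corosync process, while the ExecStart status only says the start wrapper exited cleanly.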

(Let me count the ways I'm coming to dislike systemd...)

I was able to recover logs from /var/log/messages, but other than
the 'Consider token timeout increase' message, everything looks hunky-dory.

With what I've since learned:

- I cannot explain why I can't reproduce the symptoms, even after
  reverting to the defaults.

- And without being able to reproduce, I can't pursue why 'pcs
  status cluster' was actually failing for me. :/

So, I appreciate your attention to this message, and I guess I'm
off to further explore all of this.

  [root@node1 ~]# systemctl status corosync.service
  ● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
     Active: active (running) since Thu 2019-03-21 14:26:56 UTC; 1min 35s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
    Process: 5474 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
   Main PID: 5490 (corosync)
     CGroup: /system.slice/corosync.service
             └─5490 corosync

>   Honza

-- 
Brian Reichert				<reichert at numachi.com>
BSD admin/developer at large