[ClusterLabs] recommendations for corosync totem timeout for CentOS 7 + VMware?
jfriesse at redhat.com
Mon Mar 25 03:50:22 EDT 2019
> On Fri, Mar 22, 2019 at 08:57:20AM +0100, Jan Friesse wrote:
>>> - If I manually set 'totem.token' to a higher value, am I responsible
>>> for tracking the number of nodes in the cluster, to keep in
>>> alignment with what Red Hat's page says?
>> Nope. I've tried to explain what is really happening in the manpage
>> corosync.conf(5). totem.token and totem.token_coefficient are used in
>> the following formula:
> I do see this under token_coefficient, thanks.
>> Corosync uses runtime.config.token.
> Cool; thanks. Bumping up totem.token to 2000 got me over this hump.
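For reference, here is a minimal sketch of the formula as described under token_coefficient in corosync.conf(5): the coefficient only applies when the nodelist contains at least 3 nodes, and the effective timeout is then token + (number_of_nodes - 2) * token_coefficient. The function name is hypothetical, and the defaults of 1000 ms and 650 ms are assumptions taken from the corosync 2.x manpage, so check them against your installed version.

```python
# Sketch of the effective token timeout described in corosync.conf(5).
# Assumption: defaults below are the corosync 2.x values (milliseconds).
DEFAULT_TOKEN_MS = 1000
DEFAULT_TOKEN_COEFFICIENT_MS = 650


def effective_token_ms(node_count,
                       token_ms=DEFAULT_TOKEN_MS,
                       token_coefficient_ms=DEFAULT_TOKEN_COEFFICIENT_MS):
    """Return the timeout corosync would actually use, in milliseconds.

    The coefficient is only added when the nodelist has 3 or more nodes;
    with fewer nodes the configured totem.token is used unchanged.
    """
    if node_count >= 3:
        return token_ms + (node_count - 2) * token_coefficient_ms
    return token_ms
```

So with totem.token raised to 2000, a hypothetical 5-node cluster would end up at effective_token_ms(5, token_ms=2000), and the value keeps scaling as nodes are added without touching totem.token again.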
>>> - Under these conditions, when corosync exits, why does it do so
>>> with a zero status? It seems to me that if it exited at all,
>> That's a good question. How reproducible is the issue? Corosync
>> shouldn't "exit" with zero status.
> If I leave totem.token set to default, 100% in my case.
> I stand corrected; yesterday, it was 100%. Today, I cannot reproduce
> this at all, even after reverting to the defaults.
> Here is a snippet of output from yesterday's experiments; this is
> based on a typescript capture file, so I apologize for the ANSI
> screen codes.
Yep, np. Looks just fine.
> - by default, systemd doesn't report full log lines.
> - by default, CentOS's config of systemd doesn't persist journaled
> logs, so I can't directly review yesterday's efforts.
> - and, it looks like I misinterpreted the 'exited' message; corosync
> was enabled and running, but the 'Process' line doesn't report
> on the 'corosync' process, but some systemd utility.
> (Let me count the ways I'm coming to dislike systemd...)
> I was able to recover logs from /var/log/messages, but other than
> the 'Consider token timeout increase' message, it looks hunky-dory.
> With what I've since learned:
> - I cannot explain why I can't reproduce the symptoms, even after
> reverting to the defaults.
> - And without being able to reproduce, I can't pursue why 'pcs
> status cluster' was actually failing for me. :/
> So, I appreciate your attention to this message, and I guess I'm
> off to further explore all of this.
> [root@node1 ~]# systemctl status corosync.service
> ● corosync.service - Corosync Cluster Engine
> Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor
> preset: disabled)
> Active: active (running) since Thu 2019-03-21 14:26:56
> UTC; 1min 35s ago
> Docs: man:corosync
> Process: 5474 ExecStart=/usr/share/corosync/corosync start (code=exited,
> Main PID: 5490 (corosync)
> CGroup: /system.slice/corosync.service
> └─5490 corosync
As you can see, the corosync service unit in CentOS 7 executes an init
script, which starts corosync and waits until a connection to the local
IPC can be established. An IPC connection can be established once
corosync is ready. The initscript's timeout for IPC is 1 minute, and its
return code is 1 if the connection cannot be established. On success the
initscript returns 0. So ExecStart (the initscript) exited with
0/SUCCESS, which means corosync was started successfully and is running
as PID 5490.
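
The wait-then-return behaviour described above can be sketched roughly as follows. This is not the actual initscript (which is a shell script); it is only an illustration of the logic under stated assumptions. The names wait_for_ipc and probe are hypothetical, probe stands in for whatever check tells you the local IPC accepts connections, and the polling interval is an arbitrary choice.

```python
import time


def wait_for_ipc(probe, timeout_s=60.0, interval_s=0.5,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it succeeds or timeout_s elapses.

    Returns 0 as soon as probe() reports that the IPC connection works,
    or 1 if the deadline passes first, mirroring the return codes
    described above. probe is a hypothetical callable supplied by the
    caller; clock and sleep are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return 0
        if clock() >= deadline:
            return 1
        sleep(interval_s)
```

The point is that the exit status systemd shows on the 'Process' line belongs to this short-lived wrapper, not to the corosync daemon, which keeps running under its own Main PID.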