[ClusterLabs] Is "Process pause detected" triggered too easily?
Jean-Marc Saffroy
saffroy at gmail.com
Tue Oct 3 15:57:43 EDT 2017
Hi Jan,
On Tue, 3 Oct 2017, Jan Friesse wrote:
> > I hope this makes sense! :)
>
> I would still have some questions :) but that is really not related to
> the problem you have.
Questions are welcome! I am new to this stack, so there is certainly room
for learning and for improvement.
> My personal favorite is consensus timeout. Because you've set (and I
> must say according to doc correctly) consensus timeout to 3600 (= 1.2 *
> token). Problem is, that result token timeout is not 3000, but with 5
> nodes it is actually 3000 (base token) + (no_nodes - 2) * 650 ms = 4950
> (as you can check by observing runtime.config.totem.token key). So it
> may make sense to set consensus timeout to ~6000.
Could you clarify the formula for me? I don't see how "- 2" and "650" map
to this configuration.
And I suppose that on our bigger system (20+5 servers) we need to greatly
increase the consensus timeout.
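
To make the quoted arithmetic concrete, here is a minimal Python sketch of how I currently read it. It assumes that 650 ms is a per-node coefficient added for each node beyond the second (possibly the token_coefficient setting described in corosync.conf(5), but that mapping is my assumption and is not confirmed in this thread), and that consensus is taken as 1.2 times the effective token. The function names are hypothetical and only for illustration.

```python
def effective_token_ms(base_token_ms, node_count, coefficient_ms=650):
    """Effective token timeout per the quoted formula:
    base token + (node_count - 2) * coefficient."""
    # Assumption: clusters with 2 nodes or fewer get no extra time.
    extra_nodes = max(node_count - 2, 0)
    return base_token_ms + extra_nodes * coefficient_ms


def suggested_consensus_ms(token_ms, factor=1.2):
    """Consensus timeout as a multiple of the effective token (1.2 per the doc)."""
    return round(token_ms * factor)


if __name__ == "__main__":
    token = effective_token_ms(3000, 5)
    print(token)                          # 4950
    print(suggested_consensus_ms(token))  # 5940, i.e. roughly 6000
```

If these assumptions hold, a 25-node cluster with the same base token would end up with an effective token of 17950 ms and a consensus of about 21540 ms, which is why I expect the bigger system to need a much larger value.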
Overall, tuning the timeouts seems to be Black Magic. ;) I liked
the idea suggested in an old thread that there would be a spreadsheet (or
even just plain formulas) exposing the relation between the various knobs.
One thing I wonder is: would it make sense to annotate the state machine
diagram in the Totem paper (page 15 of
http://www.cs.jhu.edu/~yairamir/tocs.ps.gz) with those tunables? Assuming
the paper still reflects the behavior of the current code.
> This doesn't change the fact that "bug" is reproducible even with
> "correct" consensus, so I will continue working on this issue.
Great! Thanks for taking the time to investigate.
Cheers,
JM
--
saffroy at gmail.com