[ClusterLabs] Antw: Re: A processor failed, forming new configuration very often and without reason

Philippe Carbonnier philippe.carbonnier at vif.fr
Wed Apr 29 08:27:30 EDT 2015


Thanks for your answers.
The token value was previously 5000, but I have already increased it to 10000,
without any change. So TOTEM now waits 10 seconds before firing the "A processor
failed, forming new configuration" message, but in the log we see that the
other node reappears within the same second!
Should I use a higher token value?
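
For reference, the setting in question lives in the totem section of
corosync.conf. A minimal sketch, assuming the value is set there directly
(on a cman-based stack it would go into cluster.conf instead); only the
token value is taken from this thread, the rest is illustrative:

```
totem {
    version: 2
    # Token timeout in milliseconds (was 5000, now 10000 as described above)
    token: 10000
    # Assumption: consensus is not set explicitly; when omitted, corosync
    # derives it from token (1.2 * token), so it is shown commented out
    # consensus: 12000
}
```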

Best regards,

2015-04-29 14:17 GMT+02:00 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>:

> >>> Jan Friesse <jfriesse at redhat.com> wrote on 29.04.2015 at 13:10 in
> message
> <5540BC0B.50409 at redhat.com>:
> > Philippe,
> >
> > Philippe Carbonnier wrote:
> >> Hello,
> >> just for those who don't want to read all the logs, I put my question
> >> at the top (and at the end) of the post:
> >> Is there a timer that I can raise to give each node more time to
> >> see the other BEFORE TOTEM fires the "A processor failed, forming new
> >> configuration" message? The 2 nodes are really up and running.
> >
> > There are many timers, but basically almost everything depends on the
> > token timeout, so just set "token" to a higher value.
>
> Please correct me if I'm wrong: A token timeout is only triggered when
> 1) The token is lost in the network (i.e. a packet is lost and not
> retransmitted in time)
> 2) The token is lost on a node (e.g. it crashes while it has the token)
> 3) The host or the network doesn't respond in time (the token is not lost,
> but late)
> 4) There's a major bug in the TOTEM protocol (its implementation)
>
> I really wonder whether the reason for frequent token timeouts is 1);
> usually it's not 2) either. For me, 3) is also hard to believe. And nobody
> admits it's 4).
>
> So everybody says it's 3) and suggests to increase the timeout.
>
> >>
> >> The 2 Linux servers (vif5_7 and host2.example.com) are 2 VMs on the same
> >> VMware ESX server. Maybe the network is 'not working' the way corosync
> >> wants?
>
> OK, for virtual hosts I might add:
> 5) The virtual time is not flowing steadily, i.e. the number of usable CPU
> cycles per walltime unit is highly variable.
>
> >
> > Yep. But first give the token timeout increase a chance.
>
> I agree that for 5) a longer token timeout might be a workaround, but
> finding the root cause may be worth the time spent on it.
>
>
> Regards,
> Ulrich
