[ClusterLabs] Q: Corosync (totemrrp.c:961): (max - recv_count[i] > threshold

Fri May 15 13:50:12 EDT 2015

Hi,

On Wed, May 13, 2015 at 02:00:32PM +0200, Ulrich Windl wrote:
> Hi!
> 
> I have a simple question for those who know the answer ;-):
> 
> Can you phrase in English what the condition "(max - recv_count[i] > threshold" (in corosync-1.4.7/exec/totemrrp.c near line 961) is?

Looks like reaching some limit for the number of bad (or missed)
packets (my best guess :).
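
Put as code, my reading of the condition is roughly the sketch below. It is
a simplified illustration and not the actual totemrrp.c logic: the helper
name lags_behind() is made up for this example, and it assumes "max" is
simply the largest receive counter across all interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from corosync: returns 1 if interface i
 * has fallen more than `threshold` receives behind the busiest interface,
 * i.e. if (max - recv_count[i] > threshold) holds. */
static int lags_behind(const unsigned int *recv_count, size_t n,
                       size_t i, unsigned int threshold)
{
    unsigned int max = 0;
    size_t k;

    assert(n > 0 && i < n);

    /* Assumption: "max" is the highest counter seen on any interface. */
    for (k = 0; k < n; k++) {
        if (recv_count[k] > max) {
            max = recv_count[k];
        }
    }

    /* max >= recv_count[i], so the unsigned subtraction cannot wrap. */
    return (max - recv_count[i]) > threshold;
}
```

For example, with counters {120, 105} and a threshold of 10, the second
interface is 15 receives behind and would be flagged; with a threshold of
15 it would not, because the comparison is strictly "greater than".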

Thanks,

Dejan

> The condition triggers "Marking ringid %u interface %s FAULTY", and I cannot work out why it fires periodically in our configuration.
> 
> (I tried to read and understand the code, but because of the comment-free programming style I can't follow it.)
> 
> "threshold" is either rrp_instance->totem_config->rrp_problem_count_threshold or rrp_instance->totem_config->rrp_problem_count_mcast_threshold, and "max" is either the maximum of token_recv_count or of mcast_recv_count, sometimes normalized relative to the minimum recv_count. I guess the code in passive_monitor() assumes that all the counters stay frozen while they are inspected multiple times.
> 
> Maybe it would be helpful to log max, recv_count[i] and threshold in the FAULTY message...
> 
> (We never see the other FAULTY message: "Marking seqid %d ringid %u interface %s FAULTY")
> 
> Regards,
> Ulrich