[ClusterLabs] [corosync][Problem] Very long "pause detect ... " was detected.

renayama19661014 at ybb.ne.jp
Mon Jun 13 09:51:52 UTC 2016


Hi Honza,

Thank you for your comment.


>>  Our user built a cluster with corosync and Pacemaker in the following
>>  environment.
>>  The cluster was built among guests.
>> 
>>  * Host/Guest : RHEL6.6 - kernel : 2.6.32-504.el6.x86_64
>>  * libqb 0.17.1
>>  * corosync 2.3.4
>>  * Pacemaker 1.1.12
>> 
>>  The cluster worked well.
>>  When a user stopped the active guest, the following log was output
>>  repeatedly on the standby guests.
> 
> What exactly do you mean by "active guest" and "standby guests"?

The cluster uses an active/standby configuration.

The standby guest waits until a resource fails on the active guest.

This problem seems to have occurred when the resources failed over to the standby guest.


> 
>> 
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516021 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516071 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516121 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516171 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516221 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516271 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516322 ms, flushing membership messages.
>>  May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516372 ms, flushing membership messages.
>>  (snip)
>>  May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526172 ms, flushing membership messages.
>>  May xx xx:26:03 standby-guest corosync[6311]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>>  May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526222 ms, flushing membership messages.
>>  (snip)
>> 
> 
> This is weird, not because of the enormous pause length but because corosync
> has a "scheduler pause" detector which warns before the "Process pause
> detected ..." error is logged.

I thought so, too.
However, no "scheduler pause" seems to have occurred.
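A simplified model may help show why a missing "scheduler pause" warning is odd. This is a hypothetical Python simulation, not corosync's code: the function name `simulate`, the 50 ms loop step and the single 1000 ms threshold are assumptions chosen for illustration. One check measures the gap between consecutive main-loop iterations; the other measures the age of a timestamp that only a periodic refresh timer updates. A real stall trips both checks, while a refresh timer that silently stops being re-armed trips only the second one.

```python
def simulate(ticks, step_ms=50, threshold_ms=1000,
             stall=None, refresh_stops_at=None):
    """Hypothetical model of two pause checks, in simulated milliseconds.

    stall: optional (tick, extra_ms) that injects one real stall of the loop.
    refresh_stops_at: tick from which the refresh timer is no longer re-armed.
    Returns (scheduler_warnings, process_pause_reports) as lists of ms values.
    """
    now = 0
    last_loop_run = 0      # updated on every loop iteration
    pause_timestamp = 0    # updated only by the (modeled) refresh timer
    scheduler_warnings = []
    pause_reports = []
    for tick in range(ticks):
        now += step_ms
        if stall is not None and tick == stall[0]:
            now += stall[1]  # the whole process was not scheduled for a while
        # Check 1: gap between two consecutive loop iterations.
        gap = now - last_loop_run
        if gap > threshold_ms:
            scheduler_warnings.append(gap)
        last_loop_run = now
        # Check 2: age of the timestamp maintained by the refresh timer.
        age = now - pause_timestamp
        if age > threshold_ms:
            pause_reports.append(age)
        # The refresh timer fires and re-arms itself, unless re-arming failed.
        if refresh_stops_at is None or tick < refresh_stops_at:
            pause_timestamp = now
    return scheduler_warnings, pause_reports


print(simulate(100))                   # ([], [])
print(simulate(20, stall=(5, 3000)))   # ([3050], [3050])
sched, pauses = simulate(100, refresh_stops_at=10)
print(sched, pauses[:3], len(pauses))  # [] [1050, 1100, 1150] 70
```

In the third case no scheduler warning appears, yet the reported pause keeps growing by one loop step per check, which resembles the quoted log. The model only shows that such a pattern is possible; it says nothing about why a timer would stop being re-armed.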

> 
>>  As a result, the standby guest failed to form a cluster on its own.
>> 
>>  The log makes it look as if a timer had stopped for 91 minutes.
>>  91 minutes is an abnormal length.
>> 
>>  Have you seen a similar problem?
> 
> Never

Okay!
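The 91-minute figure and the way the reported pause grows can be checked with a little arithmetic. This Python snippet uses only the values copied from the quoted log lines; the helper names `to_minutes` and `steps` are illustrative.

```python
# Pause lengths (ms) from the eleven "Process pause detected" lines logged at
# xx:25:53, copied from the quoted log.
PAUSES_MS = [5515870, 5515920, 5515971, 5516021, 5516071, 5516121,
             5516171, 5516221, 5516271, 5516322, 5516372]
LATER_PAUSE_MS = 5526172  # a value logged at xx:26:03, after the "(snip)"


def to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60_000


def steps(values):
    """Differences between consecutive reported pause lengths."""
    return [b - a for a, b in zip(values, values[1:])]


print(f"{to_minutes(PAUSES_MS[0]):.1f} min")  # 91.9 min
print(steps(PAUSES_MS))                        # [50, 51, 50, 50, 50, 50, 50, 50, 51, 50]
print(LATER_PAUSE_MS - PAUSES_MS[0])           # 10302
```

The reported pause grows by about 50 ms per message, and by about 10.3 s between the xx:25:53 and xx:26:03 lines, in step with wall-clock time. That is what one would expect if the reference timestamp had stopped being updated roughly 92 minutes earlier while the process kept running; it is an inference from the log, not a confirmed cause.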


> 
>> 
>>  I suspect a problem in libqb, the kernel, or something else.
> 
> What virtualization technology are you using? KVM?
> 
>>  * I suspect that setting the timer failed in reset_pause_timeout().
> 
> You can try to put asserts into this function, but there are really not
> many reasons why it should fail (either malloc returns NULL or there is
> some nasty memory corruption).


I read the source code, too.
It is just as you say.

I do not know whether the problem will reproduce, but I intend to build the same environment on RHEL6.6 and run a load test this week.

If you notice anything, please email me.

Best Regards,
Hideo Yamauchi.
