[ClusterLabs] [corosync][Problem] Very long "pause detect ... " was detected.

Jan Friesse jfriesse at redhat.com
Mon Jun 13 02:54:37 EDT 2016


Hideo,

> Hi All,
>
> Our user built a cluster with corosync and Pacemaker in the following environment.
> The cluster was formed among the guests.
>
> * Host/Guest : RHEL6.6 - kernel : 2.6.32-504.el6.x86_64
> * libqb 0.17.1
> * corosync 2.3.4
> * Pacemaker 1.1.12
>
> The cluster worked well.
> When a user stopped an active guest, the following log was output repeatedly on the standby guests.

What exactly do you mean by "active guest" and "standby guests"?

>
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516021 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516071 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516121 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516171 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516221 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516271 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516322 ms, flushing membership messages.
> May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516372 ms, flushing membership messages.
> (snip)
> May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526172 ms, flushing membership messages.
> May xx xx:26:03 standby-guest corosync[6311]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
> May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526222 ms, flushing membership messages.
> (snip)
>

This is weird. Not because of the enormous pause length, but because corosync
has a "scheduler pause" detector which warns before the "Process pause
detected ..." error is logged.
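
For readers unfamiliar with this kind of check, a pause detector can be
sketched roughly as follows. This is only an illustration of the general
technique in Python, not corosync's code: the class, its method names and
the threshold are hypothetical.

```python
import time


def monotonic_ms():
    """Return the monotonic clock in whole milliseconds."""
    return time.monotonic_ns() // 1_000_000


class PauseDetector:
    """Flags a pause when the saved timestamp is older than a threshold.

    Illustration only: the class, its names, and the threshold are
    hypothetical and are not taken from the corosync source.
    """

    def __init__(self, threshold_ms, clock=monotonic_ms):
        self.threshold_ms = threshold_ms
        self._clock = clock          # injectable so the sketch can be tested
        self._last_ms = clock()      # timestamp saved by the last refresh

    def refresh(self):
        """Called from a periodic timer to save the current time."""
        self._last_ms = self._clock()

    def check(self):
        """Return the apparent pause in ms if above the threshold, else None."""
        elapsed_ms = self._clock() - self._last_ms
        return elapsed_ms if elapsed_ms > self.threshold_ms else None
```

In a sketch like this, if refresh() stops being called, check() reports an
ever-growing value even though the process itself never stopped.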

> As a result, the standby guest failed to form an independent cluster.
>
> The log reads as if a timer had stopped for 91 minutes.
> 91 minutes is an abnormal length.
>
> Have you seen a similar problem?

Never
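
For reference, the 91-minute figure can be checked from the logged values.
A minimal sketch in Python, using three lines copied from the excerpt
above; the helper names are illustrative only:

```python
import re

# Three lines copied from the log excerpt quoted in this thread.
LOG = """\
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.
"""

PAUSE_RE = re.compile(r"Process pause detected for (\d+) ms")


def pause_values_ms(text):
    """Extract every reported pause length, in milliseconds."""
    return [int(m.group(1)) for m in PAUSE_RE.finditer(text)]


def to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60_000


values = pause_values_ms(LOG)
# Difference between each reported value and the one before it.
steps = [b - a for a, b in zip(values, values[1:])]

print(f"first reported pause: {to_minutes(values[0]):.1f} min")
print("steps between messages (ms):", steps)
```

The first value works out to roughly 91.9 minutes, and consecutive values
are about 50 ms apart. Between xx:25:53 and xx:26:03 the reported value
also grows by about 10 seconds (5515870 to 5526172 ms), so it advances in
step with real time. That would fit a reference timestamp that is no
longer being refreshed, although the log alone does not prove it.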

>
> I suspect it may be a problem in libqb, the kernel, or something similar.

What virtualization technology are you using? KVM?

> * I suspect that setting the timer failed in reset_pause_timeout().

You can try to put asserts into this function, but there are really not
too many reasons why it should fail (either malloc returns NULL or there
is some nasty memory corruption).

Regards,
   Honza

>
> Best Regards,
> Hideo Yamauchi.
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>




