[ClusterLabs] [corosync][Problem] Very long "pause detect ... " was detected.

renayama19661014 at ybb.ne.jp
Sun Jun 12 05:30:02 EDT 2016


Hi All,

One of our users built a cluster with corosync and Pacemaker in the following environment.
The cluster nodes are virtual guests.

* Host/Guest : RHEL6.6 - kernel : 2.6.32-504.el6.x86_64
* libqb 0.17.1
* corosync 2.3.4
* Pacemaker 1.1.12

The cluster had been working normally.
When the user stopped the active guest, the following messages were logged repeatedly on the standby guest.

May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516021 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516071 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516121 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516171 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516221 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516271 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516322 ms, flushing membership messages.
May xx xx:25:53 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5516372 ms, flushing membership messages.
(snip)
May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526172 ms, flushing membership messages.
May xx xx:26:03 standby-guest corosync[6311]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
May xx xx:26:03 standby-guest corosync[6311]:  [TOTEM ] Process pause detected for 5526222 ms, flushing membership messages.
(snip)

As a result, the standby guest failed to form a cluster on its own.

According to the log, it looks as if a timer had stopped for more than 91 minutes (5515870 ms).
A pause of that length is clearly abnormal.
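
For reference, the figures above can be checked with a short script. This is only a sketch: the sample lines are a subset copied from the log excerpt (prefix trimmed), and the helper names `pause_values_ms` and `ms_to_minutes` are hypothetical, not part of corosync or any tool.

```python
import re

# Matches the TOTEM message shown in the log excerpt.
PAUSE_RE = re.compile(r"Process pause detected for (\d+) ms")


def pause_values_ms(lines):
    """Return the reported pause length in ms for each matching line."""
    values = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


def ms_to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60_000


# Subset of the excerpt: the first three lines at xx:25:53
# and the first line at xx:26:03.
sample = [
    "[TOTEM ] Process pause detected for 5515870 ms, flushing membership messages.",
    "[TOTEM ] Process pause detected for 5515920 ms, flushing membership messages.",
    "[TOTEM ] Process pause detected for 5515971 ms, flushing membership messages.",
    "[TOTEM ] Process pause detected for 5526172 ms, flushing membership messages.",
]

values = pause_values_ms(sample)
first_minutes = ms_to_minutes(values[0])   # length of the first reported pause
growth_ms = values[-1] - values[0]         # growth between xx:25:53 and xx:26:03

print(f"first reported pause: {first_minutes:.2f} minutes")
print(f"growth over the excerpt: {growth_ms} ms")
```

The first value works out to roughly 91.9 minutes. The reported value also grows by 10302 ms between the first and last sampled lines, which is close to the 10 seconds of wall-clock time between xx:25:53 and xx:26:03. That would be consistent with one reference timestamp going stale rather than with many separate pauses, although the log alone does not prove it.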

Has anyone seen a similar problem?

I suspect a problem in libqb, in the kernel, or somewhere else.
* My guess is that setting the timer failed in reset_pause_timeout().
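
To illustrate the suspicion, here is a toy model, not corosync code. It assumes the pause check compares the current time with a timestamp that a periodic timer refreshes, and that the timer has to be re-armed after each run. The class `PauseDetector`, its methods, the 500 ms threshold, and the 100 ms timer period are all hypothetical.

```python
class PauseDetector:
    """Toy model of a timestamp-based pause check (assumed design)."""

    def __init__(self, threshold_ms, now_ms=0):
        self.threshold_ms = threshold_ms
        self.pause_timestamp_ms = now_ms
        self.timer_armed = True

    def on_timer(self, now_ms, rearm_ok=True):
        """Periodic callback: refresh the timestamp, then re-arm the timer."""
        if not self.timer_armed:
            return  # a timer that was never re-armed does not fire again
        self.pause_timestamp_ms = now_ms
        self.timer_armed = rearm_ok

    def check(self, now_ms):
        """Return the apparent pause in ms, or None if below the threshold."""
        elapsed_ms = now_ms - self.pause_timestamp_ms
        return elapsed_ms if elapsed_ms > self.threshold_ms else None


# Healthy case: the timer keeps firing, so no pause is ever reported.
healthy = PauseDetector(threshold_ms=500)
healthy_reports = []
for now in range(100, 1100, 100):
    healthy.on_timer(now)
    healthy_reports.append(healthy.check(now + 50))

# Failure case: re-arming fails once, and the timestamp stays at 100 ms.
broken = PauseDetector(threshold_ms=500)
broken.on_timer(100, rearm_ok=False)
broken.on_timer(200)  # no effect, the timer is no longer armed
broken_reports = [
    broken.check(100 + 5_515_870),
    broken.check(100 + 5_515_920),
]

print(healthy_reports)
print(broken_reports)
```

In this model a single failed re-arm is enough: every later check reports an ever-growing "pause" even though the process itself never stopped. Whether this matches what happened in reset_pause_timeout() would still need to be confirmed against the corosync and libqb sources.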

Best Regards,
Hideo Yamauchi.




