[Pacemaker] Corosync service taking 100% cpu and is unable to stop gracefully

Dan Frincu df.cluster at gmail.com
Thu Apr 19 09:03:58 EDT 2012


Hi,

On Thu, Apr 19, 2012 at 2:11 PM, Parshvi <parshvi.17 at gmail.com> wrote:
> Major issues:
> 1) Corosync reaching over 100% cpu usage.
> 2) Corosync unable to stop gracefully.
> 3) The Virtual IP of a resource is assigned as the primary IP on an interface
> after a cable disconnect/reconnect on that interface; the static IP on the
> interface is then shown as a global secondary IP.
>
> Use case:
> 1) Two nodes in a cluster.
> 2) Two communication paths exist between the two nodes, with “rrp_mode” set to
> active in corosync.conf.

Are both links of the same speed?

>  a. One path is a back-to-back connection between the nodes.
>  b. The second is via the LAN switch.
> 3) The network cables on both interfaces of one node were unplugged and
> reconnected after a short while.
>
> Observations:
> 1) The Corosync service was taking 100% CPU on the node whose links were down:

What version of Corosync? What OS?

>  a. In this scenario the Corosync service could not be stopped gracefully; a
> SIGKILL had to be issued to stop it.
>  b. On this node, one of the two interfaces configured in corosync.conf was
> also the preferred eth for the Virtual IP.
>    i. When the link came back up after the disconnection, the primary global
> IP on that interface was the Virtual IP configured for a resource.
>    ii. The static IP assigned to the interface was listed as “scope global
> secondary” in the output of `ip addr show`.
>    iii. Also, the Virtual IPs of the resources configured in Pacemaker were
> active on both nodes.

Can you put the output of `crm configure show` on pastebin.com?

>    iv. `service network restart` also did not work.
>  c. The Corosync service was stopped (killed, since it could not be stopped
> gracefully), the network service was restarted, and then Corosync was
> restarted. Everything was fine after this.
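
The manual recovery in c. (try a normal stop, then fall back to SIGKILL) can be scripted. Below is a minimal Python sketch, assuming you already have the Corosync PID (e.g. from `pidof corosync`); the function names `stop_process` and `_alive` and the timeout values are hypothetical. It only automates the kill; it does nothing about why Corosync is spinning in the first place.

```python
import os
import signal
import time


def _alive(pid: int) -> bool:
    """Return True while the process with this PID still exists."""
    try:
        # If pid is our own child, reap it so a zombie is not counted as alive.
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child (e.g. a daemon such as corosync): use the signal-0 probe
    try:
        os.kill(pid, 0)  # signal 0 only checks that the PID exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def stop_process(pid: int, timeout: float = 10.0, poll: float = 0.1) -> str:
    """Send SIGTERM, wait up to `timeout` seconds, then escalate to SIGKILL.

    Returns "terminated" if the process exited on its own after SIGTERM,
    "killed" if SIGKILL was needed.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not _alive(pid):
            return "terminated"
        time.sleep(poll)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"  # exited between the last check and the kill
    # SIGKILL cannot be caught, but the kernel may take a moment to tear the process down.
    deadline = time.monotonic() + timeout
    while _alive(pid):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pid {pid} survived SIGKILL")
        time.sleep(poll)
    return "killed"
```

Run as root against the daemon's PID, e.g. `stop_process(int(subprocess.check_output(["pidof", "corosync"])))` when a single corosync process is running.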
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
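
On the "scope global secondary" symptom in b.ii: Linux marks any further IPv4 address added in an already-configured subnet as secondary, so the static IP showing up as secondary suggests it was (re)added after the VIP was already on the interface. That is a guess I haven't verified against your setup. Below is a minimal Python sketch for spotting this from the default `ip -4 addr show` output; the function names and the sample addresses are hypothetical.

```python
import re
from typing import Dict, List

# Matches IPv4 lines in the default (non -o) `ip addr show` output, e.g.
#   inet 192.168.1.10/24 brd 192.168.1.255 scope global secondary eth0
INET_RE = re.compile(r"^\s*inet\s+(?P<addr>\d+\.\d+\.\d+\.\d+/\d+)\s+(?P<rest>.*)$")


def classify_ipv4(ip_output: str) -> Dict[str, List[str]]:
    """Split the IPv4 addresses found in `ip addr show` text into primary and secondary."""
    result: Dict[str, List[str]] = {"primary": [], "secondary": []}
    for line in ip_output.splitlines():
        match = INET_RE.match(line)
        if match is None:
            continue  # interface header, link/ether and inet6 lines are ignored
        kind = "secondary" if "secondary" in match.group("rest").split() else "primary"
        result[kind].append(match.group("addr"))
    return result


def _host(addr: str) -> str:
    """Drop the /prefix part of an address such as 192.168.1.10/24."""
    return addr.split("/", 1)[0]


def static_ip_demoted(ip_output: str, static_ip: str) -> bool:
    """True if static_ip is listed only as a secondary address."""
    addrs = classify_ipv4(ip_output)
    primary = {_host(a) for a in addrs["primary"]}
    secondary = {_host(a) for a in addrs["secondary"]}
    return static_ip in secondary and static_ip not in primary


# Hypothetical output: VIP 192.168.1.50 became primary, static 192.168.1.10 secondary.
SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.50/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.10/24 brd 192.168.1.255 scope global secondary eth0
    inet6 fe80::211:22ff:fe33:4455/64 scope link
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    print(classify_ipv4(SAMPLE))
    print(static_ip_demoted(SAMPLE, "192.168.1.10"))
```

The regex assumes the default multi-line layout; the one-line `ip -o` format would need a different pattern.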



-- 
Dan Frincu
CCNA, RHCE



