[Pacemaker] Corosync service taking 100% cpu and is unable to stop gracefully

Dan Frincu df.cluster at gmail.com
Thu Apr 19 09:57:11 EDT 2012


On Thu, Apr 19, 2012 at 4:14 PM, Parshvi <parshvi.17 at gmail.com> wrote:
> Dan Frincu <df.cluster at ...> writes:
>
>>
>> Hi,
>>
>> On Thu, Apr 19, 2012 at 2:11 PM, Parshvi <parshvi.17 <at> gmail.com> wrote:
>> > Major issues:
>> > 1) Corosync reaching over 100% cpu usage.
>> > 2) Corosync unable to stop gracefully.
>> > 3) Virtual IP of a resource being assigned as the primary IP on an
>> > interface after a cable disconnect/reconnect on that interface. The static
>> > IP on the interface is shown as a global secondary IP.
>> >
>> > Use case:
>> > 1) Two nodes in a cluster.
>> > 2) Two communication paths exist between the two nodes, with “rrp_mode” set
>> > to active in corosync.conf
>>
>> Are both links of the same speed?
> yes. speed of each: 1000Mb/s
>>
>> >  a. One path is a back-to-back connection between the nodes.
>> >  b. Second is  via the LAN network  switch.
>> > 3) The network cable was unplugged on one of the nodes for a while (on both
>> > the interfaces). It was reconnected after a short while.
>> >
>> > Observations:
>> > 1) Corosync service was taking 100% cpu on the node whose link was down:
>>
>> What version of Corosync? What OS?
> Corosync Cluster Engine, version '1.2.7' SVN revision '3008'
> OEL (Oracle Enterprise Linux release 5.6)

You need a newer version of Corosync: 1.3.x or higher for redundant rings
to work, and 1.4.x for self-healing redundant rings.
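
For reference, a two-ring totem section in corosync.conf looks roughly like
the sketch below. The network addresses, multicast addresses and ports are
placeholders chosen for illustration, not values from your setup; ring 0 is
assumed to be the back-to-back link and ring 1 the path via the LAN switch.

```
# Sketch only: all addresses and ports below are assumed placeholders.
totem {
    version: 2

    # Redundant ring mode: none, active or passive.
    rrp_mode: active

    # Ring 0: assumed to be the back-to-back link between the nodes.
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.100.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }

    # Ring 1: assumed to be the path via the LAN switch.
    # Each ring uses its own multicast address and port.
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.0.0
        mcastaddr: 226.94.1.2
        mcastport: 5407
    }
}
```

Once both nodes run a suitable version, corosync-cfgtool -s should report
the status of each ring.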

>>
>
>> Can you pastebin.com your crm configure show?
> would do that in a followup mail.
>
> Thanks for a quick response Dan.
>
> Here is a snapshot of top:
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4726 root      RT   0  201m 5576 2004 R 100.4  0.1  36:35.31 corosync
>
> Logs and core file have been saved and can be posted if required.
> My response inline.



-- 
Dan Frincu
CCNA, RHCE



