[ClusterLabs] Two node cluster goes into split brain scenario during CPU intensive tasks

Somanath Jeeva somanath.jeeva at ericsson.com
Tue Jun 25 07:03:59 EDT 2019


We have user jobs running at the time the split-brain scenario occurs. The CPU load at that time is around 55 (we have 32 CPU cores). Is there any way we can avoid the split-brain scenario in this case?
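A load average of 55 on 32 cores can starve corosync of CPU long enough for it to miss token timeouts. A minimal sketch of two things worth checking, assuming corosync 2.x and pcs are in use (the 10000 ms token value is only an illustration, not a tuned recommendation):

    # Check whether corosync runs with realtime scheduling (it should
    # normally report SCHED_RR, so user load cannot preempt it):
    chrt -p $(pidof corosync)

    # /etc/corosync/corosync.conf: a larger token timeout gives corosync
    # more slack before it declares a processor failed.
    totem {
        token: 10000        # milliseconds; example value only
    }

    # Push the edited config to both nodes and ask corosync to reload it
    # (availability of these commands depends on the pcs/corosync version):
    pcs cluster sync
    corosync-cfgtool -R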


With Regards
Somanath Thilak J

From: Emmanuel Gelati <emi2fast at gmail.com>
Sent: Monday, June 24, 2019 01:57
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: Re: [ClusterLabs] Two node cluster goes into split brain scenario during CPU intensive tasks

Hi,

Please specify which version you are running, and check the CPU usage of your system: are we talking about user usage or system usage?
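For example, the standard tools below separate user time (%usr) from system time (%sys); mpstat comes from the sysstat package:

    uptime               # load averages over 1/5/15 minutes
    mpstat -P ALL 1 5    # per-CPU %usr/%sys, five one-second samples
    top -b -n 1 | head   # us/sy/wa breakdown in the summary lines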

Best regards

On Sun, Jun 23, 2019 at 13:40 Somanath Jeeva <somanath.jeeva at ericsson.com> wrote:
Hi All,

I have a two node cluster with multicast (udp) transport. The multicast IP used is 224.1.1.1.
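For reference, a minimal totem section matching that description might look like the sketch below; bindnetaddr is an assumption inferred from the 10.241.31.12 address in the log further down, and mcastport is just the corosync default:

    totem {
        version: 2
        transport: udp
        interface {
            ringnumber: 0
            bindnetaddr: 10.241.31.0   # assumed; use your cluster network
            mcastaddr: 224.1.1.1
            mcastport: 5405            # corosync default
        }
    }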

Whenever there is a CPU intensive task, the pcs cluster goes into a split-brain scenario and doesn't recover automatically. We have to restart the services manually to bring both nodes online again. Before the nodes go into split brain, the corosync log shows:

May 24 15:10:02 server1 corosync[4745]:  [TOTEM ] Retransmit List: 7c 7e
May 24 15:10:02 server1 corosync[4745]:  [TOTEM ] Retransmit List: 7c 7e
May 24 15:10:02 server1 corosync[4745]:  [TOTEM ] Retransmit List: 7c 7e
May 24 15:10:02 server1 corosync[4745]:  [TOTEM ] Retransmit List: 7c 7e
May 24 15:10:02 server1 corosync[4745]:  [TOTEM ] Retransmit List: 7c 7e
May 24 15:51:42 server1 corosync[4745]:  [TOTEM ] A processor failed, forming new configuration.
May 24 16:41:42 server1 corosync[4745]:  [TOTEM ] A new membership (10.241.31.12:29276) was formed. Members left: 1
May 24 16:41:42 server1 corosync[4745]:  [TOTEM ] Failed to receive the leave message. failed: 1
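To see which totem timeouts were actually in effect when this happened, the runtime values can be read back from corosync (corosync 2.x syntax; on 1.x the equivalent tool is corosync-objctl):

    corosync-cmapctl | grep totem.token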

Is there any way we can overcome this, or could it be due to multicast issues on the network side?
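One way to rule the network in or out is to test multicast delivery between the nodes directly with omping, which ships in most distribution repositories (the hostnames below are placeholders; run the same command on both nodes at once):

    omping -m 224.1.1.1 server1 server2

If the multicast packets are lost while the unicast ones omping also sends get through, IGMP snooping on the switch is a common culprit.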

With Regards
Somanath Thilak J





--
  .~.
  /V\
 //  \\
/(   )\
^`~'^


More information about the Users mailing list