[ClusterLabs] Re: Re: Two node cluster goes into split brain scenario during CPU intensive tasks
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Jul 1 07:12:39 EDT 2019
>>> Somanath Jeeva <somanath.jeeva at ericsson.com> wrote on 25.06.2019 at 13:06
in message
<VI1PR07MB40454AFCC85D42E30627F8F7F9E30 at VI1PR07MB4045.eurprd07.prod.outlook.com>:
> I have not configured fencing in our setup. However, I would like to know if
> split brain can be avoided when high CPU load occurs.
It seems you'd like to ride a bicycle with crossed arms while trying not to
fall ;-)
>
>
> With Regards
> Somanath Thilak J
>
> -----Original Message-----
> From: Ken Gaillot <kgaillot at redhat.com>
> Sent: Monday, June 24, 2019 20:28
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>; Somanath Jeeva <somanath.jeeva at ericsson.com>
> Subject: Re: [ClusterLabs] Two node cluster goes into split brain scenario
> during CPU intensive tasks
>
> On Mon, 2019-06-24 at 08:52 +0200, Jan Friesse wrote:
>> Somanath,
>>
>> > Hi All,
>> >
>> > I have a two-node cluster with multicast (udp) transport. The
>> > multicast IP used is 224.1.1.1.
>>
>> Would you mind giving UDPU (unicast) a try? For a two-node cluster
>> there is going to be no difference in terms of speed/throughput.
>>
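A minimal corosync.conf sketch of the unicast (UDPU) setup suggested above, assuming corosync 2.x. The cluster name and the node name server2 are hypothetical placeholders. With udpu, the nodes are listed explicitly in a nodelist instead of being reached through a multicast address such as 224.1.1.1; presumably the same file would have to be installed on both nodes, followed by a full cluster restart.

```
totem {
    version: 2
    # Hypothetical cluster name
    cluster_name: mycluster
    # Unicast UDP instead of multicast
    transport: udpu
}

# With udpu every member must be listed explicitly
nodelist {
    node {
        ring0_addr: server1
        nodeid: 1
    }
    node {
        # Hypothetical name of the second node
        ring0_addr: server2
        nodeid: 2
    }
}
```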
>> >
>> > Whenever there is a CPU-intensive task, the pcs cluster goes into a
>> > split-brain scenario and doesn't recover automatically. We have to
>> > do a manual restart of services to bring both nodes online again.
>
> In addition to others' comments: if fencing is enabled, split brain should
> not be possible. Automatic recovery should work as long as fencing succeeds.
> With fencing disabled, split brain with no automatic recovery can definitely
> happen.
>
>> > Before the nodes go into split brain, the corosync log shows:
>> >
>> > May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e
>> > May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e
>> > May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e
>> > May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e
>> > May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e
>>
>> This usually happens when:
>> - multicast is somehow rate-limited on the switch side (configuration/bad
>>   switch implementation/...)
>> - the MTU of the network is smaller than 1500 bytes and fragmentation is
>>   not allowed -> try reducing totem.netmtu
>>
>> Regards,
>> Honza
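If the path MTU really is below 1500 bytes, the setting mentioned above is netmtu in the totem section of corosync.conf, whose default is 1500. A minimal sketch, assuming corosync 2.x; the value 1400 is only an illustrative assumption and should be chosen to fit the actual network path. The line would be merged into the existing totem section rather than added as a second one.

```
totem {
    # Illustrative value only: lower than the 1500-byte default so that
    # packets fit a smaller path MTU without fragmentation
    netmtu: 1400
}
```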
>>
>>
>> > May 24 15:51:42 server1 corosync[4745]: [TOTEM ] A processor failed, forming new configuration.
>> > May 24 16:41:42 server1 corosync[4745]: [TOTEM ] A new membership (10.241.31.12:29276) was formed. Members left: 1
>> > May 24 16:41:42 server1 corosync[4745]: [TOTEM ] Failed to receive the leave message. failed: 1
>> >
>> > Is there any way we can overcome this, or could this be due to
>> > multicast issues on the network side?
>> >
>> > With Regards
>> > Somanath Thilak J
>> >
>> > _______________________________________________
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/
> --
> Ken Gaillot <kgaillot at redhat.com>