[ClusterLabs] service network restart and corosync

Wed Mar 30 05:23:26 EDT 2016

> Hi Jan Friesse,
>
> Thank you for the update.
>
> I have responded inline.
> A few queries, as well.
>
> Regards,
> Debabrata Pani
>
> On 29/03/16 14:24, "Jan Friesse" <jfriesse at redhat.com> wrote:
>
>>
>>> Hi (Jan Friesse)
>>>
>>> I studied the issue mentioned in the github url.
>>> It looks like the crash that I am talking about is slightly different
>>> from the one mentioned in the original issue. Maybe they are related,
>>> but I would like to highlight my setup for clarity.
>>>
>>> Three-node cluster; one node is in maintenance mode to prevent any
>>> scheduling of resources.
>>> =====
>>> Stack: classic openais (with plugin)
>>
>> ^^ I'm pretty sure you don't want to use plugin based pcmk.
>
> Corosync version:
> Corosync Cluster Engine, version '1.4.7'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
>
> Is there any (non-plugin) version available for CentOS 6.5? What is the

AFAIK (I'm not a pcmk expert) CentOS contains both the plugin and the
CMAN (non-plugin) versions. Also, 6.5 is no longer supported, so it's a
good idea to upgrade.

> setup that is recommended?
> I am slightly puzzled by the CMAN/plugin distinction, so some pointers
> will be really helpful.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/index.html

>
>>
>>> Current DC: vm2cen66.mobileum.com - partition with quorum
>>> Version: 1.1.11-97629de
>>> 3 Nodes configured, 3 expected votes
>>> 6 Resources configured
>>>
>>>
>>> Node vm3cent66.mobileum.com: maintenance
>>> Online: [ vm1cen66.mobileum.com vm2cen66.mobileum.com ]
>>> ====
>>>
>>> I log in to vm1cen66 and do `ifdown eth0`.
>>> On vm1cen66, I don't see any change in the crm_mon -Afr output.
>>> It remains the same, as shown below
>>> ====
>>> Stack: classic openais (with plugin)
>>> Current DC: vm2cen66.mobileum.com - partition with quorum
>>> Version: 1.1.11-97629de
>>> 3 Nodes configured, 3 expected votes
>>> 6 Resources configured
>>>
>>>
>>> Node vm3cent66.mobileum.com: maintenance
>>> Online: [ vm1cen66.mobileum.com vm2cen66.mobileum.com ]
>>> ===
>>>
>>>
>>> But if we log in to the other nodes, like vm2cen66 or vm3cent66, we can
>>> correctly see that the node vm1cen66 is offline.
>>
>> That is expected
>>
>>>
>>>
>>> But if we look into the corosync.log of vm1cen66 we see the following
>>>
>>> ===
>>> Mar 28 14:55:09 corosync [MAIN  ] Totem is unable to form a cluster
>>> because of an operating system or network fault. The most common cause
>>> of this message is that the local firewall is configured improperly.
>>> pgsql(TestPostgresql)[28203]:   2016/03/28_14:55:10 INFO: Master does
>>> not exist.
>>> pgsql(TestPostgresql)[28203]:   2016/03/28_14:55:10 WARNING: My data is
>>> out-of-date. status=DISCONNECT
>>> Mar 28 14:55:11 corosync [MAIN  ] Totem is unable to form a cluster
>>> because of an operating system or network fault. The most common cause
>>> of this message is that the local firewall is configured improperly.
>>> Mar 28 14:55:12 corosync [MAIN  ] Totem is unable to form a cluster
>>> because of an operating system or network fault. The most common cause
>>> of this message is that the local firewall is configured improperly.
>>> ======
>>>
>>
>> This is result of ifdown. Just don't do that.
>>
>> What exact version of corosync are you using?
>
>
> Corosync version:
> Corosync Cluster Engine, version '1.4.7'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
>
>>
>>>
>>> The pgsql resource (the postgresql resource agent) is running on this
>>> particular node. I did a pgrep of the process and found it running. Not
>>> attaching the logs for now.
>>>
>>> The "crash" happens when the ethernet interface is brought up. vm1cen66
>>> is unable to reconnect to the cluster because corosync has crashed,
>>> taking some pacemaker processes along with it.
>>> crm_mon stops working too (it was working before the interface was
>>> brought up).
>>>
>>>
>>> I have to restart the corosync and pacemaker services to make it work
>>> again.
>>
>> That's why I keep saying don't do ifdown.
>>
>>>
>>>
>>> The main observation is that the node where the ethernet interface is
>>> down does not really "get" it. It assumes that the other nodes are
>>> still online, although the logs do say that the interface is down.
>>>
>>> Queries/Observations:
>>> 1- node vm1cen66 should realise that the other nodes are offline
>>
>> That would be correct behavior, yes.
>
> Does this mean that it should also realise that it does not have
> quorum and act according to the no-quorum-policy?
> Since we don't have stonith hardware agents, this would be really useful

Are you sure that your server doesn't have IPMI?
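
One quick way to check, as a rough sketch: look for an IPMI device node on
the server. The device paths below are the ones the Linux IPMI driver
commonly creates; treat them as assumptions, and note that the optional
root argument of the (hypothetical) helper find_ipmi_device exists only so
the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: report the first IPMI device node found, if any.
# Assumption: the kernel IPMI modules are loaded; without them no device
# node appears even when the hardware has a BMC.
find_ipmi_device() {
    root="${1:-}"   # hypothetical prefix, empty on a real system
    for dev in /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0; do
        if [ -e "${root}${dev}" ]; then
            echo "${dev}"
            return 0
        fi
    done
    return 1
}
```

If this prints a device, `ipmitool mc info` (from the ipmitool package)
should be able to talk to the management controller.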

> for us. Do we have some other way of handling this?

Not right now.

>
>>
>>> 2- From the discussion in the github issue it seems that in case of
>>> ethernet failure we want it to run as a single node setup. Is that so?
>>
>> Not exactly. It should behave as if all the other nodes had gone down.
>>
>>> 	2a. If that is the case, will it honour no-quorum-policy=ignore and
>>> stop processes?
>>> 	2b. Or will it assume that it is a single node cluster and decide
>>> accordingly?
>>> 3- After doing an interface down, if we grep for the corosync port in
>>> the netstat output, we see that the corosync process is now bound to
>>> the loopback interface. Previously it was bound to the IP on eth0.
>>> 	Is this expected? As per the discussion it should be so. But the crash
>>> did not happen immediately. It crashes when we bring the ethernet
>>> interface up.
>>
>> This is expected.
>>
>>> 	If corosync did crash, why were we still seeing entries in
>>> corosync.log?
>>> 4- Is it possible to prevent the corosync crash that we witnessed when
>>> the ethernet interface is brought up?
>>
>> Nope. Just don't do ifdown.
>>
>>> 5- Will preventing the corosync crash really matter, given that node
>>> vm1cen66 has now turned into a single node cluster? Or will it
>>> automatically re-bind to eth0 when the interface is brought up?
>>> 	(Could not verify because of the crash)
>>
>> It rebinds to eth0, sends wrong information to the other nodes and
>> totally destroys membership. Again, just don't do ifdown.
>
> What incorrect membership information will it send?
> Is it a problem if another node joins the cluster with exactly the
> opposite information about the availability of nodes?
> Or are we talking about something else?

It is simply a bug in corosync. It has been known for a long time, and
it's really not that easy to fix properly.
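
For testing how a node behaves when it loses the cluster network, the
iptables approach quoted further down in this thread avoids triggering the
bug. A minimal sketch, assuming corosync uses the default mcastport 5405
(corosync also uses mcastport - 1, so 5404); check totem.interface.mcastport
in corosync.conf and adjust. The function names and the RUN dry-run hook are
hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Sketch only: simulate a network failure for corosync without ifdown.
# Assumption: default corosync UDP ports 5404-5405; override COROSYNC_PORTS
# if corosync.conf says otherwise.
# RUN is a hypothetical dry-run hook: with RUN=echo the commands are only printed.
RUN="${RUN:-}"
COROSYNC_PORTS="${COROSYNC_PORTS:-5404:5405}"

# $1 is the iptables action: -I inserts the DROP rules, -D deletes them again.
corosync_traffic() {
    action="$1"
    for chain in INPUT OUTPUT; do
        $RUN iptables "$action" "$chain" -p udp --dport "$COROSYNC_PORTS" -j DROP || return 1
    done
}

block_corosync()   { corosync_traffic -I; }
unblock_corosync() { corosync_traffic -D; }
```

Run block_corosync as root on the node under test, watch crm_mon on all
nodes, then run unblock_corosync to let the node rejoin.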

>>
>>> 6- What about the split brain situation due to pacemaker not shutting
>>> down the services on that single node?
>>> 	In a master-slave configuration this causes some confusion as to which
>>> instance should be made master after the node rejoins.
>>> 	As per the suggestion from the group, we need to configure stonith for
>>> it. Configuring stonith seems to be the topmost priority in pacemaker
>>> clusters.
>>
>> It's not exactly the topmost priority, but it's an easy way to solve
>> many problems.
>>
>>> 	But as far as I gather, we need specialised hardware for this ?
>>
>> I believe there were also SW-based stonith agents (even though they are
>> not that reliable, so not exactly recommended). Also, most servers have
>> at least IPMI.
>>
>> And a last recommendation: don't do ifdown.
>
> Yes, that is what we plan to recommend. But these are the customer's
> machines. They always restart their network after making changes to the
> static routing table. We don't have much control over this behaviour as
> of now.

I'm not really sure that routing table changes need a full network
restart.

Anyway, I'm unsure how to help you. Maybe you can try stopping corosync
before the network restart and starting it again afterwards.
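
A minimal sketch of that sequence, assuming CentOS 6 style init scripts and
assuming pacemaker runs as its own service next to corosync (if the plugin
starts pacemaker from inside corosync, drop the two pacemaker lines). The
function name and the RUN dry-run hook are hypothetical.

```shell
#!/bin/sh
# Sketch only: keep corosync out of the way while the network is restarted.
# Assumption: "pacemaker", "corosync" and "network" are init services on this host.
# RUN is a hypothetical dry-run hook: with RUN=echo the commands are only printed.
RUN="${RUN:-}"

restart_network_safely() {
    # Stop the cluster stack first so corosync never sees the interface go down.
    $RUN service pacemaker stop &&
    $RUN service corosync stop &&
    $RUN service network restart &&
    # Bring the stack back in the reverse order.
    $RUN service corosync start &&
    $RUN service pacemaker start
}
```

Resources on the node are stopped or moved while the stack is down, so this
trades a short planned outage on that node for avoiding the crash.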

Regards,
   Honza

>>
>> Regards,
>>    Honza
>>
>>>
>>> Regards,
>>> Debabrata Pani
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 03/03/16 13:46, "Jan Friesse" <jfriesse at redhat.com> wrote:
>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> In our deployment, due to some requirement, we need to run:
>>>>> service network restart
>>>>
>>>> What is the exact reason for doing the network restart?
>>>>
>>>>>
>>>>> Because of this, corosync crashes and the associated pacemaker
>>>>> processes crash as well.
>>>>>
>>>>> As per the last comment on this issue,
>>>>> -------
>>>>> Corosync reacts oddly to that. It's better to use an iptables rule to
>>>>> block traffic (or crash the node with something like 'echo c >
>>>>> /proc/sysrq-trigge
>>>>> --------
>>>>>
>>>>>
>>>>>
>>>>> But other network services, like Postgres, do not crash due to this
>>>>> network service restart :
>>>>> 	I can login to psql , issue queries, without any problem.
>>>>>
>>>>> In view of this, I would like to understand whether it is possible to
>>>>> prevent a corosync (and a corresponding Pacemaker) crash,
>>>>> since postgres somehow survives this restart.
>>>>>
>>>>> Any pointer to socket-level details for this behaviour will help me
>>>>> understand (and explain to the stakeholders) the problems better.
>>>>
>>>> https://github.com/corosync/corosync/pull/32 should help.
>>>>
>>>> Regards,
>>>>     Honza
>>>>
>>>>>
>>>>> Regards,
>>>>> Debabrata Pani
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list: Users at clusterlabs.org
>>>>> http://clusterlabs.org/mailman/listinfo/users
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started:
>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>