[ClusterLabs] Antw: Re: Antw: Re: Spurious node loss in corosync cluster

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Aug 23 07:32:44 EDT 2018


>>> Prasad Nagaraj <prasad.nagaraj76 at gmail.com> wrote on 22.08.2018 at 19:00 in
message
<CAHbCUJ0k0M=MhEqj=9BBOM14+jtu0fQX_V8b4HJNdw2VUMQP=g at mail.gmail.com>:
> Hi - My systems are single-core CPU VMs running on the Azure platform. I am

OK, so you don't have any control over overprovisioning CPU power and the VM being migrated between nodes, I guess. Be aware that the CPU time you are seeing is purely virtual, and there may be times when a "100% busy CPU" gets no CPU cycles at all. An interesting experiment would be to compare the CLOCK_MONOTONIC values against CLOCK_REALTIME (on a real host, so that CLOCK_REALTIME is actually not virtual time) over some time. I wouldn't be surprised if you see jumps.
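
A minimal sketch of such a comparison, assuming a Linux system with the POSIX clock_gettime() API (the sampling interval, run length, and output format below are arbitrary):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /* Convert a timespec to seconds as a double. */
    static double ts_to_sec(struct timespec ts)
    {
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        struct timespec mono0, real0, mono, real;

        clock_gettime(CLOCK_MONOTONIC, &mono0);
        clock_gettime(CLOCK_REALTIME, &real0);

        /* Sample both clocks once per second and print how far their
         * elapsed times disagree. NTP slewing or stepping of
         * CLOCK_REALTIME will also show up here as drift. */
        for (int i = 0; i < 600; i++) {
            sleep(1);
            clock_gettime(CLOCK_MONOTONIC, &mono);
            clock_gettime(CLOCK_REALTIME, &real);
            double d_mono = ts_to_sec(mono) - ts_to_sec(mono0);
            double d_real = ts_to_sec(real) - ts_to_sec(real0);
            printf("monotonic %.3f s  realtime %.3f s  drift %.3f s\n",
                   d_mono, d_real, d_real - d_mono);
        }
        return 0;
    }

On bare metal the drift column should stay near zero; on an overcommitted VM it may jump whenever the guest is not scheduled for a while.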

I think clouds are no good for real-time demands.

Regards,
Ulrich

> running MySQL on the nodes, which does generate high IO load. And my bad, I
> meant to say 'High CPU load detected', which is logged by crmd and not corosync.
> Corosync logs messages like 'Corosync main process was not scheduled
> for.....', which in turn sometimes makes the pacemaker monitor action
> fail. Is increasing the token timeout a solution for this, or are there
> other ways?
> 
> Thanks for the help
> Prasad
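
On the token-timeout question above: the timeout is set in milliseconds in the totem section of corosync.conf (the same section as in the configuration quoted further below). A rough sketch showing only the relevant lines -- 15000 is purely an illustrative value; it must be identical on all nodes, and corosync typically has to be restarted to pick it up:

    totem {
        # Token timeout in ms. Larger values tolerate longer scheduling
        # pauses or network hiccups before a node is declared lost, at the
        # cost of slower detection of real failures.
        token: 15000

        # Usually best left at its default; as noted further down in this
        # thread, changing 'token' alone is normally enough.
        # token_retransmits_before_loss_const: 10
    }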
> 
> On Wed, 22 Aug 2018, 11:55 am Jan Friesse, <jfriesse at redhat.com> wrote:
> 
>> Prasad,
>>
>> > Thanks Ken and Ulrich. There is definitely high IO on the system, with
>> > IOWAITs sometimes of up to 90%.
>> > I have come across some previous posts saying that IOWAIT is also counted
>> > as CPU load by Corosync. Is this true? Can high IO lead corosync to
>> > complain as in "Corosync main process was not scheduled for..." or "Hi
>> > CPU load detected.."?
>>
>> Yes it can.
>>
>> Corosync never logs "Hi CPU load detected...".
>>
>> >
>> > I will surely monitor the system more.
>>
>> Is that system a VM or a physical machine? Because "Corosync main process
>> was not scheduled for..." usually happens on VMs whose hosts are
>> highly overloaded.
>>
>> Honza
>>
>> >
>> > Thanks for your help.
>> > Prasad
>> >
>> >
>> >
>> > On Tue, Aug 21, 2018 at 9:07 PM, Ken Gaillot <kgaillot at redhat.com>
>> wrote:
>> >
>> >> On Tue, 2018-08-21 at 15:29 +0200, Ulrich Windl wrote:
>> >>>>>> Prasad Nagaraj <prasad.nagaraj76 at gmail.com> wrote on
>> >>>>>> 21.08.2018 at 11:42 in
>> >>>
>> >>> message
>> >>> <CAHbCUJ0zdvpYALCR7tbnGgb8qrZHh8uDjE+RsnkoewvmFb8wAg at mail.gmail.com>:
>> >>>> Hi Ken - Thanks for your response.
>> >>>>
>> >>>> We do have seen messages in other cases like
>> >>>> corosync [MAIN  ] Corosync main process was not scheduled for
>> >>>> 17314.4746 ms
>> >>>> (threshold is 8000.0000 ms). Consider token timeout increase.
>> >>>> corosync [TOTEM ] A processor failed, forming new configuration.
>> >>>>
>> >>>> Is this the indication of a failure due to CPU load issues and will
>> >>>> this
>> >>>> get resolved if I upgrade to Corosync 2.x series ?
>> >>
>> >> Yes, most definitely this is a CPU issue. It means corosync isn't
>> >> getting enough CPU cycles to handle the cluster token before the
>> >> timeout is reached.
>> >>
>> >> Upgrading may indeed help, as recent versions ensure that corosync runs
>> >> with real-time priority in the kernel, and thus are more likely to get
>> >> CPU time when something of lower priority is consuming all the CPU.
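
As a purely illustrative sketch of what "real-time priority" means here (this is not corosync's actual code), a daemon can ask the Linux kernel for a real-time scheduling class roughly like this:

    #include <sched.h>
    #include <stdio.h>

    /* Illustration only: request real-time round-robin scheduling so the
     * process still gets CPU time when ordinary SCHED_OTHER processes
     * saturate the machine. Requires root or CAP_SYS_NICE. */
    int main(void)
    {
        struct sched_param sp;

        sp.sched_priority = sched_get_priority_max(SCHED_RR);
        if (sched_setscheduler(0, SCHED_RR, &sp) != 0) {
            perror("sched_setscheduler");
            return 1;
        }
        printf("running with SCHED_RR, priority %d\n", sp.sched_priority);
        return 0;
    }

A SCHED_RR task is preferred over all normally scheduled (SCHED_OTHER) tasks, which is why a busy but non-real-time workload is much less likely to starve it.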
>> >>
>> >> But of course, there is some underlying problem that should be
>> >> identified and addressed. Figure out what's maxing out the CPU or I/O.
>> >> Ulrich's monitoring suggestion is a good start.
>> >>
>> >>> Hi!
>> >>>
>> >>> I'd strongly recommend starting monitoring on your nodes, at least
>> >>> until you know what's going on. The good old UNIX sa (sysstat
>> >>> package) could be a starting point. I'd monitor CPU idle
>> >>> specifically. Then go for 100% device utilization, then look for
>> >>> network bottlenecks...
>> >>>
>> >>> A new corosync release cannot fix those, most likely.
>> >>>
>> >>> Regards,
>> >>> Ulrich
>> >>>
>> >>>>
>> >>>> In any case, for the current scenario, we did not see any
>> >>>> scheduling
>> >>>> related messages.
>> >>>>
>> >>>> Thanks for your help.
>> >>>> Prasad
>> >>>>
>> >>>> On Mon, Aug 20, 2018 at 7:57 PM, Ken Gaillot <kgaillot at redhat.com>
>> >>>> wrote:
>> >>>>
>> >>>>> On Sun, 2018-08-19 at 17:35 +0530, Prasad Nagaraj wrote:
>> >>>>>> Hi:
>> >>>>>>
>> >>>>>> One of these days, I saw a spurious node loss on my 3-node
>> >>>>>> corosync
>> >>>>>> cluster with following logged in the corosync.log of one of the
>> >>>>>> nodes.
>> >>>>>>
>> >>>>>> Aug 18 12:40:25 corosync [pcmk  ] notice: pcmk_peer_update:
>> >>>>>> Transitional membership event on ring 32: memb=2, new=0, lost=1
>> >>>>>> Aug 18 12:40:25 corosync [pcmk  ] info: pcmk_peer_update: memb:
>> >>>>>> vm02d780875f 67114156
>> >>>>>> Aug 18 12:40:25 corosync [pcmk  ] info: pcmk_peer_update: memb:
>> >>>>>> vmfa2757171f 151000236
>> >>>>>> Aug 18 12:40:25 corosync [pcmk  ] info: pcmk_peer_update: lost:
>> >>>>>> vm728316982d 201331884
>> >>>>>> Aug 18 12:40:25 corosync [pcmk  ] notice: pcmk_peer_update:
>> >>>>>> Stable
>> >>>>>> membership event on ring 32: memb=2, new=0, lost=0
>> >>>>>> Aug 18 12:40:25 corosync [pcmk  ] info: pcmk_peer_update: MEMB:
>> >>>>>> vm02d780875f 67114156
>> >>>>>> Aug 18 12:40:25 corosync [pcmk  ] info: pcmk_peer_update: MEMB:
>> >>>>>> vmfa2757171f 151000236
>> >>>>>> Aug 18 12:40:25 corosync [pcmk  ] info:
>> >>>>>> ais_mark_unseen_peer_dead:
>> >>>>>> Node vm728316982d was not seen in the previous transition
>> >>>>>> Aug 18 12:40:25 corosync [pcmk  ] info: update_member: Node
>> >>>>>> 201331884/vm728316982d is now: lost
>> >>>>>> Aug 18 12:40:25 corosync [pcmk  ] info:
>> >>>>>> send_member_notification:
>> >>>>>> Sending membership update 32 to 3 children
>> >>>>>> Aug 18 12:40:25 corosync [TOTEM ] A processor joined or left
>> >>>>>> the
>> >>>>>> membership and a new membership was formed.
>> >>>>>> Aug 18 12:40:25 [4544] vmfa2757171f stonith-ng:     info:
>> >>>>>> plugin_handle_membership:     Membership 32: quorum retained
>> >>>>>> Aug 18 12:40:25 [4544] vmfa2757171f stonith-ng:   notice:
>> >>>>>> crm_update_peer_state_iter:   plugin_handle_membership: Node
>> >>>>>> vm728316982d[201331884] - state is now lost (was member)
>> >>>>>> Aug 18 12:40:25 [4548] vmfa2757171f       crmd:     info:
>> >>>>>> plugin_handle_membership:     Membership 32: quorum retained
>> >>>>>> Aug 18 12:40:25 [4548] vmfa2757171f       crmd:   notice:
>> >>>>>> crm_update_peer_state_iter:   plugin_handle_membership: Node
>> >>>>>> vm728316982d[201331884] - state is now lost (was member)
>> >>>>>> Aug 18 12:40:25 [4548] vmfa2757171f       crmd:     info:
>> >>>>>> peer_update_callback: vm728316982d is now lost (was member)
>> >>>>>> Aug 18 12:40:25 [4548] vmfa2757171f       crmd:  warning:
>> >>>>>> match_down_event:     No match for shutdown action on
>> >>>>>> vm728316982d
>> >>>>>> Aug 18 12:40:25 [4548] vmfa2757171f       crmd:   notice:
>> >>>>>> peer_update_callback: Stonith/shutdown of vm728316982d not
>> >>>>>> matched
>> >>>>>> Aug 18 12:40:25 [4548] vmfa2757171f       crmd:     info:
>> >>>>>> crm_update_peer_join: peer_update_callback: Node
>> >>>>>> vm728316982d[201331884] - join-6 phase 4 -> 0
>> >>>>>> Aug 18 12:40:25 [4548] vmfa2757171f       crmd:     info:
>> >>>>>> abort_transition_graph:       Transition aborted: Node failure
>> >>>>>> (source=peer_update_callback:240, 1)
>> >>>>>> Aug 18 12:40:25 [4543] vmfa2757171f        cib:     info:
>> >>>>>> plugin_handle_membership:     Membership 32: quorum retained
>> >>>>>> Aug 18 12:40:25 [4543] vmfa2757171f        cib:   notice:
>> >>>>>> crm_update_peer_state_iter:   plugin_handle_membership: Node
>> >>>>>> vm728316982d[201331884] - state is now lost (was member)
>> >>>>>> Aug 18 12:40:25 [4543] vmfa2757171f        cib:   notice:
>> >>>>>> crm_reap_dead_member: Removing vm728316982d/201331884 from the
>> >>>>>> membership list
>> >>>>>> Aug 18 12:40:25 [4543] vmfa2757171f        cib:   notice:
>> >>>>>> reap_crm_member:      Purged 1 peers with id=201331884 and/or
>> >>>>>> uname=vm728316982d from the membership cache
>> >>>>>> Aug 18 12:40:25 [4544] vmfa2757171f stonith-ng:   notice:
>> >>>>>> crm_reap_dead_member: Removing vm728316982d/201331884 from the
>> >>>>>> membership list
>> >>>>>> Aug 18 12:40:25 [4544] vmfa2757171f stonith-ng:   notice:
>> >>>>>> reap_crm_member:      Purged 1 peers with id=201331884 and/or
>> >>>>>> uname=vm728316982d from the membership cache
>> >>>>>>
>> >>>>>> However, within seconds, the node was able to join back.
>> >>>>>>
>> >>>>>> Aug 18 12:40:34 corosync [pcmk  ] notice: pcmk_peer_update:
>> >>>>>> Stable
>> >>>>>> membership event on ring 36: memb=3, new=1, lost=0
>> >>>>>> Aug 18 12:40:34 corosync [pcmk  ] info: update_member: Node
>> >>>>>> 201331884/vm728316982d is now: member
>> >>>>>> Aug 18 12:40:34 corosync [pcmk  ] info: pcmk_peer_update: NEW:
>> >>>>>> vm728316982d 201331884
>> >>>>>>
>> >>>>>>
>> >>>>>> But this was enough time for the cluster to get into a
>> >>>>>> split-brain kind of situation, with a resource on node
>> >>>>>> vm728316982d being stopped because of this node-loss detection.
>> >>>>>>
>> >>>>>> Could anyone help me understand whether this could happen due to a
>> >>>>>> transient network disruption or something similar?
>> >>>>>> Are there any configuration settings that can be applied in
>> >>>>>> corosync.conf so that the cluster is more resilient to such
>> >>>>>> temporary disruptions?
>> >>>>>
>> >>>>> Your corosync sensitivity -- a 10-second token timeout and 10
>> >>>>> retransmissions -- is already very lenient: likely the node was
>> >>>>> already unresponsive for more than 10 seconds before the first
>> >>>>> message above, so it was more than 18 seconds before it rejoined.
>> >>>>>
>> >>>>> It's rarely a good idea to change
>> >>>>> token_retransmits_before_loss_const;
>> >>>>> changing token is generally enough to deal with transient network
>> >>>>> unreliability. However, 18 seconds is a really long time to raise
>> >>>>> the
>> >>>>> token to, and it's uncertain from the information here whether
>> >>>>> the root
>> >>>>> cause was networking or something on the host.
>> >>>>>
>> >>>>> I notice your configuration is corosync 1 with the pacemaker
>> >>>>> plugin;
>> >>>>> that is a long-deprecated setup, and corosync 3 is about to come
>> >>>>> out,
>> >>>>> so you may want to consider upgrading to at least corosync 2 and
>> >>>>> a
>> >>>>> reasonably recent pacemaker. That would give you some reliability
>> >>>>> improvements, including real-time priority scheduling of
>> >>>>> corosync,
>> >>>>> which could have been the issue here if CPU load rather than
>> >>>>> networking
>> >>>>> was the root cause.
>> >>>>>
>> >>>>>>
>> >>>>>> Currently my corosync.conf looks like this :
>> >>>>>>
>> >>>>>> compatibility: whitetank
>> >>>>>> totem {
>> >>>>>>      version: 2
>> >>>>>>      secauth: on
>> >>>>>>      threads: 0
>> >>>>>>      interface {
>> >>>>>>          member {
>> >>>>>>              memberaddr: 172.20.0.4
>> >>>>>>          }
>> >>>>>>          member {
>> >>>>>>              memberaddr: 172.20.0.9
>> >>>>>>          }
>> >>>>>>          member {
>> >>>>>>              memberaddr: 172.20.0.12
>> >>>>>>          }
>> >>>>>>
>> >>>>>>      bindnetaddr: 172.20.0.12
>> >>>>>>
>> >>>>>>      ringnumber: 0
>> >>>>>>      mcastport: 5405
>> >>>>>>      ttl: 1
>> >>>>>>      }
>> >>>>>>      transport: udpu
>> >>>>>>      token: 10000
>> >>>>>>      token_retransmits_before_loss_const: 10
>> >>>>>> }
>> >>>>>>
>> >>>>>> logging {
>> >>>>>>      fileline: off
>> >>>>>>      to_stderr: yes
>> >>>>>>      to_logfile: yes
>> >>>>>>      to_syslog: no
>> >>>>>>      logfile: /var/log/cluster/corosync.log
>> >>>>>>      timestamp: on
>> >>>>>>      logger_subsys {
>> >>>>>>          subsys: AMF
>> >>>>>>          debug: off
>> >>>>>>      }
>> >>>>>> }
>> >>>>>> service {
>> >>>>>>      name: pacemaker
>> >>>>>>      ver: 1
>> >>>>>> }
>> >>>>>> amf {
>> >>>>>>      mode: disabled
>> >>>>>> }
>> >>>>>>
>> >>>>>> Thanks in advance for the help.
>> >>>>>> Prasad
>> >>>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Ken Gaillot <kgaillot at redhat.com>
>> >> --
>> >> Ken Gaillot <kgaillot at redhat.com>