[ClusterLabs] Corosync main process was not scheduled for 115935.2266 ms (threshold is 800.0000 ms). Consider token timeout increase.

Adam Spiers aspiers at suse.com
Wed Feb 24 12:34:42 UTC 2016


Hi all,

Jan Friesse <jfriesse at redhat.com> wrote:
> >>>There is really no help. It's best to make sure corosync is scheduled
> >>>regularly.
> >I may sound silly, but how can I do it?
> 
> It's actually very hard to say. Pauses like 30 sec is really unusual
> and shouldn't happen (specially with RT scheduling). It is usually
> happening on VM where host is overcommitted.

It's funny you are discussing this just as my team is seeing it happen
fairly regularly within VMs on an overcommitted host.  In other words,
I can confirm that Jan's statement above is true.

Like Konstiantyn, we have also sometimes seen no fencing occur as a
result of these pauses, e.g.

Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [MAIN  ] Corosync main process was not scheduled for 7343.1909 ms (threshold is 4000.0000 ms). Consider token timeout increase.
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [TOTEM ] A processor failed, forming new configuration.
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] CLM CONFIGURATION CHANGE
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] New Configuration:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] #011r(0) ip(192.168.2.82) 
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] #011r(0) ip(192.168.2.84) 
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] Members Left:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] Members Joined:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 32: memb=2, new=0, lost=0
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [pcmk  ] info: pcmk_peer_update: memb: d52-54-77-77-77-01 1084752466
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [pcmk  ] info: pcmk_peer_update: memb: d52-54-77-77-77-02 1084752468
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] CLM CONFIGURATION CHANGE
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] New Configuration:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] #011r(0) ip(192.168.2.82) 
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] #011r(0) ip(192.168.2.84) 
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] Members Left:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CLM   ] Members Joined:
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 32: memb=2, new=0, lost=0
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [pcmk  ] info: pcmk_peer_update: MEMB: d52-54-77-77-77-01 1084752466
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [pcmk  ] info: pcmk_peer_update: MEMB: d52-54-77-77-77-02 1084752468
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.2.82) ; members(old:2 left:0)
Feb 24 02:53:04 d52-54-77-77-77-02 corosync[18939]:   [MAIN  ] Completed service synchronization, ready to provide service.
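
To see how far each pause overshot its threshold, the [MAIN  ] warning can
be pulled out of syslog with a short script.  This is only a rough sketch,
assuming the message format shown in the excerpt above; the function name
parse_sched_pause and the regular expression are illustrative and not part
of corosync.

```python
import re

# Matches the "[MAIN  ]" scheduling warning as it appears in the log above.
# The pattern is an assumption based on that single message format.
PAUSE_RE = re.compile(
    r"not scheduled for (?P<pause>\d+(?:\.\d+)?) ms "
    r"\(threshold is (?P<threshold>\d+(?:\.\d+)?) ms\)"
)


def parse_sched_pause(line):
    """Return (pause_ms, threshold_ms) for a warning line, else None."""
    match = PAUSE_RE.search(line)
    if match is None:
        return None
    return float(match.group("pause")), float(match.group("threshold"))


def overshoot_ratios(lines):
    """Yield pause/threshold for every scheduling warning in lines."""
    for line in lines:
        parsed = parse_sched_pause(line)
        if parsed is not None:
            pause_ms, threshold_ms = parsed
            yield pause_ms / threshold_ms
```

Applied to the first line of the excerpt, parse_sched_pause gives a pause of
7343.1909 ms against a threshold of 4000.0 ms, i.e. roughly 1.8 times the
threshold.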

I don't understand why it claims a processor failed, forming a new
configuration, when the configuration appears no different from
before: no members joined or left.  Can anyone explain this?
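
The "no different from before" observation can be checked mechanically
rather than by eye.  A minimal sketch, assuming the pcmk_peer_update lines
look like the ones in the excerpt above, with "memb:" following the
transitional event and "MEMB:" following the stable event (that mapping is
read off this excerpt only); the helper names are hypothetical.

```python
import re

# Matches pcmk_peer_update member lines such as
#   "... pcmk_peer_update: memb: d52-54-77-77-77-01 1084752466"
# "memb:" / "MEMB:" are treated as transitional / stable markers, which is
# an assumption based on the log excerpt, not on the plugin source.
MEMBER_RE = re.compile(r"pcmk_peer_update: (memb|MEMB): (\S+) (\d+)")


def memberships(lines):
    """Collect node names per marker: {'memb': {...}, 'MEMB': {...}}."""
    found = {"memb": set(), "MEMB": set()}
    for line in lines:
        match = MEMBER_RE.search(line)
        if match:
            marker, node, _nodeid = match.groups()
            found[marker].add(node)
    return found


def membership_changed(lines):
    """True if the transitional and stable member sets differ."""
    found = memberships(lines)
    return found["memb"] != found["MEMB"]
```

For the four member lines in the excerpt, both sets contain the same two
nodes, so membership_changed returns False.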



