[Pacemaker] lrmd: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1000 ms (> 100 ms) before being called

Fri Apr 1 15:21:05 EDT 2011

Hi,

* Dejan Muhamedagic <dejanmm at fastmail.fm> [20110401 09:41]:
> Hi,
> 
> On Thu, Mar 31, 2011 at 10:52:39AM +0200, Jelle de Jong wrote:
> > On 30-03-11 21:05, Jean-Francois Malouin wrote:
> > > A little more than a month ago I posted about the warning in the
> > > subject line and was told that it is harmless unless very frequent.
> > > It is now popping up more than 10 times a day.
> > > I was asked to create a bug report if I wanted more info. So now I
> > > have an hb_report ready to go. Excuse the naive question, but where/how
> > > do I submit it?
> > 
> > Please keep us informed on this list (share the bug report URL)! I get
> > the same message about twice a day and am interested in why the
> > logging is so alarming.
> 
> I think you've already reported the issue.
> 
> It's actually considered to be a feature that the library warns
> you when it couldn't run the signal handler (or similar) in a
> timely fashion. Finding the cause is really up to the
> administrator, who should take a look at how computing
> resources on the nodes are used. Typically it's a severe load
> caused by a backup or some other I/O-intensive task. There's
> probably not much more one can do about it.
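
The mechanism described above, timing how long a pending signal waits
before the main loop runs its handler, can be sketched roughly as
follows. This is a minimal illustration, not the library's actual code:
the class name DispatchTimer and its methods are hypothetical, and only
the 100 ms threshold is taken from the log message.

```python
import time


class DispatchTimer:
    """Track how long a pending event waits before its handler runs."""

    def __init__(self, threshold_ms=100, clock=time.monotonic):
        # 100 ms mirrors the "(> 100 ms)" in the warning; the rest is assumed.
        self.threshold_ms = threshold_ms
        self._clock = clock
        self._pending_since = None

    def mark_pending(self):
        # Called when the event (e.g. a signal) arrives; keep the first arrival.
        if self._pending_since is None:
            self._pending_since = self._clock()

    def dispatch(self, handler):
        # Called when the main loop finally gets around to the handler.
        if self._pending_since is None:
            return None
        delay_ms = (self._clock() - self._pending_since) * 1000.0
        self._pending_since = None
        if delay_ms > self.threshold_ms:
            print(f"WARN: dispatch was delayed {delay_ms:.0f} ms "
                  f"(> {self.threshold_ms} ms) before being called")
        handler()
        return delay_ms
```

In a model like this the warning says nothing about the handler itself;
it only reports that the process did not get back to its main loop in
time, which is why load on the node is the usual suspect.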

Dejan,

Thank you for the info. Much appreciated.

Since this 'issue' (which, from your comment, looks more like a bonus!)
is already reported, I won't bother.

Is there anything else, not related to heavy I/O activity, that could
cause these messages? Network hiccups?

I see about 0.1% dropped packets on the physical network interface,
but that's about it. The rest looks very normal and quiet.

I'm asking because those nodes are Xen Dom0s with 2 CPUs (pinned) and
2GB of memory allocated (guaranteed not to go below), and apart from
running the cluster stack and being nagios/ganglia/munin clients, they
don't do much. The DomUs (4) are still not online, and even if there
are backups going on at night, they don't correlate at all with the
warning messages.
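
One way to check that kind of correlation is to bucket the warnings by
hour and compare the result against backup or cron schedules. The sketch
below assumes a classic syslog prefix ("Mon DD HH:MM:SS host") and
borrows its pattern from the logcheck rule quoted further down; the
function name warnings_per_hour is hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the logcheck rule quoted in this thread; the syslog
# prefix layout is an assumption and may need adjusting for other setups.
WARN_RE = re.compile(
    r"^(?P<stamp>\w{3} [ :0-9]{11}) (?P<host>[._a-zA-Z0-9-]+) "
    r"lrmd: \[[0-9]+\]: WARN: G_SIG_dispatch: Dispatch function for "
    r"(?P<signal>\w+) was delayed (?P<delay>[0-9]+) ms "
    r"\(> 100 ms\) before being called"
)


def warnings_per_hour(lines):
    """Count matching warnings per 'Mon DD HH' bucket."""
    counts = Counter()
    for line in lines:
        m = WARN_RE.match(line)
        if m:
            # "Apr  1 15:21:05" -> "Apr  1 15"
            counts[m.group("stamp")[:9]] += 1
    return counts
```

Passing an open log file to warnings_per_hour() gives a Counter keyed by
date and hour, which can then be lined up against whatever runs on the
node or its neighbours at those times.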

Anyways, thanks again.
I'll keep an eye open and follow up if anything comes out of the blue.

regards,
jf

> 
> Thanks,
> 
> Dejan
> 
> > I made the following logcheck rule for the message:
> > 
> > ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ lrmd: \[[0-9]+\]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed [0-9]+ ms \(> 100 ms\) before being called \(GSource: 0x[0-9]+f20\)$
> > 
> > Thanks in advance,
> > 
> > Kind regards,
> > 
> > Jelle de Jong
> > 
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> 