[Pacemaker] [Problem] attrd sometimes does not stop.

renayama19661014 at ybb.ne.jp
Thu Nov 3 20:20:12 EDT 2011


Hi Andrew,
Hi Alan,

We are working hard to reproduce the problem and to collect evidence of it.
However, we have not obtained any evidence yet.
I will wait for the information from Alan.
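
One possible way to gather such evidence is sketched below. It is only a sketch under assumptions: it assumes a Linux node where /proc/<pid>/status is readable, and that the process group id printed by heartbeat is also the pid of the group leader. The names find_kill_events, signal_state and report are hypothetical helpers, not part of Pacemaker or Heartbeat. The script pulls the program, process group and signal number out of the "killing ... process group ... with signal ..." log message, then decodes the kernel's signal masks to show whether that signal is pending, blocked, ignored or caught by the process.

```python
import re
from pathlib import Path

# Matches heartbeat messages of the form seen in the log excerpt:
#   "killing /usr/lib64/heartbeat/attrd process group 12237 with signal 15"
# \s+ also matches a line break, so wrapped log lines are handled.
KILL_RE = re.compile(
    r"killing\s+(\S+)\s+process group\s+(\d+)\s+with signal\s+(\d+)"
)

# Signal mask fields in /proc/<pid>/status and the labels used here.
MASK_FIELDS = {
    "SigPnd": "pending_thread",
    "ShdPnd": "pending_process",
    "SigBlk": "blocked",
    "SigIgn": "ignored",
    "SigCgt": "caught",
}


def find_kill_events(log_text):
    """Return (program, pgid, signum) for every 'killing ...' message."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in KILL_RE.finditer(log_text)
    ]


def signal_state(status_text, signum):
    """Decode the hex signal masks of /proc/<pid>/status for one signal."""
    bit = 1 << (signum - 1)  # signal N is bit N-1 of each mask
    state = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in MASK_FIELDS:
            state[MASK_FIELDS[key]] = bool(int(value.strip(), 16) & bit)
    return state


def report(log_text, proc_root="/proc"):
    """For each kill event, return (program, pgid, state or None).

    state is None when the process no longer exists, i.e. it did stop.
    Assumption: the pgid in the log is also the pid of the daemon.
    """
    results = []
    for program, pgid, signum in find_kill_events(log_text):
        status_file = Path(proc_root) / str(pgid) / "status"
        if status_file.exists():
            state = signal_state(status_file.read_text(), signum)
        else:
            state = None
        results.append((program, pgid, state))
    return results
```

For example, report(Path(logfile).read_text()) could be run on the node while attrd is still hanging, where logfile is the path to the heartbeat log. If "pending_process" and "blocked" are both set, the signal was delivered to the process but is being held back. If "caught" is set and nothing is pending, the handler has presumably run and the process is stuck somewhere after it.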

Best Regards,
Hideo Yamauchi.



--- On Wed, 2011/11/2, Andrew Beekhof <andrew at beekhof.net> wrote:

> On Tue, Oct 18, 2011 at 12:19 PM,  <renayama19661014 at ybb.ne.jp> wrote:
> > Hi,
> >
> > attrd sometimes fails to stop.
> >
> > Step 1. Start a two-node cluster.
> > Step 2. Stop the first node (/etc/init.d/heartbeat stop).
> > Step 3. After a short while, stop the second node
> > (/etc/init.d/heartbeat stop).
> >
> > attrd catches the TERM signal but does not stop.
> 
> There's no evidence that it actually catches it, only that it is sent.
> I've seen it before but never figured out why it occurs.
> 
> >
> > (snip)
> > Oct  5 02:37:38 hpdb0201 crmd: [12238]: info: do_exit: [crmd] stopped (0)
> > Oct  5 02:37:38 hpdb0201 cib: [12234]: WARN: send_ipc_message: IPC Channel to
> > 12238 is not connected
> > Oct  5 02:37:38 hpdb0201 cib: [12234]: WARN: send_via_callback_channel:
> > Delivery of reply to client 12238/0dbc9e28-d90d-4335-b9c4-9dd3fcb38163 failed
> > Oct  5 02:37:38 hpdb0201 cib: [12234]: WARN: do_local_notify: A-Sync reply to
> > crmd failed: reply failed
> > Oct  5 02:37:38 hpdb0201 heartbeat: [12223]: info: killing
> > /usr/lib64/heartbeat/attrd process group 12237 with signal 15
> > Oct  5 02:47:03 hpdb0201 cib: [12234]: info: cib_stats: Processed 97 operations
> > (4123.00us average, 0% utilization) in the last 10min
> > Oct  5 07:15:25 hpdb0201 ccm: [12233]: WARN: G_CH_check_int: working on IPC
> > channel took 1010 ms (> 100 ms)
> > Oct  5 07:15:26 hpdb0201 ccm: [12233]: WARN: G_CH_check_int: working on IPC
> > channel took 1010 ms (> 100 ms)
> > Oct  5 07:15:37 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
> > Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before
> > being called (GSource: 0xd28010)
> > Oct  5 07:15:37 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
> > started at 431583547 should have started at 431583444
> > Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
> > Dispatch function for send local status was delayed 1030 ms (> 1010 ms) before
> > being called (GSource: 0xd27dd0)
> > Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
> > started at 431584254 should have started at 431584151
> > Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
> > Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before
> > being called (GSource: 0xd28010)
> > Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
> > started at 431584254 should have started at 431584151
> > Oct  5 07:16:59 hpdb0201 heartbeat: [12223]: WARN: G_CH_check_int: working on
> > write child took 1010 ms (> 100 ms)
> > Oct  5 07:17:14 hpdb0201 stonithd: [12236]: WARN: G_CH_check_int: working on
> > Heartbeat API channel took 1010 ms (> 100 ms)
> > Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
> > Dispatch function for send local status was delayed 1030 ms (> 1010 ms) before
> > being called (GSource: 0xd27dd0)
> > Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
> > started at 431607988 should have started at 431607885
> > Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
> > Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before
> > being called (GSource: 0xd28010)
> > Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
> > started at 431607988 should have started at 431607885
> > (snip)
> >
> > We have tried to reproduce the phenomenon, but it rarely recurs.
> >
> > The same phenomenon is reported in the following email.
> > However, that discussion ended partway without a conclusion.
> >
> >  * http://www.gossamer-threads.com/lists/linuxha/pacemaker/62147
> >
> > The phenomenon occurred with the following combination:
> >  * pacemaker-1.0.11
> >  * resource-agents-3.9.2
> >  * cluster-glue-1.0.7
> >  * heartbeat-3.0.5
> >
> > I have registered this problem in Bugzilla:
> >  * http://bugs.clusterlabs.org/show_bug.cgi?id=5004
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >
> 



