[Pacemaker] [Problem] The attrd does not sometimes stop.

Andrew Beekhof andrew at beekhof.net
Mon Nov 14 00:58:09 UTC 2011


On Mon, Nov 7, 2011 at 8:39 AM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:
> On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote:
>> On Tue, Oct 18, 2011 at 12:19 PM,  <renayama19661014 at ybb.ne.jp> wrote:
>> > Hi,
>> >
>> > attrd sometimes fails to stop.
>> >
>> > Step 1. Start a cluster with 2 nodes.
>> > Step 2. Stop the first node (/etc/init.d/heartbeat stop).
>> > Step 3. Stop the second node a little later (/etc/init.d/heartbeat stop).
>> >
>> > The attrd catches the TERM signal, but does not stop.
>>
>> There's no evidence that it actually catches it, only that it is sent.
>> I've seen it before but never figured out why it occurs.
>
> I once had it tracked down almost to where it occurs, but then got distracted.
> Yes, the signal was delivered.
>
> I *think* it had to do with attrd doing a blocking read,
> or looping in some internal message delivery function too often.
>
> I had a quick look at the code again now, to try and remember,
> but I'm not sure.
>
> It *may* be that, because
> xmlfromIPC(IPC_Channel * ch, int timeout) calls
>    msg = msgfromIPC_timeout(ch, MSG_ALLOWINTR, timeout, &ipc_rc);
>
> And MSG_ALLOWINTR will cause msgfromIPC_ll() to
>        IPC_INTR:
>                if ( allow_intr){
>                        goto startwait;
>
> Depending on the frequency of delivered signals, it may cause this goto
> startwait loop to never exit, because the timeout always starts again
> from the full passed-in timeout.
>
> If only one signal is delivered, it may still take 120 seconds
> (MAX_IPC_DELAY from crm.h) to be actually processed, as the signal
> handler only raises a flag for the next mainloop iteration.
>
> If a (non-fatal) signal is delivered every few seconds,
> then the goto loop will never time out.
>
> Please someone check this for plausibility ;-)

Most plausible explanation I've heard so far... still odd that only
attrd is affected.
So what do we do about it?
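
One possible direction, offered only as a minimal sketch: compute a
deadline once and wait for the remaining time after each interruption,
instead of restarting with the full timeout. The code below is a
self-contained simulation, not the actual msgfromIPC_ll() code. The
names sim, sim_wait, wait_restarting and wait_with_deadline are
hypothetical, the clock is simulated, and the 5 second signal period in
the usage note is an assumed value chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated environment, not Pacemaker/Heartbeat code. */
struct sim {
    long now_ms;           /* simulated clock */
    long signal_period_ms; /* a signal arrives every N ms; 0 = never */
    int  max_restarts;     /* safety cap so the demo always terminates */
};

/* Simulated blocking wait of up to wait_ms.
 * Returns true if a signal interrupted it before the timeout expired. */
bool sim_wait(struct sim *s, long wait_ms)
{
    assert(wait_ms >= 0);
    if (s->signal_period_ms > 0) {
        long next_signal =
            (s->now_ms / s->signal_period_ms + 1) * s->signal_period_ms;
        if (next_signal - s->now_ms < wait_ms) {
            s->now_ms = next_signal;   /* interrupted early */
            return true;
        }
    }
    s->now_ms += wait_ms;              /* waited the full time */
    return false;
}

/* Models the suspected problem: every interruption restarts the wait
 * with the full timeout. Returns elapsed ms, or -1 if the cap was hit. */
long wait_restarting(struct sim *s, long timeout_ms)
{
    long start = s->now_ms;
    for (int i = 0; i < s->max_restarts; i++) {
        if (!sim_wait(s, timeout_ms))
            return s->now_ms - start;
    }
    return -1;
}

/* Possible fix: fix the deadline once, then wait only for what is left.
 * Returns elapsed ms, or -1 if the cap was hit. */
long wait_with_deadline(struct sim *s, long timeout_ms)
{
    long start = s->now_ms;
    long deadline = start + timeout_ms;
    for (int i = 0; i < s->max_restarts; i++) {
        long remaining = deadline - s->now_ms;
        if (remaining <= 0 || !sim_wait(s, remaining))
            return s->now_ms - start;
    }
    return -1;
}
```

With a 120000 ms timeout and a signal every 5000 ms, wait_restarting()
hits the restart cap without ever timing out, while wait_with_deadline()
returns after the requested 120000 ms. Real code would take the time
from a monotonic clock rather than a simulated one.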