[Pacemaker] Time to a service stop is very long.

Thu Oct 28 01:18:11 EDT 2010

On Thu, Oct 28, 2010 at 3:11 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> Hi Andrew,
>
>
>> Wait, I think I read that wrong.
>> I would expect that, no matter what, pacemaker would exit after
>> shutdown-escalation.
>>
>> You're saying it didn't?
>> Better create a bug and attach the logs.
>
> At Step4, srv03 and srv04 each requested a stop of the Heartbeat service.
>
> According to the log, the stop request for srv03 was made at 16:46:57.
>
> Because I set "shutdown-escalation" to five minutes, I expected the srv03 node to stop at about
> 16:52:00.
>
> However, the srv03 node did not start stopping until 16:57:20.
>
> Is my understanding of "shutdown-escalation" wrong?

I don't think so, I think you probably found a bug.
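
As a quick sanity check on the timestamps in the quoted srv03 log, here is a minimal Python sketch (not part of Pacemaker). It assumes the year is 2010, since syslog lines carry no year, and an English/C locale so that "Oct" parses. It only redoes the arithmetic from the report: request time plus the configured five minutes versus the time the escalation timer actually popped.

```python
from datetime import datetime, timedelta

# Timestamps copied from the quoted srv03 log. The year is an assumption:
# the syslog lines do not include one, so the year of this thread is used.
LOG_FORMAT = "%Y %b %d %H:%M:%S"
shutdown_requested = datetime.strptime("2010 Oct 21 16:46:57", LOG_FORMAT)
escalation_popped = datetime.strptime("2010 Oct 21 16:57:20", LOG_FORMAT)

# shutdown-escalation value as configured in the report (five minutes).
configured = timedelta(minutes=5)

expected_pop = shutdown_requested + configured    # when the timer should fire
observed = escalation_popped - shutdown_requested  # how long it really took
overrun = observed - configured                    # delay beyond the setting

print("expected escalation at:", expected_pop.time())  # 16:51:57
print("observed delay:", observed)                      # 0:10:23
print("overrun:", overrun)                              # 0:05:23
```

So the timer popped 10 minutes 23 seconds after the request, a little more than twice the configured value.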

>
>> Better create a bug and attach the logs.
>
> ok.
> Please wait....
>
> Best Regards,
> Hideo Yamauchi.
>
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC:
>> srv03
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request
>> for srv03
>> >> (state=S_IDLE)
>> >> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us
>> average, 5%
>> >> utilization) in the last 10min
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP)
>> just popped!
>
>
>
> --- Andrew Beekhof <andrew at beekhof.net> wrote:
>
>> On Wed, Oct 27, 2010 at 12:36 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>> > On Thu, Oct 21, 2010 at 10:30 AM, <renayama19661014 at ybb.ne.jp> wrote:
>> >> Hi,
>> >>
>> >> We checked the behavior with no-quorum-policy set to freeze.
>> >> In a cluster where the freeze setting had taken effect, we stopped the service.
>> >>
>> >> However, stopping the service took a very long time.
>> >>
>> >> We set "shutdown-escalation" to five minutes to shorten the test time.
>> >> But stopping the service on one node took more than five minutes.
>> >>
>> >> I confirmed it with the following procedure.
>> >>
>> >> Step1) Start four nodes and send cib.xml.
>> >> Step2) Block Heartbeat communication to split the cluster into two partitions.
>> >> Step3) The nodes freeze.
>> >> Step4) On the nodes of one partition (srv03 and srv04), we stop Heartbeat at the same time.
>> >>
>> >> [root at srv03 ~]# service heartbeat stop
>> >> Stopping High-Availability services:
>> >> [root at srv04 ~]# service heartbeat stop
>> >> Stopping High-Availability services:
>> >>
>> >> Step5) Heartbeat on one node stops within a few minutes.
>> >> [root at srv04 ~]# service heartbeat stop
>> >> Stopping High-Availability services:                       [  OK  ]
>> >>
>> >> Step6) But Heartbeat on the other node does not stop until much more time has passed.
>> >>  * The shutdown-escalation timer starts, but the time we set (5min) does not seem to
>> >>    take effect.
>> >>
>> >> [root at srv03 ~]# service heartbeat stop
>> >> Stopping High-Availability services:                       [  OK  ]
>> >>
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC:
>> srv03
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request
>> for srv03
>> >> (state=S_IDLE)
>> >> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us
>> average, 5%
>> >> utilization) in the last 10min
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP)
>> just popped!
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: do_log: FSA: Input I_STOP from crm_timer_popped()
>> received
>> >> in state S_IDLE
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_state_transition: State transition S_IDLE ->
>> S_STOPPING [
>> >> input=I_STOP cause=C_TIMER_POPPED origin=crm_timer_popped ]
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_dc_release: DC role released
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: stop_subsystem: Sent -TERM to pengine: [5007]
>> >>
>> >>
>> >> Is it correct behavior for this service stop to take so long?
>> >
>> > It's what I would expect to happen, but it's possibly not ideal.
>>
>> Wait, I think I read that wrong.
>> I would expect that, no matter what, pacemaker would exit after
>> shutdown-escalation.
>>
>> You're saying it didn't?
>> Better create a bug and attach the logs.
>>
>> >
>> >>  * Because the log was very big, I did not attach it.
>> >>  * If the log is needed, I will send it via Bugzilla.
>> >>
>> >> Best Regards,
>> >> Hideo Yamauchi.
>> >>
>> >>
>> >> _______________________________________________
>> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>
>> >> Project Home: http://www.clusterlabs.org
>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>> >>
>> >
>>
>>
>
>
>