[Pacemaker] About behavior in "Action Lost".

Thu Sep 30 00:37:29 UTC 2010

Hi Andrew,

> Sorry, it probably got rebased before I pushed it.
> 
> http://hg.clusterlabs.org/pacemaker/1.1/rev/dd8e37df3e96 should be the
> right link

Thanks!!

Hideo Yamauchi.

--- Andrew Beekhof <andrew at beekhof.net> wrote:

> Sorry, it probably got rebased before I pushed it.
> 
> http://hg.clusterlabs.org/pacemaker/1.1/rev/dd8e37df3e96 should be the
> right link
> 
> On Wed, Sep 29, 2010 at 2:51 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> > Hi Andrew,
> >
> >> Pushed as:
> >>   http://hg.clusterlabs.org/pacemaker/1.1/rev/8433015faf18
> >>
> >> Not sure about applying to 1.0 though, it's a dramatic change in behavior.
> >
> > I cannot find the change at this link.
> > Where did you push it?
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> > --- Andrew Beekhof <andrew at beekhof.net> wrote:
> >
> >> Pushed as:
> >>   http://hg.clusterlabs.org/pacemaker/1.1/rev/8433015faf18
> >>
> >> Not sure about applying to 1.0 though, it's a dramatic change in behavior.
> >>
> >> On Wed, Sep 22, 2010 at 11:18 AM, <renayama19661014 at ybb.ne.jp> wrote:
> >> > Hi Andrew,
> >> >
> >> > Thank you for your comment.
> >> >
> >> >> A long time ago in a galaxy far away, some messaging layers used to
> >> >> lose quite a few actions, including stops.
> >> >> About the same time, we decided that fencing because a stop action was
> >> >> lost wasn't a good idea.
> >> >>
> >> >> The rationale was that if the operation eventually completed, it would
> >> >> end up in the CIB anyway.
> >> >> And even if it didn't, the PE would continue to try the operation
> >> >> again until the whole node fell over at which point it would get shot
> >> >> anyway.
> >> >
> >> > Sorry...
> >> > I did not know that this had been discussed in the past.
> >> >
> >> >
> >> >> Now, having said that, things have improved since then and perhaps,
> >> >> in the interest of speeding up recovery in these situations, it is time
> >> >> to stop treating stop operations differently.
> >> >> Would you agree?
> >> >
> >> > Does that mean you will change the behavior this time, so that an
> >> > "Action Lost" on a stop operation leads to stonith?
> >> > If my understanding is right, I agree too.
> >> >
> >> > if(timer->action->type != action_type_rsc) {
> >> >     send_update = FALSE;
> >> > } else if(safe_str_eq(task, "cancel")) {
> >> >     /* we don't need to update the CIB with these */
> >> >     send_update = FALSE;
> >> > }
> >> > ---> delete "else if(safe_str_eq(task, "stop")){..}" ?
> >> >
> >> > if(send_update) {
> >> >     /* cib_action_update(timer->action, LRM_OP_PENDING, EXECRA_STATUS_UNKNOWN); */
> >> >     cib_action_update(timer->action, LRM_OP_TIMEOUT, EXECRA_UNKNOWN_ERROR);
> >> > }
> >> >
> >> > Best Regards,
> >> > Hideo Yamauchi.
> >> >
> >> > --- Andrew Beekhof <andrew at beekhof.net> wrote:
> >> >
> >> >> On Tue, Sep 21, 2010 at 8:59 AM, <renayama19661014 at ybb.ne.jp> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > The node was under very high load, and we checked how Pacemaker's monitor operation behaved.
> >> >> > After the monitor operation failed, an "Action Lost" occurred for the stop operation.
> >> >> >
> >> >> > Sep  8 20:02:22 cgl54 crmd: [3507]: ERROR: print_elem: Aborting transition, action lost: [Action 9]: In-flight (id: prmApPostgreSQLDB1_stop_0, loc: cgl49, priority: 0)
> >> >> > Sep  8 20:02:22 cgl54 crmd: [3507]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
> >> >> >
> >> >> >
> >> >> > We think that the stop operation did not go well because of the load on the node.
> >> >> > But cannot stonith be executed against the node in this case?
> >> >>
> >> >> A long time ago in a galaxy far away, some messaging layers used to
> >> >> lose quite a few actions, including stops.
> >> >> About the same time, we decided that fencing because a stop action was
> >> >> lost wasn't a good idea.
> >> >>
> >> >> The rationale was that if the operation eventually completed, it would
> >> >> end up in the CIB anyway.
> >> >> And even if it didn't, the PE would continue to try the operation
> >> >> again until the whole node fell over at which point it would get shot
> >> >> anyway.
> >> >>
> >> >> Now, having said that, things have improved since then and perhaps,
> >> >> in the interest of speeding up recovery in these situations, it is time
> >> >> to stop treating stop operations differently.
> >> >> Would you agree?
> >> >>
> >> >> _______________________________________________
> >> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> >>
> >> >> Project Home: http://www.clusterlabs.org
> >> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >> >>
>
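
The decision discussed in this thread, whether a lost action should be
recorded in the CIB as a timeout, can be illustrated with a minimal C
sketch. This is not the crmd source. The enum and both function names are
hypothetical, and the old handling of "stop" (suppressing the update) is
an assumption based on the description in the thread, since the body of
that branch is elided in the quoted snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the action types seen by the action timer. */
enum sketch_action_type {
    SKETCH_ACTION_RSC,    /* a resource operation (start, stop, monitor...) */
    SKETCH_ACTION_OTHER   /* anything that is not a resource operation */
};

/* Assumed old behavior: a lost "stop" is never written to the CIB,
 * so it is never seen as a failed stop and cannot lead to fencing. */
bool lost_action_needs_cib_update_old(enum sketch_action_type type,
                                      const char *task)
{
    assert(task != NULL);
    if (type != SKETCH_ACTION_RSC) {
        return false;
    }
    if (strcmp(task, "cancel") == 0) {
        return false;   /* cancellations are not recorded */
    }
    if (strcmp(task, "stop") == 0) {
        return false;   /* the special case proposed for removal */
    }
    return true;
}

/* Proposed behavior: "stop" is treated like any other resource
 * operation, so a lost stop is recorded as a timeout. */
bool lost_action_needs_cib_update_new(enum sketch_action_type type,
                                      const char *task)
{
    assert(task != NULL);
    if (type != SKETCH_ACTION_RSC) {
        return false;
    }
    if (strcmp(task, "cancel") == 0) {
        return false;
    }
    return true;
}
```

With this sketch, a lost "stop" on a resource yields false under the old
rule and true under the new one. In the real code the true case
corresponds to the cib_action_update(..., LRM_OP_TIMEOUT, ...) call quoted
above; whether the recorded stop failure then leads to stonith depends on
the cluster's fencing configuration.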