[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Jun 27 08:52:08 EDT 2008


Hi Keisuke-san,

On Fri, Jun 27, 2008 at 09:19:33PM +0900, Keisuke MORI wrote:
> Hi,
> 
> Dejan Muhamedagic <dejanmm at fastmail.fm> writes:
> > On Tue, Jun 24, 2008 at 04:02:06PM +0200, Lars Marowsky-Bree wrote:
> >> On 2008-06-24T15:48:12, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> >> 
> >> > >    But to be precise, we have two scenarios to configure for:
> >> > >    a) monitor NG -> stop -> start on the same node
> >> > >       -> monitor NG (Nth time) -> stop -> failover to another node
> >> > >    b) monitor NG -> monitor NG (Nth time) -> stop -> failover to another node
> >> > > 
> >> > >    The current pacemaker behaves as in a), I think, but b) is also
> >> > >    useful when you want to ignore a transient error.
> >> > 
> >> > The b) part has already been discussed on the list and it's
> >> > supposed to be implemented in lrmd. I still don't have the API
> >> > defined, but thought about something like
> >> > 
> >> > 	max-total-failures (how many times a monitor may fail)
> >> > 	max-consecutive-failures (how many times in a row a monitor may fail)
> >> > 
> 
> I also thought that it should be implemented in lrmd at first,
> but now I think it would be better to handle it in crm.
> 
> If we implemented it in lrmd, there would be two kinds of
> fail-counts in different modules (cib and lrmd), and users would
> have to understand and use both the cib and the lrmd tools
> depending on the kind of failure, even though they serve a very
> similar purpose. I think that's confusing for users.

The fail-counts in lrmd will probably be available for
inspection, and they would probably also expire after some time.
What I suggested in the previous messages is actually missing
the time dimension: there should be a maximum number of failures
within a given period.
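
Just to make that concrete, here is a purely hypothetical sketch of
how such attributes could look on a monitor operation.
max-total-failures and max-consecutive-failures are the names floated
above, and failure-period merely stands in for the time dimension;
none of this is an existing API:

	<op id="db_mon" name="monitor" interval="10s" timeout="30s"
	    max-total-failures="5" max-consecutive-failures="3"
	    failure-period="10min"/>

That is, tolerate up to 5 failures, at most 3 in a row, within any
10-minute window before the usual recovery kicks in.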

> So I think that lrmd should keep reporting failures as it does now,
> and crm/cib should hold all the failure status and make the decision.

Of course, it could be done like that as well, though that could
make processing in crm much more complex.

> >> > These should probably be attributes defined on the monitor
> >> > operation level.
> >> 
> >> The "ignore failure reports" clashes a bit with the "react to failures
> >> ASAP" requirement.
> >> 
> >> It is my belief that this should be handled by the RA, not in the LRM
> >> nor the CRM. The monitor op implementation is the place to handle this.
> 
> 
> Yes, it can be implemented in RAs, and that's what we've done actually.
> 
> But in that case, such RAs would each have a similar retry loop in
> their scripts and their own retry parameters for each RA type.
> 
> I think it's worth having a common way to handle this.

Yes, I also think that having this handled in one place would be
beneficial. The resource agents, though they should know best the
resources they manage, may not always take into account all the
peculiarities of the environment. Then it is up to the user to
decide whether they want to allow a monitor for the resource to
fail now and then.
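
To illustrate the difference, with per-RA handling every resource ends
up carrying its own retry knobs in its definition, something like this
(the mydb agent and the monitor_retries/monitor_retry_interval
parameter names are made up, just whatever each RA happens to define):

	<primitive id="db" class="ocf" provider="heartbeat" type="mydb">
	  <instance_attributes id="db_attrs">
	    <attributes>
	      <nvpair id="db_retries" name="monitor_retries" value="3"/>
	      <nvpair id="db_retry_interval" name="monitor_retry_interval" value="5s"/>
	    </attributes>
	  </instance_attributes>
	</primitive>

A single pair of attributes on the monitor operation, as sketched
above, would cover all resources in one place.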

> >> Beyond that, I strongly feel that "transient errors" are a bad
> >> foundation to build clusters on.
> >
> > Of course, all that is right. However, there are some situations
> > where we could bend the rules. I'm not sure what Keisuke-san had
> > in mind, but for example one could be more forgiving when
> > monitoring certain stonith resources.
> >
> 
> One situation I have in mind is when a sudden high load occurs
> for a very short time. The application may fail to respond to the
> RA's monitor op while the load is very high, but if such a load
> spike ceases shortly, we don't want to rush into a failover.

These situations are tricky to handle. Such a high load may also
be a sign that resources should indeed move elsewhere, or it may
even be considered a service disruption. Still, there are most
probably shops which would prefer not to fail over in such cases.
At any rate, this feature, if it gets implemented, would have to
be used with utmost care.

> Another case we've run into was when we wrote an RA to check some hardware.
> The status check from the hardware failed only rarely, at a very specific
> timing, and retrying the check was just fine.

That's what I often observed with some stonith devices.

Cheers,

Dejan

> 
> Thanks,
> -- 
> Keisuke MORI
> NTT DATA Intellilink Corporation
> 
> 



