[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Satomi Taniguchi taniguchis at intellilink.co.jp
Fri Aug 29 07:28:59 EDT 2008


Hi,

I have reconsidered the specifications that I described
in my previous mail,
and found some problems.

1) It is too difficult to decide the value of each setting.
    Each of these 3 settings (max-consecutive-failures, period-length,
    and max-failures-per-period) is simple in itself,
    but they are complex to use together.
    Users will probably be confused and unable to decide on each value.
2) There are cases where failover (F/O) does not occur even though
    a resource has really failed.
    When every other monitor operation fails,
    the operations also take a long time,
    and the value of period-length is too short,
    lrmd doesn't notify its client even if it detects the failure
    many times.

To solve the first problem, in other words,
to simplify the specifications and settings,
I now think it is appropriate to count only failures per period,
and to add only 2 settings, period-length and max-failures-per-period.
This method has the following problem,
which is almost the same as problem 2) above.

a) When the value of period-length is too short,
    lrmd doesn't notify its client even if a resource has
    really failed.
    When the monitor operation takes a long time, the number of times
    the monitor op is executed in one period may be smaller than
    max-failures-per-period.

This problem is caused by period-length being too short.
In the worst case, it takes (interval + timeout) seconds
from the return of one monitor to the return of the next.
So, a safe value of period-length satisfies the following.
period-length > (monitor interval + timeout) * (max-failures-per-period - 1) ...exp1
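
To illustrate exp1, here is a minimal Python sketch (not lrmd code).
It assumes that the period is a sliding window ending at the latest failure,
and that lrmd notifies once the count reaches max-failures-per-period;
neither detail is fixed by the specification above.
The function name and the values (interval 10 s, timeout 20 s) are hypothetical.

```python
from collections import deque


def lrmd_would_notify(failure_times, period_length, max_failures_per_period):
    """Return True once max_failures_per_period failures fall inside one period.

    Assumption: the period is a sliding window ending at the latest failure.
    """
    recent = deque()
    for now in failure_times:
        recent.append(now)
        # Forget failures that are a full period (or more) older than this one.
        while recent and now - recent[0] >= period_length:
            recent.popleft()
        if len(recent) >= max_failures_per_period:
            return True
    return False


interval, timeout = 10, 20                 # hypothetical monitor settings (seconds)
gap = interval + timeout                   # worst-case spacing between monitor results
failures = [gap * n for n in range(1, 6)]  # every monitor fails: 30, 60, ..., 150

# exp1 for max-failures-per-period = 3 is: period-length > 30 * (3 - 1) = 60
too_short = lrmd_would_notify(failures, period_length=40, max_failures_per_period=3)
on_bound = lrmd_would_notify(failures, period_length=60, max_failures_per_period=3)
safe = lrmd_would_notify(failures, period_length=61, max_failures_per_period=3)
print(too_short, on_bound, safe)  # False False True
```

With period-length at or below 60 seconds, the window never holds 3 failures,
so no notification is sent even though every monitor fails.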


Therefore, I intend to add the following specifications.
  i) The default value of period-length is a value that is surely safe,
     (monitor interval + timeout) * max-failures-per-period ...exp2.
     (exp2 is always larger than the bound in exp1.)
     If the attribute "period-length" is set to "0" or is not set,
     lrmd calculates exp2 and uses that value.
 ii) When a value of 1 or more is set for the attribute "period-length",
     lrmd uses it, of course.
     But if it doesn't satisfy the condition exp1, lrmd outputs a WARNING.
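
Rules i) and ii) could be sketched as follows. This is only an illustration
in Python, not the lrmd implementation; the function name, the logger name,
and the example values are hypothetical.

```python
import logging

log = logging.getLogger("lrmd-sketch")  # hypothetical logger name


def resolve_period_length(period_length, interval, timeout, max_failures_per_period):
    """Return the period-length to use, following rules i) and ii)."""
    worst_case_gap = interval + timeout
    # Rule i): "0" or unset means "use the safe default", exp2.
    if not period_length:
        return worst_case_gap * max_failures_per_period
    # Rule ii): use the configured value, but warn if exp1 is not satisfied.
    lower_bound = worst_case_gap * (max_failures_per_period - 1)
    if period_length <= lower_bound:
        log.warning(
            "period-length %s is not greater than %s; failures may go unreported",
            period_length,
            lower_bound,
        )
    return period_length


# Hypothetical values: interval 10 s, timeout 20 s, max-failures-per-period 3.
print(resolve_period_length(0, 10, 20, 3))   # default from exp2: 90
print(resolve_period_length(40, 10, 20, 3))  # kept as 40, with a WARNING
```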

In this way, users don't need to think deeply about "period-length",
and they can use this function safely.


I would like to hear any opinions about this.


Best Regards,
Satomi Taniguchi

Satomi Taniguchi wrote:
> Hi Dejan and Lars,
> 
> All right, I agree with you.
> Using only "max-consecutive-failures" is indeed unsafe
> in the case where every other monitor operation fails.
> 
> And using only "max-failures-per-period" is unsafe too.
> (In the case where the monitor function itself takes a long time)
> 
> 
> Well then, I would like to confirm the specifications.
> (i) 3 settings should be added.
>  - max-consecutive-failures:
>     Maximum times in a row a monitor may fail,
>     Default value -> 1
>  - period-length:
>     Period in seconds to count monitor failures,
>     Set to zero to disable,
>     Default value -> 0
>  - max-failures-per-period:
>     Maximum times per period a monitor may fail,
>     Default value -> 1
> 
> (ii) When either the "consecutive" or the "per-period" condition is met,
> lrmd notifies its client that a failure has occurred.
> 
> Do you have any comments?
> I'm trying to implement the above.
> 
> 
> 
> Best Regards,
> Satomi Taniguchi
> 
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at clusterlabs.org
> http://list.clusterlabs.org/mailman/listinfo/pacemaker




