[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Wed Sep 17 02:12:12 EDT 2008

Hi Andrew,

Thank you for your opinion!

Of course, I also considered modifying each RA.
But that approach has the following problems.

(1) It can't handle the case where a timeout occurs.
     Only lrmd can detect a timeout; the RA can't.
     One of the main purposes of this function is to avoid unnecessary F/O when
     a sudden high load occurs for a very short time. In that case, the monitor
     may time out. So the RA is not a suitable place to implement this function.
(2) A lack of generality.
     That way, this function has to be implemented in every RA that a user
     needs. That is really troublesome.
(3) The RA has to hold its failure information outside of itself.
     If you implement the same specification in the RA, the information about
     a resource's failures has to be held outside of the RA,
     for example, in a file in /var/run/heartbeat/rsctmp.
     That can itself cause unnecessary F/O
     (for example, when the file is deleted or modified by hand, or when
      reading or writing it fails, etc.).
(4) The monitor may take a long time.
     I considered another specification: check whether a resource is running
     several times (with a sleep between the checks) in one monitor function,
     and return NG only when all of the checks are NG.
     (I think this specification is similar to yours...)
     But then the monitor function takes a long time, and that affects
     the setting of the monitor's timeout.
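
To make point (4) concrete, here is a rough sketch of the worst-case run time
of such a retrying monitor. The function name and its parameters are only
assumptions for illustration; they are not taken from any RA or from lrmd.

```c
#include <assert.h>

/* Hypothetical helper: worst-case duration (in seconds) of one monitor
 * call that performs `checks` checks, each taking at most `check_s`
 * seconds, with a sleep of `sleep_s` seconds between consecutive checks. */
static int worst_case_monitor_seconds(int checks, int check_s, int sleep_s)
{
    assert(checks >= 1);
    /* There are (checks - 1) sleeps, one between each pair of checks. */
    return checks * check_s + (checks - 1) * sleep_s;
}
```

With 3 checks of up to 5 seconds each and a 10 second sleep, the monitor's
timeout would have to be longer than 3*5 + 2*10 = 35 seconds, which is far
longer than a single check needs.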

For the above reasons, I judged that it is better to implement this function in 
lrmd.
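
As a rough illustration of the idea (not the actual patch), the counting in
lrmd could look like the sketch below. The field names t_failed and
failcnt_per_period follow the quoted patch fragment; the struct itself, the
other fields and both function names are assumptions made for this example,
and period_length is assumed to be greater than 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-resource state for this sketch. */
struct rsc_failure_window {
    time_t t_failed;             /* start of the current period, 0 = none yet */
    int failcnt_per_period;      /* failures counted in the current period */
    time_t period_length;        /* length of one period in seconds (> 0) */
    int max_failures_per_period; /* threshold */
};

/* Record one monitor failure (including a timeout) at time `now`.
 * Returns true when the failure should be reported, and false while
 * the count has not yet exceeded the threshold. */
static bool record_monitor_failure(struct rsc_failure_window *w, time_t now)
{
    assert(w->period_length > 0);

    /* Start a new period on the first failure or when the old one is over. */
    if (w->t_failed == 0 || now - w->t_failed >= w->period_length) {
        w->t_failed = now;
        w->failcnt_per_period = 0;
    }

    w->failcnt_per_period++;
    return w->failcnt_per_period > w->max_failures_per_period;
}

/* Reset the counter, e.g. when the resource is started. */
static void reset_failure_window(struct rsc_failure_window *w)
{
    w->t_failed = 0;
    w->failcnt_per_period = 0;
}
```

With a threshold of 2 and a period of 60 seconds, the first two failures
inside one period are ignored and the third one is reported. Because the
counting sits where timeouts are detected, a timed-out monitor can be
counted in the same way as a failed one.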

Best Regards,
Satomi TANIGUCHI

Andrew Beekhof wrote:
> Personally, I think that this is simply the wrong approach.
> 
> If a resource doesn't want the cluster to react to a failure, then the 
> RA just shouldn't report one.  Problem solved.
> 
> On Sep 11, 2008, at 9:06 AM, Satomi Taniguchi wrote:
> 
>> Hi Lars,
>>
>> Thank you for your reply.
>>
>>
>> Lars Marowsky-Bree wrote:
>>> On 2008-09-09T18:37:31, Satomi Taniguchi 
>>> <taniguchis at intellilink.co.jp> wrote:
>> [...snip...]
>>>> (2) lrmd counts the monitor op's failures of each resource per
>>>>     period-length.
>>>>     And it ignores the resource's failure until the number of times of that
>>>>     exceeds the threshold (max-failures-per-period).
>>> This means that this policy is enforced by the LRM; I'm not sure that's
>>> perfect. Should this not be handled by the PE?
>>
>> At first, I also tried to implement this function in PE.
>> But there were some problems.
>> (1) PE has no way to clear fail-count.
>>    When PE knows a resource's failure, the rsc's fail-count has already
>>    increased. So, it is proper to treat fail-count as the counter of
>>    failure for this new function, if it is implemented in PE.
>>    But, for example, when the period is over, it needs to clear the fail-count.
>>    At present, PE has no way to request something to cib.
>>    PE's role is to create a graph based on current CIB, not to change it,
>>    as far as I understand.
>>    And users may be confused if fail-count is cleared suddenly.
>> (2) After a resource is failed once, even if it is failed again, lrmd doesn't
>>    notify crmd of the failure.
>>    With new function, PE has to know the failure of resource even if it occurs
>>    consecutively. But normally, the result of monitor operation is notified
>>    only when it changes.
>>    In addition, even if lrmd always notify crmd of the resource's failure,
>>    the rsc's fail-count doesn't increase because magic-number doesn't change.
>>    That is to say, PE can't detect consecutive failures.
>>    I tried to the way to cancel the monitor operation of the failed resource
>>    and set the same op again.
>>    But in this way, new monitor operation is done immediately,
>>    then the interval of monitor operation becomes no longer constant.
>>
>> So, I considered it is more proper to implement the new function in lrmd.
>>
>>>> (3) If the value of period-length is 0, lrmd calculates the suitable 
>>>> length of
>> [...snip...]
>>>> In addition, I add the function to lrmadmin to show the following information.
>>>>  i) the time when the period-length started of the specified resource.
>>>> ii) the value of the counter of failures of the specified resource.
>>>> This is the third patch.
>>> This means that the full cluster state is no longer reflected in the
>>> CIB. I don't really like that at all.
>>
>> I see what you mean.
>> If it is possible, I want to gather all state of the cluster in the CIB, too.
>> For that purpose, I tried to implement this function in PE, at first.
>> But it seems _not_ to be possible for the above reasons...
>>
>>>> +    op_type = ha_msg_value(op->msg, F_LRM_OP);
>>>> +
>>>> +    if (STRNCMP_CONST(op_type, "start") == 0) {
>>>> +        /* initialize the counter of failures. */
>>>> +        rsc->t_failed = 0;
>>>> +        rsc->failcnt_per_period = 0;
>>>> +    }
>>> What about a resource being promoted to master state, or demoted again?
>>> Should the counter not be reset then too?
>>
>> Exactly.
>> Thank you for your pointing out.
>>
>>> (The functions are also getting verrry long; maybe factor some code out
>>> into smaller functions?)
>>
>> All right.
>> I will do so.
>>
>>> Regards,
>>>    Lars
>>
>> Best Regards,
>> Satomi TANIGUCHI
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at clusterlabs.org
>> http://list.clusterlabs.org/mailman/listinfo/pacemaker