[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Satomi TANIGUCHI taniguchis at intellilink.co.jp
Tue Oct 7 10:55:09 UTC 2008


Hi,


I'm posting patches that add a "monitor-loop" operation.
The role of each patch is:
(1) monitor_loop_hb.patch: adds ocf_monitor_loop() to .ocf-shellfuncs.
                            This is for Heartbeat (83a87f2b6554).
(2) monitor_loop_pm.patch: adds the "monitor-loop" operation to the CIB.
                            This is for Pacemaker (0f6fc6f8c01f).

1. Specifications
The monitor-loop operation calls the monitor op repeatedly until either:
(1) the monitor op returns a normal value (OCF_SUCCESS or OCF_RUNNING_MASTER), or
(2) the number of failures exceeds the threshold.

To set the threshold, add a new attribute "maxfailures"
to the resource's <instance_attributes>.
If you don't set the threshold, or if you set it to zero,
the monitor-loop op does not return until the monitor op succeeds,
so a persistent failure ends in an operation timeout.
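
The attached patch is not reproduced in this message, so the following is only a rough sketch of how a loop with these semantics could be written in shell. Apart from ocf_monitor_loop, OCF_SUCCESS and OCF_RUNNING_MASTER, every name here (demo_monitor, demo_calls, the optional second argument) is an assumption made for illustration, and "more than threshold" is read literally as failures > maxfailures.

```shell
#!/bin/sh
# Sketch only; not the code from monitor_loop_hb.patch.

# Standard OCF return codes, defaulted so the sketch runs outside a cluster.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_RUNNING_MASTER:=8}"

# Hypothetical stand-in for an RA's monitor function:
# it fails on the first two calls and succeeds from the third call on.
demo_calls=0
demo_monitor() {
    demo_calls=$((demo_calls + 1))
    [ "$demo_calls" -ge 3 ] && return "$OCF_SUCCESS"
    return "$OCF_ERR_GENERIC"
}

# ocf_monitor_loop MAXFAILURES [MONITOR_FUNCTION]
# Calls the monitor function until it returns a normal value, or until the
# failure count exceeds MAXFAILURES. With MAXFAILURES unset or 0 it loops
# until success (in a cluster, the operation timeout would end it).
ocf_monitor_loop() {
    max=${1:-0}
    monitor_fn=${2:-demo_monitor}   # second argument is an assumption
    failures=0
    while :; do
        rc=0
        "$monitor_fn" || rc=$?
        case $rc in
            "$OCF_SUCCESS"|"$OCF_RUNNING_MASTER") return "$rc" ;;
        esac
        failures=$((failures + 1))
        if [ "$max" -gt 0 ] && [ "$failures" -gt "$max" ]; then
            return "$rc"            # report the last failure code
        fi
    done
}
```

With this sketch, "ocf_monitor_loop 3 demo_monitor" tolerates the two initial failures and returns OCF_SUCCESS, while a threshold of 1 would give up on the second failure.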

2. How to use
(1) Add the following line between "case $__OCF_ACTION in" and "esac"
     in your RA.
         monitor-loop)   ocf_monitor_loop ${OCF_RESKEY_maxfailures};;
     As an example, I attached a patch for the Dummy resource
     (monitor_loop_Dummy.patch).
(2) Edit cib.xml.
     Add "maxfailures" to <instance_attributes>, and add a "monitor-loop" operation
     instead of a regular monitor op.
     ex.)
     <primitive id="prmDummy1" class="ocf" type="Dummy" provider="heartbeat">
       <instance_attributes id="prmDummy1-instance-attributes">
         <nvpair id="prmDummy1-instance-attrs-maxfailures" name="maxfailures" value="3"/>
       </instance_attributes>
       <operations>
         <op id="prmDummy1-operations-start" name="start" interval="0" timeout="300" on-fail="restart"/>
         <op id="prmDummy1-operations-monitor-loop" name="monitor-loop" interval="10" timeout="60" on-fail="restart"/>
         <op id="prmDummy1-operations-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
       </operations>
     </primitive>
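
For step (1), the dispatch inside an RA might look roughly like the sketch below. Only the monitor-loop line comes from this message; the dummy_* functions, the ra_dispatch wrapper, and the stubbed ocf_monitor_loop are hypothetical placeholders so that the fragment runs on its own, and the actual Dummy RA may be organized differently.

```shell
#!/bin/sh
# Sketch of an RA action dispatch with the monitor-loop line added.

# Hypothetical stubs standing in for the RA's real functions.
dummy_start()      { echo "start"; }
dummy_stop()       { echo "stop"; }
dummy_monitor()    { echo "monitor"; }
# Stub for the function that the patch adds to .ocf-shellfuncs.
ocf_monitor_loop() { echo "monitor-loop maxfailures=${1:-0}"; }

# Hypothetical wrapper so the case statement can be exercised directly.
ra_dispatch() {
    __OCF_ACTION=$1
    case $__OCF_ACTION in
        start)          dummy_start;;
        stop)           dummy_stop;;
        monitor)        dummy_monitor;;
        monitor-loop)   ocf_monitor_loop ${OCF_RESKEY_maxfailures};;
        *)              echo "unsupported action: $__OCF_ACTION" >&2
                        return 3;;   # OCF_ERR_UNIMPLEMENTED
    esac
}
```

The cluster passes the "maxfailures" instance attribute to the RA as the environment variable OCF_RESKEY_maxfailures, which is why the value set in cib.xml reaches ocf_monitor_loop without further wiring.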

3. Note
The monitor-loop operation is available only for OCF resources, not for STONITH resources.


Thank you very much for your advice, Andrew and Lars!
With just a small change, I was able to implement what I had in mind.

Now I would like to hear your opinions.
For OCF resources, adding the monitor-loop operation is easy thanks to
.ocf-shellfuncs.
But STONITH resources don't have a common file like that.
So, to add a monitor-loop (or status-loop) operation to
STONITH resources, I would have to add a function to each of them.
That is almost the same as modifying each of their status functions...

Even leaving the monitor-loop operation aside,
shouldn't STONITH resources have a common file like the one OCF resources have?


Your comments and suggestions are really appreciated.


Best Regards,
Satomi TANIGUCHI





Lars Marowsky-Bree wrote:
> On 2008-09-17T10:09:21, Andrew Beekhof <beekhof at gmail.com> wrote:
> 
>> I can't help but feel this is all a work-around for badly written RAs 
>> and/or overly aggressive timeouts.  There's nothing wrong with setting 
>> large timeouts... if you set 1 hour and the op returns in 1 second, then we 
>> don't wait around doing nothing for the other 59 minutes and 59 seconds.
> 
> Agreed. RAs shouldn't fail randomly. RAs are considered part of the
> "trusted" infrastructure.
> 
>> But if you really really only want to report an error if N monitors fail in 
>> M seconds (I still think this is crazy, but whatever), then simply 
>> implement monitor_loop() which calls monitor() up to N times looking for 
>> $OCF_SUCCESS and add:
>>
>>   <op id=... name="monitor_loop" timeout="M" interval=... />
>>
>> instead of a regular monitor op.  Or even in addition to a regular monitor 
>> op with on_fail=ignore if you want.
> 
> Best idea so far.
> 
> 
> 
> Regards,
>     Lars
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: monitor_loop_hb.patch
Type: text/x-patch
Size: 1079 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081007/49114fef/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: monitor_loop_pm.patch
Type: text/x-patch
Size: 1922 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081007/49114fef/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: monitor_loop_Dummy.patch
Type: text/x-patch
Size: 433 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20081007/49114fef/attachment-0005.bin>

