[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Satomi Taniguchi taniguchis at intellilink.co.jp
Tue Sep 9 05:37:31 EDT 2008


Hi lists,

I'm posting two patches that implement the feature we have discussed.
One is for Pacemaker-dev(aba67759589),
and the other is for Heartbeat-dev(fc047640072c).

The specification is as follows.
  (1) Add the following four settings.
       "period-length" - Period in seconds over which a monitor op's failures are counted.
       "max-failures-per-period" - Maximum number of times per period a monitor may fail.
       "default-period-length" - Default value of period-length for the cluster.
       "default-max-failures-per-period" - Default value of max-failures-per-period for the cluster.

  (2) lrmd counts each resource's monitor op failures per period-length,
      and ignores the resource's failures until their count exceeds the
      threshold (max-failures-per-period).

  (3) If the value of period-length is 0, lrmd calculates a suitable period
      length for the resource's operation.

      NOTE:
      "suitable" means "safe enough".
      In this patch, the expression used to calculate the "suitable" value is
      (monitor's interval + timeout) * max-failures-per-period.
      If period-length is too short, the number of monitor operations that can
      finish within one period is less than the threshold, and lrmd will never
      notify its client that the resource has failed.
      To avoid this, period-length must be larger than
      (monitor's interval + timeout) * (max-failures-per-period - 1), at least.
      Allowing for lrmd's internal processing time, the margin of error of the
      OS's timer and so on, I consider the first expression suitable.
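
As a worked example of the two expressions in the NOTE, here is a minimal
sketch in Python. It is not code from the patches; the function names and the
sample values (10 s interval, 20 s timeout, threshold of 3) are assumptions
chosen only to illustrate the arithmetic.

```python
def suitable_period_length(interval_s, timeout_s, max_failures):
    """Period assumed when period-length is 0:
    (interval + timeout) * max-failures-per-period."""
    return (interval_s + timeout_s) * max_failures


def minimum_period_length(interval_s, timeout_s, max_failures):
    """Lower bound from the NOTE; period-length must be larger than this:
    (interval + timeout) * (max-failures-per-period - 1)."""
    return (interval_s + timeout_s) * (max_failures - 1)


# Hypothetical monitor op: 10 s interval, 20 s timeout, threshold of 3.
print(suitable_period_length(10, 20, 3))  # 90
print(minimum_period_length(10, 20, 3))   # 60
```

With these values the calculated period (90 s) exceeds the lower bound (60 s)
by one full interval + timeout, which is the margin the NOTE describes.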

In addition, I added a function to lrmadmin that shows the following information.
   i) the time when the current period started for the specified resource.
  ii) the value of the failure counter for the specified resource.
This is the third patch.
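
The counting described in (2), together with the two values lrmadmin reports,
could be sketched roughly as follows. This is an illustration in Python, not
the lrmd code from the patches: the class and method names are hypothetical,
and it assumes that a period starts at the first counted failure and restarts
when a failure arrives after the period has elapsed, which the description
above does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureWindow:
    """Per-resource failure counter (hypothetical sketch)."""
    period_length: float                  # "period-length", in seconds
    max_failures: int                     # "max-failures-per-period"
    period_start: Optional[float] = None  # i) time the current period started
    failures: int = 0                     # ii) failure counter for the period

    def record_failure(self, now: float) -> bool:
        """Count one monitor failure at time `now` (seconds).

        Returns True when the failure should be reported to the client,
        i.e. the count has exceeded the threshold within the period.
        """
        expired = (
            self.period_start is None
            or now - self.period_start >= self.period_length
        )
        if expired:
            # Assumption: a new period begins with this failure.
            self.period_start = now
            self.failures = 0
        self.failures += 1
        return self.failures > self.max_failures


# Threshold of 3 failures per 90 s period.
window = FailureWindow(period_length=90, max_failures=3)
reported = [window.record_failure(t) for t in (0, 30, 60, 80)]
print(reported)  # [False, False, False, True]
```

In this sketch the first three failures are ignored and the fourth one inside
the same period is reported; a failure arriving after the period has elapsed
starts a new period with the counter back at 1.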

Your comments and suggestions are really appreciated.

Best Regards,
Satomi Taniguchi


-------------- next part --------------
A non-text attachment was scrubbed...
Name: allow_several_failures_for_hb.patch
Type: text/x-patch
Size: 8740 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20080909/1a026323/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: allow_several_failures_for_pm.patch
Type: text/x-patch
Size: 8769 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20080909/1a026323/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lrmadmin.patch
Type: text/x-patch
Size: 9412 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20080909/1a026323/attachment-0002.bin>

