[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Andrew Beekhof beekhof at gmail.com
Tue Sep 9 10:02:08 UTC 2008


On Sep 9, 2008, at 11:37 AM, Satomi Taniguchi wrote:

> Hi lists,
>
> I'm posting two patches that implement the feature we have discussed:
> one for Pacemaker-dev (aba67759589)
> and one for Heartbeat-dev (fc047640072c).
>
> The specification is as follows.
> (1) Add the following 4 settings:
>      "period-length" - period, in seconds, over which a monitor op's failures are counted.
>      "max-failures-per-period" - maximum number of times per period a monitor may fail.
>      "default-period-length" - cluster-wide default value of period-length.
>      "default-max-failures-per-period" - cluster-wide default value of max-failures-per-period.
>
> (2) lrmd counts each resource's monitor op failures within each period of
>     period-length seconds, and ignores those failures until their count
>     exceeds the threshold (max-failures-per-period).
>
> (3) If period-length is 0, lrmd calculates a suitable period length
>     for the resource's monitor operation.
>
>     NOTE:
>     "Suitable" means "safe enough".
>     In this patch, the suitable value is calculated as
>     (monitor's interval + timeout) * max-failures-per-period.
>     If period-length is too short, fewer monitor operations than the
>     threshold may finish within one period, and lrmd would then never
>     notify its client that the resource has failed.
>     To avoid this, period-length must be at least
>     (monitor's interval + timeout) * (max-failures-per-period - 1).
>     Allowing extra margin for lrmd's internal processing time, OS timer
>     inaccuracy, and so on, I consider the first expression suitable.
>
> In addition, I have added a feature to lrmadmin that shows the following
> information for a specified resource:
>  i) the time at which the current period started;
> ii) the current value of the failure counter.
> This is the third patch.
>
> Your comments and suggestions are really appreciated.
>
> Best Regards,
> Satomi Taniguchi
>
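Points (2) and (3) of the proposal can be sketched in C as follows. This is not code from the patch: the struct and function names are hypothetical, all durations are assumed to share one unit (seconds here; a real implementation would have to make sure the monitor interval and timeout are converted to the same unit as period-length), and the rule for when a new counting period starts is an assumption, because the message does not specify it. Following point (2) as written, a failure is reported only once the count exceeds max-failures-per-period.

```c
#include <assert.h>
#include <time.h>

/* Point (3): the "suitable" period, (interval + timeout) * max_failures.
 * Assumption: all values are in seconds. */
int suitable_period_length(int interval, int timeout, int max_failures)
{
    assert(max_failures >= 1);
    return (interval + timeout) * max_failures;
}

/* Point (3): the lower bound, (interval + timeout) * (max_failures - 1). */
int minimum_period_length(int interval, int timeout, int max_failures)
{
    assert(max_failures >= 1);
    return (interval + timeout) * (max_failures - 1);
}

/* A configured period-length of 0 means "let lrmd choose". */
int effective_period_length(int period_length, int interval, int timeout,
                            int max_failures)
{
    return period_length == 0
        ? suitable_period_length(interval, timeout, max_failures)
        : period_length;
}

/* Hypothetical per-resource state: the period start time and the failure
 * count, i.e. the two values the lrmadmin patch is described as showing. */
struct failure_window {
    time_t period_start;
    int failures;
};

/* Point (2): record one monitor failure at time `now`. Returns 1 if the
 * failure should be reported to lrmd's client, 0 if it is ignored.
 * Assumption: a new period starts at the first failure seen after the
 * previous period has expired. */
int record_monitor_failure(struct failure_window *w, time_t now,
                           int period_length, int max_failures)
{
    assert(w != NULL && period_length > 0);
    if (w->failures == 0 || difftime(now, w->period_start) >= period_length) {
        w->period_start = now;   /* start a new counting period */
        w->failures = 0;
    }
    w->failures++;
    /* Report only once the threshold is exceeded, as in point (2). */
    return w->failures > max_failures;
}
```

For example, with a 10 s monitor interval, a 20 s timeout and max-failures-per-period set to 3, the suggested period is (10 + 20) * 3 = 90 s, against a lower bound of (10 + 20) * 2 = 60 s.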

[snip]

>
> struct lrmd_op
> diff -r aba677595891 crmd/lrm.c
> --- a/crmd/lrm.c	Sun Sep 07 00:02:29 2008 +0200
> +++ b/crmd/lrm.c	Mon Sep 08 15:58:39 2008 +0900
> @@ -1326,6 +1326,8 @@
> 	const char *op_delay = NULL;
> 	const char *op_timeout = NULL;
> 	const char *op_interval = NULL;
> +	const char *op_period_length = NULL;
> +	const char *op_max_failures_per_period = NULL;
> 	
> 	const char *transition = NULL;
> 	CRM_DEV_ASSERT(rsc_id != NULL);
> @@ -1340,6 +1342,8 @@
> 	op->start_delay = 0;
> 	op->copyparams = 0;
> 	op->app_name = crm_strdup(CRM_SYSTEM_CRMD);
> +	op->period_length = 0;
> +	op->max_failures_per_period = 0;
>
> 	if(rsc_op == NULL) {
> 		CRM_DEV_ASSERT(safe_str_eq(CRMD_ACTION_STOP, operation));
> @@ -1370,6 +1374,10 @@
> 	op_delay = g_hash_table_lookup(op->params, crm_meta_name("start_delay"));
> 	op_timeout = g_hash_table_lookup(op->params, crm_meta_name("timeout"));
> 	op_interval = g_hash_table_lookup(op->params, crm_meta_name("interval"));
> +	op_period_length = g_hash_table_lookup(op->params,
> +		crm_meta_name("period_length"));
> +	op_max_failures_per_period = g_hash_table_lookup(op->params,
> +		crm_meta_name("max_failures_per_period"));
> #if CRM_DEPRECATED_SINCE_2_0_5
> 	if(op_delay == NULL) {
> 		op_delay = g_hash_table_lookup(op->params, "start_delay");
> @@ -1380,11 +1388,21 @@
> 	if(op_interval == NULL) {
> 		op_interval = g_hash_table_lookup(op->params, "interval");
> 	}
> +	if(op_period_length == NULL) {
> +		op_period_length = g_hash_table_lookup(op->params, "period_length");
> +	}
> +	if(op_max_failures_per_period == NULL) {
> +		op_max_failures_per_period = g_hash_table_lookup(op->params,
> +		"max_failures_per_period");
> +	}

Please do not add code to the CRM_DEPRECATED_SINCE_2_0_5 block; it only exists to support deprecated releases.

>
> #endif
> 	
> 	op->interval = crm_parse_int(op_interval, "0");
> 	op->timeout  = crm_parse_int(op_timeout,  "0");
> 	op->start_delay = crm_parse_int(op_delay, "0");
> +	op->period_length = crm_parse_int(op_period_length, "0");
> +	op->max_failures_per_period =
> +		crm_parse_int(op_max_failures_per_period, "1");
>
> 	/* sanity */
> 	if(op->interval < 0) {
> diff -r aba677595891 include/crm/msg_xml.h
> --- a/include/crm/msg_xml.h	Sun Sep 07 00:02:29 2008 +0200
> +++ b/include/crm/msg_xml.h	Mon Sep 08 15:58:39 2008 +0900
> @@ -150,6 +150,8 @@
> #define XML_RSC_ATTR_NOTIFY		"notify"
> #define XML_RSC_ATTR_STICKINESS		"resource-stickiness"
> #define XML_RSC_ATTR_FAIL_STICKINESS	"migration-threshold"
> +#define XML_RSC_ATTR_PERIOD_LENGTH	"period-length"
> +#define XML_RSC_ATTR_MAX_FAILURES_PER_PERIOD	"max-failures-per-period"
> #define XML_RSC_ATTR_FAIL_TIMEOUT	"failure-timeout"
> #define XML_RSC_ATTR_MULTIPLE		"multiple-active"
> #define XML_RSC_ATTR_PRIORITY		"priority"
> diff -r aba677595891 include/crm/pengine/status.h
> --- a/include/crm/pengine/status.h	Sun Sep 07 00:02:29 2008 +0200
> +++ b/include/crm/pengine/status.h	Mon Sep 08 15:58:39 2008 +0900
> @@ -73,6 +73,8 @@
>
> 		int default_failure_timeout;
> 		int default_migration_threshold;
> +		int default_period_length;
> +		int default_max_failures_per_period;

We don't use this model anymore.
People should set resource attribute defaults in the rsc_defaults section instead.

This is much more flexible and lets _everything_ have a default value.
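A minimal C sketch of the lookup order this implies, assuming a resource's own meta attribute overrides the value in rsc_defaults, which in turn overrides a built-in fallback. The `nvpair` struct and both functions are hypothetical illustrations, not Pacemaker's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one meta-attribute name/value pair. */
struct nvpair {
    const char *name;
    const char *value;
};

/* Return the value of `name` in `attrs`, or NULL if it is not set. */
const char *find_attr(const struct nvpair *attrs, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(attrs[i].name, name) == 0) {
            return attrs[i].value;
        }
    }
    return NULL;
}

/* Resolve an integer attribute: the resource's own value wins, then the
 * rsc_defaults value, then a built-in fallback. */
int resolve_int_attr(const struct nvpair *rsc, size_t n_rsc,
                     const struct nvpair *defaults, size_t n_defaults,
                     const char *name, int builtin)
{
    assert(name != NULL);
    const char *value = find_attr(rsc, n_rsc, name);
    if (value == NULL) {
        value = find_attr(defaults, n_defaults, name);
    }
    return value != NULL ? (int)strtol(value, NULL, 10) : builtin;
}
```

Because the lookup is keyed only by attribute name, a new attribute such as period-length gets a default without any dedicated `default_period_length` field, which is the flexibility described above.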

>
> 		int default_resource_stickiness;
> 		no_quorum_policy_t no_quorum_policy;
>
> @@ -166,6 +168,8 @@
> 		int	 failure_timeout;
> 		int	 effective_priority;
> 		int	 migration_threshold;
> +		int	 period_length;
> +		int	 max_failures_per_period;

The same applies here.

>
>
> 		unsigned long long flags;
> 	
> diff -r aba677595891 lib/common/utils.c
> --- a/lib/common/utils.c	Sun Sep 07 00:02:29 2008 +0200
> +++ b/lib/common/utils.c	Mon Sep 08 15:58:39 2008 +0900
> @@ -1165,6 +1165,8 @@
> 		XML_RSC_ATTR_MULTIPLE,
> 		XML_RSC_ATTR_STICKINESS,
> 		XML_RSC_ATTR_FAIL_STICKINESS,
> +		XML_RSC_ATTR_PERIOD_LENGTH,
> +		XML_RSC_ATTR_MAX_FAILURES_PER_PERIOD,
> 		XML_RSC_ATTR_TARGET_ROLE,
>

[snip]

>
> xmlNode *
> diff -r aba677595891 tools/crm_mon.c
> --- a/tools/crm_mon.c	Sun Sep 07 00:02:29 2008 +0200
> +++ b/tools/crm_mon.c	Mon Sep 08 15:58:39 2008 +0900
> @@ -574,6 +574,8 @@
> 	printed = TRUE;
> 	print_as("   %s: migration-threshold=%d",
> 		 rsc->id, rsc->migration_threshold);
> +	print_as(" period-length=%d(s)", rsc->period_length);
> +	print_as(" max-failures-per-period=%d", rsc->max_failures_per_period);

I don't want crm_mon displaying this information.




