[Pacemaker] lrm monitor failure status lost during DC election
Andrew Beekhof
andrew at beekhof.net
Tue Apr 30 20:45:20 EDT 2013
On 19/04/2013, at 6:36 AM, David Adair <david_adair at xyratex.com> wrote:
> Hello.
>
> I have an issue with pacemaker 1.1.6.1 but believe this may still be
> present in the latest git versions, and would like to know if the fix
> makes sense.
>
>
> What I see is the following:
> Setup:
> - 2 node cluster
> - ocf:heartbeat:Dummy resource on non-DC node.
> - Force DC reboot or stonith and fail resource while there is no DC.
>
> Result:
> - node with failed monitor becomes DC (good)
>
> - lrmd reports resource as failed during every monitor interval but
> since these failures are not rc status changes they are not sent to crmd.
> (good -- it is failing, but ..)
>
> - crm_mon / cibadmin --query report resource as running OK. (not good)
>
>
> The resource has failed but is never restarted. I believe the failing
> resource and any group it belongs to should be recovered during/after
> the DC election.
>
> I think this is due to the operation of build_active_RAs on the surviving node:
>
> build_operation_update(xml_rsc, &(entry->rsc), entry->last, __FUNCTION__);
> build_operation_update(xml_rsc, &(entry->rsc), entry->failed, __FUNCTION__);
> for (gIter = entry->recurring_op_list; gIter != NULL; gIter = gIter->next) {
>     build_operation_update(xml_rsc, &(entry->rsc), gIter->data, __FUNCTION__);
> }
>
> What this produces is:
>
>   last:     start_0: rc=0
>   failed:   monitor_1000: rc=7
>   list[0]:  monitor_1000: rc=7
>   list[1]:  monitor_1000: rc=0
list[] should only have one element, as both entries are for monitor_1000.
I have a vague recollection of an old bug in this area and strongly suspect that a more recent version won't have the same problem.
>
> The final result in the cib appears to be the last entry, which is from
> the initial transition of the monitor from rc=-1 to rc=0.
>
> To fix this I swapped the order of recurring_op_list so that the last transition
> is at the end of the list rather than the beginning. With this change I
> see what I believe is the desired behavior -- the resource is stopped and
> restarted when the DC election is finalized.
>
> The memcpy is a backport of a corresponding change in lrmd_copy_event
> to simplify debugging by maintaining the rcchanged time.
>
> ---------------------
> This patch swaps the order of recurring operations (monitors) in the
> lrm history cache. By placing the most recent change at the end of the
> list it is properly detected by pengine after a DC election.
>
> With the new events placed at the start of the list the last thing
> in the list is the initial startup with rc=0. This makes pengine
> believe the resource is working properly even though lrmd is reporting
> constant failure.
>
> It is fairly easy to get into this situation when a shared resource
> (storage enclosure) fails and causes the DC to be stonithed.
>
> diff --git a/crmd/lrm.c b/crmd/lrm.c
> index 187db76..f8974f6 100644
> --- a/crmd/lrm.c
> +++ b/crmd/lrm.c
> @@ -217,7 +217,7 @@ update_history_cache(lrm_rsc_t * rsc, lrm_op_t * op)
>
>      if (op->interval > 0) {
>          crm_trace("Adding recurring op: %s_%s_%d", op->rsc_id, op->op_type, op->interval);
> -        entry->recurring_op_list = g_list_prepend(entry->recurring_op_list, copy_lrm_op(op));
> +        entry->recurring_op_list = g_list_append(entry->recurring_op_list, copy_lrm_op(op));
>
>      } else if (entry->recurring_op_list && safe_str_eq(op->op_type, RSC_STATUS) == FALSE) {
>          GList *gIter = entry->recurring_op_list;
> @@ -1756,6 +1756,9 @@ copy_lrm_op(const lrm_op_t * op)
>
> crm_malloc0(op_copy, sizeof(lrm_op_t));
>
> + /* Copy all int values, pointers fixed below */
> + memcpy(op_copy, op, sizeof(lrm_op_t));
> +
> op_copy->op_type = crm_strdup(op->op_type);
> /* input fields */
> op_copy->params = g_hash_table_new_full(crm_str_hash, g_str_equal,
>