[ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

Ken Gaillot kgaillot at redhat.com
Tue Feb 26 10:05:33 EST 2019


On Tue, 2019-02-26 at 06:55 +0300, Andrei Borzenkov wrote:
> > 26.02.2019 1:08, Ken Gaillot wrote:
> > On Mon, 2019-02-25 at 23:00 +0300, Andrei Borzenkov wrote:
> > > 25.02.2019 22:36, Andrei Borzenkov wrote:
> > > > 
> > > > > Could you please help me understand:
> > > > > 1. Why doesn't pacemaker process the failure of the
> > > > > Stateful_Test_2 resource immediately after the first failure?
> > > 
> > > I'm still not sure why.
> > > 
> > > > I vaguely remember something about sequential execution being
> > > > mentioned before, but I cannot find the details.
> > > > 
> > > > > 2. Why does the monitor failure of Stateful_Test_2 continue
> > > > > even after the promote of Stateful_Test_1 has completed?
> > > > > Shouldn't pacemaker handle Stateful_Test_2's failure and take
> > > > > the necessary action on it? It feels as if that particular
> > > > > failure 'event' has been 'dropped' and pengine is not even
> > > > > aware of Stateful_Test_2's failure.
> > > > > 
> > > > 
> > > > Yes. Although crm_mon shows the resource as being master on this
> > > > node, in reality the resource is left in a failed state forever
> > > > and the monitor result is simply ignored.
> > > > 
> > > 
> > > Yes, pacemaker reacts only to result changes (more precisely, it
> > > tells lrmd to report only the first result and suppress all further
> > > consecutive duplicates). As the first report gets lost due to the
> > > low failure-timeout, this explains what happens.
> > 
> > That would explain why the first monitor failure is ignored, but
> > after the long-running action completes, a new transition should see
> > the failure timeout, wipe the resource history, and log a message on
> > the DC about "Re-initiated expired calculated failure", at which
> > point the cluster should schedule recovery.
> > 
> > Do the logs show such a message?
> > 
> 
> Yes, this message appears. I'm not sure how it changes anything,
> because the problem is not that pacemaker does not react to subsequent
> failures, but that after the first monitor failure lrmd does not report
> anything to pacemaker at all.

That's what the "re-initiated" code is meant to work around. It's
hacky: it changes the restart hash for the operation, making it appear
that the parameter values have changed, which requires a restart (and
the restart will stop and restart the monitor). At least that's how I
think it works from reading the code. If that's not happening, I may
be missing something.
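
Purely as an illustration of that idea (this is not the actual
pacemaker code; op_history_t, reinitiate_expired_failure() and
needs_restart() are made-up names), perturbing the stored restart
digest makes the parameter comparison fail, which looks like a
configuration change and therefore forces the stop/start that
re-registers the recurring monitor:

    #include <stdio.h>
    #include <string.h>

    /* Made-up record of an operation in the resource history. */
    typedef struct {
        char restart_digest[33];  /* digest of the parameters last used */
        int  expired;             /* failure-timeout has passed */
    } op_history_t;

    /* Workaround sketch: when an expired failure is "re-initiated",
     * overwrite the stored digest with a value that can never match. */
    static void reinitiate_expired_failure(op_history_t *op)
    {
        if (op->expired) {
            strcpy(op->restart_digest, "re-initiated");
        }
    }

    /* The scheduler compares stored vs. current digest; a mismatch looks
     * like changed parameters and therefore requires a restart. */
    static int needs_restart(const op_history_t *op, const char *current)
    {
        return strcmp(op->restart_digest, current) != 0;
    }

    int main(void)
    {
        op_history_t op = { .restart_digest = "abc123", .expired = 1 };
        const char *current = "abc123";  /* parameters did not really change */

        printf("before: restart needed? %d\n", needs_restart(&op, current));
        reinitiate_expired_failure(&op);
        printf("after:  restart needed? %d\n", needs_restart(&op, current));
        return 0;
    }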

> crmd/lrm_state.c:lrm_state_exec()
> 
>     return ((lrmd_t *) lrm_state->conn)->cmds->exec(lrm_state->conn,
>                                                     rsc_id,
>                                                     action,
>                                                     userdata,
>                                                     interval,
>                                                     timeout,
>                                                     start_delay,
>                                                     lrmd_opt_notify_changes_only,
>                                                     params);
> 
> 
> lrmd/lrmd.c:send_cmd_complete_notify()
> 
>     /* If the first notify result for a cmd has already been sent
>      * earlier, and the option to only send notifies on result changes
>      * is set, check whether the last result is the same as the new
>      * one. If so, suppress this update. */
>     if (cmd->first_notify_sent &&
>         (cmd->call_opts & lrmd_opt_notify_changes_only)) {
>         if (cmd->last_notify_rc == cmd->exec_rc &&
>             cmd->last_notify_op_status == cmd->lrmd_op_status) {
> 
>             /* only send changes */
>             return;
>         }
> 
>     }
> 
> From pacemaker's point of view there is no resource status change at
> all. It never initiated a different operation for this resource, so
> the last rc and op_status never change.
> 
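To make the effect of that check concrete, here is a simplified,
standalone model (not the actual lrmd code; cmd_t and notify() below
are invented, as are the result values): once the first failure has
been reported, identical consecutive monitor results are silently
dropped, so crmd never hears about that resource again.

    #include <stdio.h>

    typedef struct {
        int first_notify_sent;
        int changes_only;   /* stands in for lrmd_opt_notify_changes_only */
        int last_notify_rc;
        int last_notify_op_status;
    } cmd_t;

    /* Forward a monitor result to the client unless it duplicates the
     * previous one and the "changes only" option is set. */
    static void notify(cmd_t *cmd, int exec_rc, int op_status)
    {
        if (cmd->first_notify_sent && cmd->changes_only
            && cmd->last_notify_rc == exec_rc
            && cmd->last_notify_op_status == op_status) {
            return;  /* duplicate result: suppressed */
        }
        cmd->first_notify_sent = 1;
        cmd->last_notify_rc = exec_rc;
        cmd->last_notify_op_status = op_status;
        printf("reported rc=%d status=%d\n", exec_rc, op_status);
    }

    int main(void)
    {
        cmd_t cmd = { .changes_only = 1 };

        notify(&cmd, 0, 0);  /* healthy master: reported */
        notify(&cmd, 1, 0);  /* first failure: reported, then expired/cleaned */
        notify(&cmd, 1, 0);  /* same failure again: suppressed */
        notify(&cmd, 1, 0);  /* ...and again: suppressed indefinitely */
        return 0;
    }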
> 
> > > ...
> > > 
> > > > > 
> > > > > Could you please help us understand this behavior and how to
> > > > > fix it?
> > > > > 
> > > > 
> > > > Your problem is triggered by a too-low failure-timeout. The
> > > > failure of the master is cleared before pacemaker picks it up for
> > > > processing (or so I interpret it). You should set failure-timeout
> > > > to be longer than your actions may take. This will at least give
> > > > you a workaround.
> > > > 
> > > > Note that in your configuration the resource cannot be recovered
> > > > anyway: migration-threshold is 1, so pacemaker cannot (try to)
> > > > restart the master on the same node, but you prohibit running it
> > > > anywhere else.
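
As a purely illustrative sketch of that last point (simplified placement
logic, not pacemaker's scheduler; the node names and choose_node() are
invented): once the fail count reaches migration-threshold, the failed
node is excluded, and if the constraints allow no other node, there is
nowhere left to recover the master.

    #include <stdio.h>

    typedef struct {
        const char *name;
        int allowed;      /* permitted by location constraints */
        int fail_count;
    } node_t;

    /* Pick a node for recovery: skip nodes the resource may not run on
     * and nodes whose fail count has reached migration-threshold. */
    static const char *choose_node(const node_t *nodes, int n, int threshold)
    {
        for (int i = 0; i < n; i++) {
            if (nodes[i].allowed && nodes[i].fail_count < threshold) {
                return nodes[i].name;
            }
        }
        return NULL;
    }

    int main(void)
    {
        node_t nodes[] = {
            { "node1", 1, 1 },  /* the master failed here once */
            { "node2", 0, 0 },  /* prohibited by constraints */
        };

        const char *target = choose_node(nodes, 2, 1 /* migration-threshold */);
        printf("recovery target: %s\n",
               target ? target : "(none - resource stays stopped)");
        return 0;
    }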
-- 
Ken Gaillot <kgaillot at redhat.com>



