[ClusterLabs] Debugging problems with resource timeout without any actions from cluster
Ken Gaillot
kgaillot at redhat.com
Tue Oct 17 15:41:27 CEST 2017
On Tue, 2017-10-17 at 15:30 +0600, Sergey Korobitsin wrote:
> Ken Gaillot ☫ → To Cluster Labs - All topics related to open-source
> clustering welcomed @ Thu, Oct 12, 2017 09:47 -0500
>
> Thanks for the answer, Ken,
>
> > > I found several ways to achieve that:
> > >
> > > 1. Put the cluster in maintenance mode (as described here:
> > > https://www.hastexo.com/resources/hints-and-kinks/maintenance-active-pacemaker-clusters/)
> > >
> > > As far as I understand, services will be monitored and all logs
> > > written, etc., but no action will be taken in case of failures.
> > > Is that right?
> >
> > Actually, maintenance mode stops all monitors (except those with
> > role=Stopped, which ensure a service is not running).
>
> OK, got it.
>
> > > 2. Put the particular resource into unmanaged mode, as described
> > > here:
> > > http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#s-monitoring-unmanaged
> >
> > Disabling starts and stops is the exact purpose of unmanaged, so
> > this is one way to get what you want. FYI, you can also set this
> > as a global default for all resources by setting it in the
> > resource defaults section of the configuration.
>
> OK, got it too.
>
> > > 3. Start all resources and remove the start and stop operations
> > > from them.
> >
> > :-O
>
> This is a rather quirky way, but it exists! :-)
>
> > > Which is the best way to achieve my purpose? I would like the
> > > cluster to run as usual (logging as usual, or with tracing on the
> > > problematic resource), but no action should be taken in case of a
> > > monitor failure.
> >
> > That's actually a different goal, also easily accomplished, by
> > setting on-fail=ignore on the monitor operation. From the sound of
> > it, this is closer to what you want, since the cluster is still
> > allowed to start/stop resources when you standby a node, etc.
>
> I'll try this one.
>
> > You could also delete the recurring monitor operation from the
> > configuration, and it wouldn't run at all. But keeping it and
> > setting on-fail=ignore lets you see failures in cluster status.
> >
> > However, I'm not sure bypassing the monitor is the best solution to
> > this problem. If the problem is simply that your database monitor
> > can legitimately take longer than 20 seconds in normal operation,
> > then raise the timeout as needed.
>
> I want to determine why it needed more than 20 seconds, and under
> what circumstances.
Ah, excellent, that's what on-fail=ignore is useful for :-)
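
A minimal sketch of what such a configuration looks like in the CIB, added for illustration and not part of the original thread. The Python below uses only the standard library's xml.etree.ElementTree to build a <primitive> whose recurring monitor <op> carries on-fail="ignore". The helper name, the resource id "db", the agent ocf:heartbeat:pgsql and the 10s interval are all hypothetical; the 20s timeout is the value discussed above. With on-fail=ignore, a monitor that fails or times out still shows up as a failure in cluster status, but the cluster takes no recovery action, which leaves the situation intact for investigation.

```python
import xml.etree.ElementTree as ET


def primitive_with_ignored_monitor(rsc_id: str, agent_class: str,
                                   provider: str, agent_type: str,
                                   interval: str, timeout: str) -> ET.Element:
    """Build a CIB <primitive> whose recurring monitor uses on-fail="ignore".

    Hypothetical helper for illustration only; element and attribute names
    follow the Pacemaker CIB resource syntax.
    """
    prim = ET.Element("primitive", {
        "id": rsc_id,
        "class": agent_class,
        "provider": provider,
        "type": agent_type,
    })
    ops = ET.SubElement(prim, "operations")
    ET.SubElement(ops, "op", {
        # Operation ids must be unique in the CIB; this naming is only a convention.
        "id": f"{rsc_id}-monitor-{interval}",
        "name": "monitor",
        "interval": interval,
        "timeout": timeout,
        # Failures (including timeouts) still appear in status, but no recovery is attempted.
        "on-fail": "ignore",
    })
    return prim


# Hypothetical resource: id "db", agent ocf:heartbeat:pgsql, 10s monitor interval.
prim = primitive_with_ignored_monitor("db", "ocf", "heartbeat", "pgsql",
                                      interval="10s", timeout="20s")
print(ET.tostring(prim, encoding="unicode"))
```

The printed fragment could then be applied with whichever tool you normally use to edit the CIB (cibadmin, crm or pcs). Once the cause of the slow monitor is known, reverting on-fail to its default (restart, for a monitor) or raising the timeout is a single-attribute change.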
--
Ken Gaillot <kgaillot at redhat.com>
More information about the Users
mailing list