[ClusterLabs] Debugging problems with resource timeout without any actions from cluster
Sergey Korobitsin
undertaker at arta.kz
Tue Oct 17 05:30:13 EDT 2017
On Thu, Oct 12, 2017 at 09:47 -0500, Ken Gaillot wrote to Cluster Labs - All topics related to open-source clustering welcomed:
Thanks for the answer, Ken,
> > I found several ways to achieve that:
> >
> > 1. Put the cluster in maintenance mode (as described here:
> > https://www.hastexo.com/resources/hints-and-kinks/maintenance-active-pacemaker-clusters/)
> >
> > As far as I understand, services will be monitored, all logs
> > written, etc., but no action will be taken in case of failures. Is
> > that right?
>
> Actually, maintenance mode stops all monitors (except those with
> role=Stopped, which ensure a service is not running).
OK, got it.
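For reference, maintenance mode is a single cluster property, `maintenance-mode`, stored in the `crm_config` section of the CIB. The Python sketch below only builds that XML fragment with the standard library's `xml.etree.ElementTree` to show its shape. The property-set id `cib-bootstrap-options` is the one Pacemaker's tools usually create, and the nvpair id is illustrative. On a live cluster you would normally let `crm_attribute`, `pcs` or `crmsh` write the property rather than editing XML by hand.

```python
import xml.etree.ElementTree as ET


def maintenance_mode_fragment(enabled: bool) -> ET.Element:
    """Build a <crm_config> fragment that sets the maintenance-mode property.

    "cib-bootstrap-options" is the property-set id Pacemaker tools usually
    use; the nvpair id below is only illustrative.
    """
    crm_config = ET.Element("crm_config")
    props = ET.SubElement(crm_config, "cluster_property_set", id="cib-bootstrap-options")
    ET.SubElement(
        props,
        "nvpair",
        id="cib-bootstrap-options-maintenance-mode",
        name="maintenance-mode",
        # "true": the cluster leaves all resources alone (per Ken, recurring
        # monitors stop too, except role=Stopped ones).
        value="true" if enabled else "false",
    )
    return crm_config


if __name__ == "__main__":
    print(ET.tostring(maintenance_mode_fragment(True), encoding="unicode"))
```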
> > 2. Put the particular resource into unmanaged mode, as described here:
> > http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#s-monitoring-unmanaged
>
> Disabling starts and stops is the exact purpose of unmanaged, so this
> is one way to get what you want. FYI you can also set this as a global
> default for all resources by setting it in the resource defaults
> section of the configuration.
OK, got it too.
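Unmanaged mode corresponds to the `is-managed` meta-attribute, set either on a single resource or, as Ken suggests, in the resource defaults (`rsc_defaults`). Below is a minimal sketch of both CIB shapes, again built with `xml.etree.ElementTree`. The resource id `db`, the `ocf:heartbeat:pgsql` agent and all element ids are hypothetical placeholders, not taken from the actual cluster.

```python
import xml.etree.ElementTree as ET


def add_is_managed(parent: ET.Element, set_id: str, managed: bool) -> ET.Element:
    """Append <meta_attributes> with an is-managed nvpair under `parent`."""
    meta = ET.SubElement(parent, "meta_attributes", id=set_id)
    ET.SubElement(
        meta,
        "nvpair",
        id=f"{set_id}-is-managed",
        name="is-managed",
        value="true" if managed else "false",
    )
    return meta


def unmanaged_resource(rsc_id: str) -> ET.Element:
    """A single (hypothetical) ocf:heartbeat:pgsql primitive put into unmanaged mode."""
    # "class" is a Python keyword, so the attributes are passed as a dict.
    primitive = ET.Element(
        "primitive",
        {"id": rsc_id, "class": "ocf", "provider": "heartbeat", "type": "pgsql"},
    )
    add_is_managed(primitive, f"{rsc_id}-meta_attributes", managed=False)
    return primitive


def unmanaged_by_default() -> ET.Element:
    """Global variant: is-managed=false in the resource defaults section."""
    defaults = ET.Element("rsc_defaults")
    add_is_managed(defaults, "rsc-options", managed=False)  # illustrative set id
    return defaults


if __name__ == "__main__":
    for fragment in (unmanaged_resource("db"), unmanaged_by_default()):
        print(ET.tostring(fragment, encoding="unicode"))
```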
> > 3. Start all resources and remove the start and stop operations
> > from them.
>
> :-O
This is kind of a quirky way, but it exists! :-)
> > Which is the best way to achieve my purpose? I would like the cluster
> > to run as usual (and log as usual, or with tracing on the problematic
> > resource), but no action should be taken in case of a monitor failure.
>
> That's actually a different goal, also easily accomplished, by setting
> on-fail=ignore on the monitor operation. From the sound of it, this is
> closer to what you want, since the cluster is still allowed to
> start/stop resources when you standby a node, etc.
I'll try this one.
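`on-fail=ignore` is an attribute of the recurring monitor `<op>` in the resource's `<operations>` section. Here is a sketch assuming a hypothetical resource `db` monitored every 10s with the 20s timeout discussed in this thread. The op id follows a common naming pattern but is only illustrative.

```python
import xml.etree.ElementTree as ET


def monitor_op(rsc_id: str, interval: str, timeout: str, on_fail: str = "ignore") -> ET.Element:
    """Build a recurring monitor <op> for a resource's <operations> section.

    With on-fail="ignore" a failed monitor (a timeout counts as a failure)
    is still recorded and shown in cluster status, but the cluster does not
    stop, restart or move the resource because of it.
    """
    return ET.Element(
        "op",
        {
            "id": f"{rsc_id}-monitor-interval-{interval}",  # illustrative id scheme
            "name": "monitor",
            "interval": interval,
            "timeout": timeout,
            "on-fail": on_fail,
        },
    )


if __name__ == "__main__":
    operations = ET.Element("operations")
    operations.append(monitor_op("db", interval="10s", timeout="20s"))
    print(ET.tostring(operations, encoding="unicode"))
```

Ken's other suggestion, raising the timeout, uses the same op with a larger `timeout` and the default recovery, e.g. `monitor_op("db", "10s", "60s", on_fail="restart")` (the 60s value is just an example).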
> You could also delete the recurring monitor operation from the
> configuration, and it wouldn't run at all. But keeping it and setting
> on-fail=ignore lets you see failures in cluster status.
> However, I'm not sure bypassing the monitor is the best solution to
> this problem. If the problem is simply that your database monitor can
> legitimately take longer than 20 seconds in normal operation, then
> raise the timeout as needed.
I want to determine why it needed more than 20 seconds, and under what
circumstances.
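One way to find out when the monitor exceeds 20 seconds is to run the resource agent's `monitor` action by hand, on the node where the resource is active, and log how long each call takes. Slow runs can then be correlated with whatever else was happening on that node at the time. This sketch assumes an OCF agent invoked as `<agent> monitor`, with its parameters passed as `OCF_RESKEY_*` environment variables (OCF exit code 0 means running, 7 means not running). The agent path, the `OCF_RESOURCE_INSTANCE` value and the 60s safety timeout in the example comment are hypothetical.

```python
import os
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence, TextIO


@dataclass
class MonitorResult:
    rc: Optional[int]  # agent exit code, None if the call timed out
    elapsed: float     # wall-clock seconds
    timed_out: bool


def time_monitor(cmd: Sequence[str], env: Mapping[str, str], timeout: float) -> MonitorResult:
    """Run one monitor invocation and measure how long it took."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            list(cmd),
            env={**os.environ, **env},
            timeout=timeout,  # the child is killed if it runs longer
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return MonitorResult(proc.returncode, time.monotonic() - start, False)
    except subprocess.TimeoutExpired:
        return MonitorResult(None, time.monotonic() - start, True)


def watch(cmd: Sequence[str], env: Mapping[str, str], timeout: float,
          interval: float, rounds: int, out: TextIO = sys.stdout) -> None:
    """Call the monitor `rounds` times, `interval` seconds apart, logging each result."""
    for _ in range(rounds):
        r = time_monitor(cmd, env, timeout)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} rc={r.rc} elapsed={r.elapsed:.1f}s timed_out={r.timed_out}",
              file=out, flush=True)
        time.sleep(interval)


# Example (hypothetical agent path and instance name; add the resource's
# OCF_RESKEY_* parameters to the environment as configured in the CIB):
# watch(["/usr/lib/ocf/resource.d/heartbeat/pgsql", "monitor"],
#       {"OCF_ROOT": "/usr/lib/ocf", "OCF_RESOURCE_INSTANCE": "db"},
#       timeout=60.0, interval=10.0, rounds=360)
```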
--
Bright regards, Sergey Korobitsin,
Chief Research Officer
Arta Software, http://arta.kz/
xmpp:undertaker at jabber.arta.kz
...not to resist this tendency; to make an idea the most decisive leap
forward, and idleness the most creative of all actions.
-- Tristan Tzara