[ClusterLabs] Redundant entries in log

Tue Dec 5 12:38:31 EST 2023

On Tue, 2023-12-05 at 17:21 +0000, Jean-Baptiste Skutnik wrote:
> Hi,
> 
> It was indeed a configuration of 1m on the recheck interval that
> triggered the transitions.
> 
> Could you elaborate on why this is not relevant anymore? I am training
> on the HA stack and if there are mechanisms to detect failure more
> advanced than a recheck I would be interested in what to look for in
> the documentation.

Hi,

The recheck interval has nothing to do with detecting resource failures
-- that is done per-resource via the configured monitor operation
interval.

In the past, time-based configuration such as failure timeouts and
date/time-based rules were only guaranteed to be checked as often as
the recheck interval. That was the most common reason why people
lowered it. However, since the 2.0.3 release, these are checked at the
exact appropriate time, so the recheck interval is no longer relevant
for these.
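
To make the difference concrete, here is a rough Python model of the
timing. It is purely illustrative and not Pacemaker code: the function
name and the example times are made up. It assumes periodic checks
happen at fixed multiples of the interval after the last one.

```python
from datetime import datetime, timedelta


def next_recheck(last_check: datetime, interval: timedelta, event: datetime) -> datetime:
    """Return the first periodic check at or after `event`.

    Hypothetical helper: models checks at last_check, last_check + interval, ...
    """
    if event <= last_check:
        return last_check
    # Ceiling division: number of whole intervals needed to reach `event`.
    ticks = -(-(event - last_check) // interval)
    return last_check + ticks * interval


last_check = datetime(2023, 11, 29, 9, 17, 41)  # made-up time of the last check
expiry = datetime(2023, 11, 29, 9, 20, 0)       # made-up failure-timeout expiry

# Old behavior with the default 15-minute interval: wait for the next tick.
print(next_recheck(last_check, timedelta(minutes=15), expiry))
# Old behavior with a 1-minute interval: much closer to the real expiry.
print(next_recheck(last_check, timedelta(minutes=1), expiry))
# Since 2.0.3 the expiry is acted on at `expiry` itself, whatever the interval.
```

With these made-up times, the 15-minute interval would only notice the
expiry at 09:32:41 and the 1-minute interval at 09:20:41, which is why
people used to lower the value.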

The recheck interval is still useful in two situations: evaluation of
rules using the (cron-like) date_spec element is still only guaranteed
to occur this often; and if there are scheduler bugs resulting in an
incompletely scheduled transition that can be corrected with a new
transition, this will be the maximum time until that happens.
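
Transitions triggered only by the recheck interval typically have
nothing to do, which shows up as all action counters being zero in the
controller's "Transition ... Complete" message. If you are parsing the
logs anyway, a sketch along these lines could flag those entries. The
function name and regular expressions are my own assumptions based on
the message format quoted in this thread, and it assumes each entry
occupies a single line in the log file.

```python
import re

# Matches e.g. "Transition 8629 (Complete=0, ..., Source=/path): Complete"
TRANSITION_RE = re.compile(r"Transition (?P<id>\d+) \((?P<counters>[^)]*)\): Complete")
# Matches numeric counters such as "Pending=0"; "Source=/path" is skipped.
COUNTER_RE = re.compile(r"(\w+)=(\d+)")


def is_empty_transition(line: str) -> bool:
    """Return True if `line` reports a completed transition with all counters at zero."""
    match = TRANSITION_RE.search(line)
    if match is None:
        return False
    counts = [int(value) for _, value in COUNTER_RE.findall(match.group("counters"))]
    return bool(counts) and all(count == 0 for count in counts)


sample = (
    "Nov 29 09:17:41 esvm2 pacemaker-controld[2893]:  notice: Transition 8629 "
    "(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, "
    "Source=/var/lib/pacemaker/pengine/pe-input-250.bz2): Complete"
)
print(is_empty_transition(sample))
```

This only classifies the lines; whether to drop them or just count them
is up to whatever does the parsing.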

> 
> Cheers,
> 
> JB
> 
> > On Nov 29, 2023, at 18:52, Ken Gaillot <kgaillot at redhat.com> wrote:
> > 
> > Hi,
> > 
> > Something is triggering a new transition. The most likely candidate is
> > a low value for cluster-recheck-interval.
> > 
> > Many years ago, a low cluster-recheck-interval was necessary to make
> > certain things like failure-timeout more timely, but that has not been
> > the case in a long time. It should be left to default (15 minutes) in
> > the vast majority of cases. (A new transition will still occur on that
> > schedule, but that's reasonable.)
> > 
> > On Wed, 2023-11-29 at 10:05 +0000, Jean-Baptiste Skutnik via Users wrote:
> > > Hello all,
> > > 
> > > I am managing a cluster using pacemaker for high availability. I am
> > > parsing the logs for relevant information on the cluster health and
> > > the logs are full of the following:
> > > 
> > > ```
> > > Nov 29 09:17:41 esvm2 pacemaker-controld[2893]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
> > > Nov 29 09:17:41 esvm2 pacemaker-schedulerd[2892]:  notice: Calculated transition 8629, saving inputs in /var/lib/pacemaker/pengine/pe-input-250.bz2
> > > Nov 29 09:17:41 esvm2 pacemaker-controld[2893]:  notice: Transition 8629 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-250.bz2): Complete
> > > Nov 29 09:17:41 esvm2 pacemaker-controld[2893]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
> > > Nov 29 09:18:41 esvm2 pacemaker-controld[2893]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
> > > Nov 29 09:18:41 esvm2 pacemaker-schedulerd[2892]:  notice: Calculated transition 8630, saving inputs in /var/lib/pacemaker/pengine/pe-input-250.bz2
> > > Nov 29 09:18:41 esvm2 pacemaker-controld[2893]:  notice: Transition 8630 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-250.bz2): Complete
> > > Nov 29 09:18:41 esvm2 pacemaker-controld[2893]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
> > > Nov 29 09:19:41 esvm2 pacemaker-controld[2893]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
> > > Nov 29 09:19:41 esvm2 pacemaker-schedulerd[2892]:  notice: Calculated transition 8631, saving inputs in /var/lib/pacemaker/pengine/pe-input-250.bz2
> > > Nov 29 09:19:41 esvm2 pacemaker-controld[2893]:  notice: Transition 8631 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-250.bz2): Complete
> > > Nov 29 09:19:41 esvm2 pacemaker-controld[2893]:  notice: State transition
> > > ...
> > > ```
> > > 
> > > The transition IDs differ, but the file containing the transition
> > > data (/var/lib/pacemaker/pengine/pe-input-250.bz2) stays the same,
> > > implying that the transitions do not affect the cluster.
> > > 
> > > I noticed the option to restrict the logging to higher levels, however
> > > some valuable information is logged under the `notice` level and I
> > > would like to keep it in the logs.
> > > 
> > > Please let me know if I am doing something wrong or if there is a way
> > > to turn off these messages.
> > > 
> > > Thanks,
> > > 
> > > Jean-Baptiste Skutnik
> > 
> > -- 
> > Ken Gaillot <kgaillot at redhat.com>
> > 
-- 
Ken Gaillot <kgaillot at redhat.com>