[ClusterLabs] Antw: Re: Never join a list without a problem...

Wed Mar 1 09:38:08 UTC 2017

>>> Kai Dupke <kdupke at suse.com> wrote on 01.03.2017 at 09:55 in message
<bf07f253-6d47-6194-6460-bb31ec7f531f at suse.com>:
> On 02/27/2017 02:26 PM, Jeffrey Westgate wrote:
>> We use Nagios to monitor, and once every 20 to 40 hours - sometimes longer,
>> and we cannot set a clock by it - while the machine is 95% idle (or more
>> according to 'top'), the host load shoots up to 50 or 60%.  It takes about 20
>> minutes to peak, and another 30 to 45 minutes to come back down to baseline,
>> which is mostly 0.00.
> 
> So, you have a time window of ~1h where the system is under load, right?
> This is somewhat different to what Ulrich had, but his approach might be
> useful for you, too.
> 
> Is there anything against running some monitoring that captures the
> processes, process states and load, say, every 5 minutes?
> 
> Of course, the peaks might correlate to something in the logs - like
> cron, logins, logrotates or whatever.

The main issue is telling "expected load" from "unexpected load". In my case the system was expected to be completely idle at night, so I had set the thresholds rather low. Other systems may need different thresholds or a different approach. I hope to hear what caused the problem in your case.
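
As a minimal sketch of the periodic capture Kai suggests, combined with a low threshold for a host that should be idle, something like the following Python script could be left running. The threshold value, the log file name and the choice of `ps` columns are assumptions for illustration only, not what was used on either system.

```python
import os
import subprocess
import time
from datetime import datetime

LOAD_THRESHOLD = 0.5             # assumed value: low, because the host should be idle
INTERVAL_SECONDS = 300           # every 5 minutes, as suggested above
LOG_PATH = "load-snapshots.log"  # hypothetical output file


def is_unexpected(load1, threshold=LOAD_THRESHOLD):
    """Return True when the 1-minute load average exceeds the threshold."""
    return load1 > threshold


def format_record(timestamp, loads, process_table):
    """Build one log record: a header line followed by the process table."""
    header = "{} load={:.2f} {:.2f} {:.2f}".format(timestamp, *loads)
    return header + "\n" + process_table.rstrip("\n") + "\n\n"


def capture_processes():
    """Return PID, state, CPU share and command name of every process."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def main():
    while True:
        loads = os.getloadavg()  # 1-, 5- and 15-minute load averages
        if is_unexpected(loads[0]):
            stamp = datetime.now().isoformat(timespec="seconds")
            with open(LOG_PATH, "a") as log:
                log.write(format_record(stamp, loads, capture_processes()))
        time.sleep(INTERVAL_SECONDS)


if __name__ == "__main__":
    main()
```

The timestamps in the resulting log can then be compared against cron, login and logrotate entries for the same period.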

Ulrich