[ClusterLabs] Never join a list without a problem...

Wed Mar 1 06:33:01 EST 2017

Ferenc Wágner <wferi at niif.hu> wrote:
>Jeffrey Westgate <Jeffrey.Westgate at arkansas.gov> writes:
>
>> We use Nagios to monitor, and once every 20 to 40 hours - sometimes
>> longer, and we cannot set a clock by it - while the machine is 95%
>> idle (or more according to 'top'), the host load shoots up to 50 or
>> 60%.  It takes about 20 minutes to peak, and another 30 to 45 minutes
>> to come back down to baseline, which is mostly 0.00.  (attached
>> hostload.pdf) This happens to both machines, randomly, and is
>> concerning, as we'd like to find what's causing it and resolve it.
>
>Try running atop (http://www.atoptool.nl/).  It collects and logs
>process accounting info, allowing you to step back in time and check
>resource usage in the past.

Nice, I didn't know atop could also log the collected data for future
analysis.
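If atop can't be installed on the boxes, a crude stand-in is easy to script. Below is a minimal Python sketch. It is my own, not part of atop or sysdig, and the function names, log file name and 60-second interval are arbitrary, hypothetical choices. Once per interval it appends the 1/5/15-minute load averages and every thread in state R or D to a log. On Linux the load average counts both runnable (R) tasks and tasks in uninterruptible sleep (D, usually waiting on I/O). That is how load can climb while top shows the CPU mostly idle, so the D entries logged around a spike may point at the culprit.

```python
import os
import time

def read_loadavg(path="/proc/loadavg"):
    """Return the 1-, 5- and 15-minute load averages as floats."""
    with open(path) as f:
        return tuple(float(x) for x in f.read().split()[:3])

def parse_stat(text):
    """Parse a /proc/.../stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the first '(' and the last ')' instead of on whitespace.
    """
    pid_part, rest = text.split("(", 1)
    comm, after = rest.rsplit(")", 1)
    return int(pid_part), comm, after.split()[0]

def load_contributors(proc="/proc"):
    """Return (tid, comm, state) for every thread in state R or D.

    Linux load average counts runnable (R) and uninterruptible (D) tasks,
    so these are the threads behind the number Nagios reports.
    """
    found = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited or is not readable
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    entry = parse_stat(f.read())
            except OSError:
                continue  # thread exited between listdir() and open()
            if entry[2] in ("R", "D"):
                found.append(entry)
    return found

def sample_forever(logfile="loadsample.log", interval=60):
    """Append one timestamped line per interval: load averages plus R/D threads."""
    while True:
        one, five, fifteen = read_loadavg()
        tasks = " ".join(f"{comm}[{tid}]:{state}"
                         for tid, comm, state in load_contributors())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(logfile, "a") as log:
            log.write(f"{stamp} {one:.2f} {five:.2f} {fifteen:.2f} {tasks}\n")
        time.sleep(interval)
```

Start sample_forever() in the background on both nodes, then look at the log lines around the time Nagios flags the next spike. It only shows which threads were contributing to the load and what state they were in, not why. For the "why", atop's history or sysdig is the better tool.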

If you want to capture even more detail, sysdig is superb:

    http://www.sysdig.org/