[ClusterLabs] Never join a list without a problem...
Adam Spiers
aspiers at suse.com
Wed Mar 1 06:33:01 EST 2017
Ferenc Wágner <wferi at niif.hu> wrote:
>Jeffrey Westgate <Jeffrey.Westgate at arkansas.gov> writes:
>
>> We use Nagios to monitor, and once every 20 to 40 hours - sometimes
>> longer, and we cannot set a clock by it - while the machine is 95%
>> idle (or more according to 'top'), the host load shoots up to 50 or
>> 60%. It takes about 20 minutes to peak, and another 30 to 45 minutes
>> to come back down to baseline, which is mostly 0.00. (attached
>> hostload.pdf) This happens to both machines, randomly, and is
>> concerning, as we'd like to find what's causing it and resolve it.
>
>Try running atop (http://www.atoptool.nl/). It collects and logs
>process accounting info, allowing you to step back in time and check
>resource usage in the past.

Nice, I didn't know atop could also log the collected data for future
analysis.

If you want to capture even more detail, sysdig is superb:

http://www.sysdig.org/
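
To illustrate the record-now, look-back-later idea behind atop's logging, here is a minimal Python sketch that appends timestamped load averages to a file so that a spike can be lined up against other logs afterwards. It is not a substitute for atop or sysdig, which capture per-process detail. It assumes a Linux host where /proc/loadavg is readable; the function names, the log path and the 60-second interval are hypothetical choices made for this example only.

```python
import time


def parse_loadavg(text):
    """Parse the contents of Linux /proc/loadavg into a dict.

    The file holds five whitespace-separated fields: the 1-, 5- and
    15-minute load averages, "runnable/total" scheduling entities,
    and the most recently created PID.
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total": int(total),
    }


def format_sample(timestamp, sample):
    """Render one parsed sample as a single greppable log line."""
    return (
        f"{timestamp} load1={sample['load1']:.2f} "
        f"load5={sample['load5']:.2f} load15={sample['load15']:.2f} "
        f"runnable={sample['runnable']} total={sample['total']}"
    )


def log_loadavg(log_path, interval=60, samples=None, source="/proc/loadavg"):
    """Append one line per interval to log_path.

    log_path, interval and samples are illustrative parameters, not
    taken from any tool. With samples=None the loop runs until it is
    interrupted.
    """
    taken = 0
    while samples is None or taken < samples:
        with open(source) as f:
            sample = parse_loadavg(f.read())
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        with open(log_path, "a") as out:
            out.write(format_sample(stamp, sample) + "\n")
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Calling log_loadavg("/tmp/loadavg.log") from a long-running session would leave one line per minute to compare against the Nagios graph. The runnable count is logged as well because load that rises while the CPU is idle may come from tasks that are not actually using CPU time.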