[ClusterLabs] Antw: Re: [EXT] Problem with DLM
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Thu Jul 28 05:28:24 EDT 2022
>>> "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> wrote on 26.07.2022 at 21:36 in message <1994685463.141245271.1658864186207.JavaMail.zimbra at helmholtz-muenchen.de>:
>
> ----- On 26 Jul, 2022, at 20:06, Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de wrote:
>
>> Hi Bernd!
>>
>> I think the answer may be some time before the timeout was reported; maybe a
>> network issue? Or a very high load. It's hard to say from the logs...
>
> Yes, I had a high load before:
> Jul 20 00:17:42 [32512] ha-idg-1 crmd: notice: throttle_check_thresholds: High CPU load detected: 90.080002
> Jul 20 00:18:12 [32512] ha-idg-1 crmd: notice: throttle_check_thresholds: High CPU load detected: 76.169998
> Jul 20 00:18:42 [32512] ha-idg-1 crmd: notice: throttle_check_thresholds: High CPU load detected: 85.629997
> Jul 20 00:19:12 [32512] ha-idg-1 crmd: notice: throttle_check_thresholds: High CPU load detected: 70.660004
> Jul 20 00:19:42 [32512] ha-idg-1 crmd: notice: throttle_check_thresholds: High CPU load detected: 58.340000
> Jul 20 00:20:12 [32512] ha-idg-1 crmd: info: throttle_check_thresholds: Moderate CPU load detected: 48.740002
> Jul 20 00:20:12 [32512] ha-idg-1 crmd: info: throttle_send_command: New throttle mode: 0010 (was 0100)
> Jul 20 00:20:42 [32512] ha-idg-1 crmd: info: throttle_check_thresholds: Moderate CPU load detected: 41.889999
> Jul 20 00:21:12 [32512] ha-idg-1 crmd: info: throttle_send_command: New throttle mode: 0001 (was 0010)
> Jul 20 00:21:56 [12204] ha-idg-1 lrmd: warning: child_timeout_callback: dlm_monitor_30000 process (PID 11816) timed out
> Jul 20 00:21:56 [12204] ha-idg-1 lrmd: warning: operation_finished: dlm_monitor_30000:11816 - timed out after 20000ms
> Jul 20 00:21:56 [32512] ha-idg-1 crmd: error: process_lrm_event: Result of monitor operation for dlm on ha-idg-1: Timed Out | call=1255 key=dlm_monitor_30000 timeout=20000ms
> Jul 20 00:21:56 [32512] ha-idg-1 crmd: info: exec_alert_list: Sending resource alert via smtp_alert to informatic.idg at helmholtz-muenchen.de
> Jul 20 00:21:56 [12204] ha-idg-1 lrmd: info: process_lrmd_alert_exec: Executing alert smtp_alert for 8f934e90-12f5-4bad-b4f4-55ac933f01c6
>
> Can that interfere with DLM?
It depends ;-)
If the CPU load is mostly user load, then probably not (this also depends on how many CPUs you have). But if the load is mostly I/O wait or system load, it can hurt any Pacemaker process. I think you'll have to analyze your load, and maybe increase the timeouts (the dlm monitor timed out after 20000 ms).
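To tell user load from system load and I/O wait on the spot, a minimal sketch like the one below could help. It assumes Linux with the usual /proc/stat and /proc/loadavg layout and a POSIX sh with awk; the function name cpu_split is made up for illustration and is not part of any tool. It samples the aggregate "cpu" line twice and prints the share of each kind of CPU time in between, plus the number of tasks currently blocked on I/O.

```shell
#!/bin/sh
# Sketch (Linux only, hypothetical helper): where did CPU time go during
# the last INTERVAL seconds? Uses the aggregate "cpu" line of /proc/stat:
#   cpu user nice system idle iowait irq softirq steal ...
# Reported as: user = user+nice, system = system+irq+softirq,
# iowait and idle as-is (steal only counts towards the total).
cpu_split()
{
    interval=${1:-1}
    a=$(grep '^cpu ' /proc/stat)
    sleep "$interval"
    b=$(grep '^cpu ' /proc/stat)
    printf '%s\n%s\n' "$a" "$b" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i; next }
        {
            for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total == 0) total = 1   # avoid division by zero
            printf "user %.1f%% system %.1f%% iowait %.1f%% idle %.1f%%\n",
                100 * (d[2] + d[3]) / total, 100 * (d[4] + d[7] + d[8]) / total,
                100 * d[6] / total, 100 * d[5] / total
        }'
    # Tasks waiting for I/O sit in uninterruptible sleep, which the Linux
    # load average counts even though they use no CPU.
    echo "blocked: $(awk '/^procs_blocked/ { print $2 }' /proc/stat)"
    echo "loadavg: $(cut -d' ' -f1-3 /proc/loadavg)"
}

cpu_split "$@"
```

A high load average together with high iowait or many blocked tasks points to I/O rather than to real CPU demand.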
You could use monit to examine your system load (the output below is from some idle VM):
status OK
monitoring status Monitored
monitoring mode active
on reboot start
load average [0.00] [0.00] [0.00]
cpu 0.2%usr 0.1%sys 0.0%nice 0.0%iowait 0.0%hardirq 0.0%softirq 0.0%steal 0.0%guest 0.0%guestnice
memory usage 442.1 MB [22.3%]
swap usage 20.5 MB [1.0%]
uptime 13d 17h 41m
boot time Thu, 14 Jul 2022 17:40:58
filedescriptors 1376 [0.7% of 198048 limit]
data collected Thu, 28 Jul 2022 11:20:41
You could configure action scripts like this (the rules belong in a "check system" block of the monit configuration):
check system $HOST
  if loadavg (1min) per core > 4 then exec "/var/lib/monit/log-top.sh"
  if loadavg (5min) per core > 2 then exec "/var/lib/monit/log-top.sh"
  if loadavg (15min) per core > 1 then exec "/var/lib/monit/log-top.sh"
  if memory usage > 90% for 2 cycles then exec "/var/lib/monit/log-top.sh"
  if swap usage > 25% for 2 cycles then exec "/var/lib/monit/log-top.sh"
  if swap usage > 50% then exec "/var/lib/monit/log-top.sh"
  if cpu usage (system) > 20% for 3 cycles then exec "/var/lib/monit/log-top.sh"
  if cpu usage (wait) > 80% then exec "/var/lib/monit/log-top.sh"
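The "per core" rules compare the load average divided by the number of CPUs, so a load value only means something relative to the core count. A small sketch of that arithmetic, assuming Linux's /proc/loadavg and the nproc utility; per_core_load is a hypothetical helper, and the 48 cores in the example are made up, not Bernd's actual hardware. It also assumes the crmd value 90.08 is a raw (not already per-core) load average.

```shell
#!/bin/sh
# Sketch (hypothetical helper): load average divided by the number of CPUs.
# Usage: per_core_load [LOAD [CORES]]
#   LOAD  defaults to the 1-minute value from /proc/loadavg (Linux)
#   CORES defaults to the number of available CPUs (nproc)
per_core_load()
{
    load=${1:-$(cut -d' ' -f1 /proc/loadavg)}
    cores=${2:-$(nproc)}
    awk -v l="$load" -v n="$cores" 'BEGIN { printf "%.2f\n", l / n }'
}

# 90.08 is taken from the crmd log above; the 48 cores are hypothetical.
per_core_load 90.08 48    # prints 1.88
# Current 1-minute load per core on this host:
per_core_load
```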
A possible script could be (this mess was created by myself):
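#!/bin/sh
# Append a snapshot of the system state to /var/log/monit/top.log;
# meant to be run by monit's "exec" action.

# sect TITLE COMMAND...: print a section header, then run COMMAND
sect()
{
echo "--- $1 ---"
shift
eval "$@"
}
{
echo "========== $(/bin/date) =========="
# the MONIT_* variables set by monit tell which rule fired
sect 'MONIT env' 'env | grep ^MONIT_'
# with an interval, mpstat and vmstat show current values;
# without one they only report averages since boot
sect 'mpstat' /usr/bin/mpstat 1 1
sect 'vmstat' /usr/bin/vmstat 1 2
# one batch-mode iteration of top, per thread (-H), without idle tasks (-i)
sect 'top' /usr/bin/top -b -n 1 -Hi
} >> /var/log/monit/top.log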
Regards,
Ulrich
>
> Bernd