[ClusterLabs] Antw: Antw: Re: Antw: [EXT] VIP monitor Timed Out

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Jul 20 06:35:54 EDT 2021


>>> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> schrieb am 20.07.2021
um
12:01 in Nachricht <60F69EFB020000A1000427F7 at gwsmtp.uni-regensburg.de>:
> Hi!
> 
> In the commands traced, no command (that is the monitor) too more than 3

s/too/took/ # Sorry!

> seconds, so that either is *not* the timeout, or pacemaker got
significantly
> delayed.
> One reason I could imagine is a "read stall". For example you could trigger
> such if you rapidly fill your block cache with dirty blocks (to be written)

> and
> some read request would have to wait for buffers (to become written, thus
> available). However if you are writing like mad, available read buffers 
> might
> be rare.
> Fortunately you can tune the kernel to limit the amount of dirty buffers.
> I'm not saying that is your problem, but the trace looks OK.
> 
> Regards,
> Ulrich
> 
>>>> PASERO Florent <florent.pasero at externe.bnpparibas.com> schrieb am
> 20.07.2021 um
> 11:51 in Nachricht
>
<PR0P264MB2139D47D0ACF66A81F3DCD37B4E29 at PR0P264MB2139.FRAP264.PROD.OUTLOOK.CO

> M>:
> 
>> Hi,
>> 
>> Once or twice a week, we have a 'Timed out' on our VIP.
>> 
>> The last :
>> Cluster Summary:
>>   * Stack: corosync
>>   * Current DC: server07 (version 2.0.5-9.el8_4.1-ba59be7122) - partition 
>> with quorum
>>   * Last updated: Tue Jul 20 11:39:22 2021
>>   * Last change:  Mon Jul  5 09:42:14 2021 by hacluster via cibadmin on 
>> server06
>>   * 2 nodes configured
>>   * 2 resource instances configured
>> 
>> Node List:
>>   * Online: [ server06 server07 ]
>> 
>> Active Resources:
>>   * Resource Group: zbx_prod_Web_Core:
>>     * VIP       (ocf::heartbeat:IPaddr2):        Started server07
>>     * ZabbixServer      (systemd:zabbix-server):         Started server07
>> 
>> Failed Resource Actions:
>>   * VIP_monitor_10000 on server07 'error' (1): call=123, status='Timed
Out',
> 
>> exitreason='', last-rc-change='2021-07-19 15:02:27 +02:00', queued=0ms, 
>> exec=0ms
>> 
>> Any idea ? because nothing very revealing in the following log files.
>> 
>> Here are the monitoring files just before and just after the time out.
>> 
>> VIP.monitor.2021-07-19.15:01:27 :
>> +++ 15:01:27: ocf_start_trace:999: echo
>> +++ 15:01:27: ocf_start_trace:999: printenv
>> +++ 15:01:27: ocf_start_trace:999: sort
>> ++ 15:01:27: ocf_start_trace:999: env='
>> HA_LOGFACILITY=daemon
>> HA_LOGFILE=/var/log/pacemaker/pacemaker.log
>> HA_cluster_type=corosync
>> HA_debug=0
>> HA_logfacility=daemon
>> HA_logfile=/var/log/pacemaker/pacemaker.log
>> HA_mcp=true
>> HA_quorum_type=corosync
>> INVOCATION_ID=5cd03e610fbf4a9bb3ffe2b30e1fb5d4
>> JOURNAL_STREAM=9:4433035
>> LC_ALL=C
>> OCF_EXIT_REASON_PREFIX=ocf-exit-reason:
>> OCF_RA_VERSION_MAJOR=1
>> OCF_RA_VERSION_MINOR=0
>> OCF_RESKEY_CRM_meta_interval=10000
>> OCF_RESKEY_CRM_meta_name=monitor
>> OCF_RESKEY_CRM_meta_on_node=server07
>> OCF_RESKEY_CRM_meta_on_node_uuid=2
>> OCF_RESKEY_CRM_meta_timeout=20000
>> OCF_RESKEY_crm_feature_set=3.7.1
>> OCF_RESKEY_ip=10.0.0.67
>> OCF_RESKEY_monitor_retries=10
>> OCF_RESKEY_trace_file=/apps/Zabbix_Log/Core
>> OCF_RESKEY_trace_ra=1
>> OCF_RESOURCE_INSTANCE=VIP
>> OCF_RESOURCE_PROVIDER=heartbeat
>> OCF_RESOURCE_TYPE=IPaddr2
>> OCF_ROOT=/usr/lib/ocf
>>
>
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin:
>> /usr/bin:/usr/ucb
>> PCMK_cluster_type=corosync
>> PCMK_debug=0
>> PCMK_logfacility=daemon
>> PCMK_logfile=/var/log/pacemaker/pacemaker.log
>> PCMK_mcp=true
>> PCMK_quorum_type=corosync
>> PCMK_service=pacemaker-execd
>> PCMK_watchdog=false
>> PWD=/var/lib/pacemaker/cores
>> SHLVL=1
>> VALGRIND_OPTS=--leak-check=full --trace-children=no --vgdb=no 
>> --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p 
>> --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions 
>> --gen-suppressions=all
>> _=/usr/bin/printenv
>>
>
__OCF_TRC_DEST=/var/lib/heartbeat/trace_ra/IPaddr2/VIP.monitor.2021-07-19.15
>> :01:27
>> __OCF_TRC_MANAGE=1'
>> ++ 15:01:27: source:1053: ocf_is_true ''
>> ++ 15:01:27: ocf_is_true:103: case "$1" in
>> ++ 15:01:27: ocf_is_true:103: case "$1" in
>> ++ 15:01:27: ocf_is_true:105: false
>> + 15:01:27: main:69: . /usr/lib/ocf/lib/heartbeat/findif.sh
>> + 15:01:27: main:72: OCF_RESKEY_lvs_support_default=false
>> + 15:01:27: main:73: OCF_RESKEY_lvs_ipv6_addrlabel_default=false
>> + 15:01:27: main:74: OCF_RESKEY_lvs_ipv6_addrlabel_value_default=99
>> + 15:01:27: main:75: OCF_RESKEY_clusterip_hash_default=sourceip-sourceport
>> + 15:01:27: main:76: OCF_RESKEY_unique_clone_address_default=false
>> + 15:01:27: main:77: OCF_RESKEY_arp_interval_default=200
>> + 15:01:27: main:78: OCF_RESKEY_arp_count_default=5
>> + 15:01:27: main:79: OCF_RESKEY_arp_count_refresh_default=0
>> + 15:01:27: main:80: OCF_RESKEY_arp_bg_default=true
>> + 15:01:27: main:81: OCF_RESKEY_run_arping_default=false
>> + 15:01:27: main:82: OCF_RESKEY_noprefixroute_default=false
>> + 15:01:27: main:83: OCF_RESKEY_preferred_lft_default=forever
>> + 15:01:27: main:84: OCF_RESKEY_monitor_retries=1
>> + 15:01:27: main:86: : false
>> + 15:01:27: main:87: : false
>> + 15:01:27: main:88: : 99
>> + 15:01:27: main:89: : sourceip-sourceport
>> + 15:01:27: main:90: : false
>> + 15:01:27: main:91: : 200
>> + 15:01:27: main:92: : 5
>> + 15:01:27: main:93: : 0
>> + 15:01:27: main:94: : true
>> + 15:01:27: main:95: : false
>> + 15:01:27: main:96: : false
>> + 15:01:27: main:97: : forever
>> + 15:01:27: main:98: : 1
>> + 15:01:27: main:101: SENDARP=/usr/libexec/heartbeat/send_arp
>> + 15:01:27: main:102: SENDUA=/usr/libexec/heartbeat/send_ua
>> + 15:01:27: main:103: FINDIF=findif
>> + 15:01:27: main:104: VLDIR=/run/resource-agents
>> + 15:01:27: main:105: SENDARPPIDDIR=/run/resource-agents
>> + 15:01:27: main:106: 
>> CIP_lockfile=/run/resource-agents/IPaddr2-CIP-10.0.0.67
>> + 15:01:27: main:108: IPADDR2_CIP_IPTABLES=iptables
>> + 15:01:27: main:1261: ocf_is_true false
>> + 15:01:27: ocf_is_true:103: case "$1" in
>> + 15:01:27: ocf_is_true:105: false
>> + 15:01:27: main:1268: case $__OCF_ACTION in
>> + 15:01:27: main:1276: ip_validate
>> + 15:01:27: ip_validate:1184: check_binary ip
>> + 15:01:27: check_binary:57: have_binary ip
>> + 15:01:27: have_binary:69: '[' '' = 1 ']'
>> ++ 15:01:27: have_binary:72: echo ip
>> ++ 15:01:27: have_binary:72: sed -e 's/ -.*//'
>> + 15:01:27: have_binary:72: local bin=ip
>> ++ 15:01:27: have_binary:73: which ip
>> + 15:01:27: have_binary:73: test -x /usr/sbin/ip
>> + 15:01:27: ip_validate:1185: IP_CIP=
>> + 15:01:27: ip_validate:1187: ip_init
>> + 15:01:27: ip_init:423: local rc
>> ++ 15:01:27: ip_init:425: uname -s
>> + 15:01:27: ip_init:425: '[' XLinux '!=' XLinux ']'
>> + 15:01:27: ip_init:430: '[' X10.0.0.67 = X ']'
>> + 15:01:27: ip_init:436: case $__OCF_ACTION in
>> + 15:01:27: ip_init:438: true
>> + 15:01:27: ip_init:441: : 'YAY!'
>> + 15:01:27: ip_init:447: BASEIP=10.0.0.67
>> + 15:01:27: ip_init:448: BRDCAST=
>> + 15:01:27: ip_init:449: NIC=
>> + 15:01:27: ip_init:453: '[' '!' -z '' -a -z '' ']'
>> + 15:01:27: ip_init:458: NETMASK=
>> + 15:01:27: ip_init:459: IFLABEL=
>> + 15:01:27: ip_init:460: IF_MAC=
>> + 15:01:27: ip_init:462: IP_INC_GLOBAL=1
>> ++ 15:01:27: ip_init:463: expr 0 + 1
>> + 15:01:27: ip_init:463: IP_INC_NO=1
>> + 15:01:27: ip_init:465: ocf_is_true false
>> + 15:01:27: ocf_is_true:103: case "$1" in
>> + 15:01:27: ocf_is_true:105: false
>> + 15:01:27: ip_init:470: ocf_is_decimal 1
>> + 15:01:27: ocf_is_decimal:94: case "$1" in
>> + 15:01:27: ocf_is_decimal:98: true
>> + 15:01:27: ip_init:470: '[' 1 -gt 0 ']'
>> + 15:01:27: ip_init:471: :
>> + 15:01:27: ip_init:477: echo 10.0.0.67
>> + 15:01:27: ip_init:477: grep -qs :
>> + 15:01:27: ip_init:478: '[' 1 -ne 0 ']'
>> + 15:01:27: ip_init:479: FAMILY=inet
>> + 15:01:27: ip_init:480: ocf_is_true false
>> + 15:01:27: ocf_is_true:103: case "$1" in
>> + 15:01:27: ocf_is_true:105: false
>> + 15:01:27: ip_init:507: case $NIC in
>> + 15:01:27: ip_init:507: case $NIC in
>> ++ 15:01:27: ip_init:518: findif
>> ++ 15:01:27: findif:197: local match=10.0.0.67
>> ++ 15:01:27: findif:198: local family
>> ++ 15:01:27: findif:199: local scope
>> ++ 15:01:27: findif:200: local nic=
>> ++ 15:01:27: findif:201: local netmask=
>> ++ 15:01:27: findif:202: local brdcast=
>> ++ 15:01:27: findif:204: echo 10.0.0.67
>> ++ 15:01:27: findif:204: grep -qs :
>> ++ 15:01:27: findif:205: '[' 1 = 0 ']'
>> ++ 15:01:27: findif:208: family=inet
>> ++ 15:01:27: findif:209: scope='scope link'
>> ++ 15:01:27: findif:211: findif_check_params inet
>> ++ 15:01:27: findif_check_params:123: local family=inet
>> ++ 15:01:27: findif_check_params:124: local match=10.0.0.67
>> ++ 15:01:27: findif_check_params:125: local nic=
>> ++ 15:01:27: findif_check_params:127: netmask=
>> ++ 15:01:27: findif_check_params:128: local brdcast=
>> ++ 15:01:27: findif_check_params:129: local errmsg
>> ++ 15:01:27: findif_check_params:131: maybe_convert_dotted_quad_to_cidr
>> ++ 15:01:27: maybe_convert_dotted_quad_to_cidr:55: case $netmask in
>> ++ 15:01:27: maybe_convert_dotted_quad_to_cidr:55: case $netmask in
>> ++ 15:01:27: maybe_convert_dotted_quad_to_cidr:68: return
>> ++ 15:01:27: findif_check_params:135: case $__OCF_ACTION in
>> ++ 15:01:27: findif_check_params:135: case $__OCF_ACTION in
>> ++ 15:01:27: findif_check_params:137: return 0
>> ++ 15:01:27: findif:213: '[' -n '' ']'
>> ++ 15:01:27: findif:216: '[' -n '' ']'
>> +++ 15:01:27: findif:220: ip -o -f inet route list match 10.0.0.67 scope 
>> link
>> +++ 15:01:27: findif:220: awk 'BEGIN{best=0} /\// { mask=$1; sub(".*/",
"",
> 
>> mask); if( int(mask)>=best ) { best=int(mask); best_ln=$0; } } END{print 
>> best_ln}'
>> ++ 15:01:27: findif:220: set -- 10.0.0.0/24 dev team0 proto kernel src 
>> 10.0.0.66 metric 350
>> ++ 15:01:27: findif:222: '[' 9 = 0 ']'
>> ++ 15:01:27: findif:229: '[' -z '' -o -z '' ']'
>> ++ 15:01:27: findif:230: '[' 9 = 0 ']'
>> ++ 15:01:27: findif:234: case $1 in
>> ++ 15:01:27: findif:234: case $1 in
>> ++ 15:01:27: findif:235: : OK
>> ++ 15:01:27: findif:243: '[' -z '' ']'
>> ++ 15:01:27: findif:243: nic=team0
>> ++ 15:01:27: findif:244: '[' -z '' ']'
>> ++ 15:01:27: findif:244: netmask=24
>> ++ 15:01:27: findif:245: '[' inet = inet ']'
>> ++ 15:01:27: findif:246: '[' -z '' ']'
>> ++ 15:01:27: findif:247: '[' -n 10.0.0.66 ']'
>> +++ 15:01:27: findif:248: ip -o -f inet addr show
>> +++ 15:01:27: findif:248: grep 10.0.0.66
>> ++ 15:01:27: findif:248: set -- 5: team0 inet 10.0.0.66/24 brd 10.0.0.255 
>> scope global noprefixroute 'team0\' valid_lft forever preferred_lft
forever
>> ++ 15:01:27: findif:249: '[' brd = brd ']'
>> ++ 15:01:27: findif:249: brdcast=10.0.0.255
>> ++ 15:01:27: findif:258: echo 'team0 netmask 24 broadcast 10.0.0.255'
>> ++ 15:01:27: findif:259: return 0
>> + 15:01:27: ip_init:507: case $NIC in
>> + 15:01:27: ip_init:518: NICINFO='team0 netmask 24 broadcast 10.0.0.255'
>> + 15:01:27: ip_init:519: rc=0
>> + 15:01:27: ip_init:521: '[' 0 -eq 0 ']'
>> ++ 15:01:27: ip_init:523: echo 'team0 netmask 24 broadcast 10.0.0.255'
>> ++ 15:01:27: ip_init:523: sed -e 's/netmask\ //;s/broadcast\ //'
>> + 15:01:27: ip_init:523: NICINFO='team0 24 10.0.0.255'
>> ++ 15:01:27: ip_init:524: echo 'team0 24 10.0.0.255'
>> ++ 15:01:27: ip_init:524: cut '-d ' -f1
>> + 15:01:27: ip_init:524: NIC=team0
>> ++ 15:01:27: ip_init:525: echo 'team0 24 10.0.0.255'
>> ++ 15:01:27: ip_init:525: cut '-d ' -f2
>> + 15:01:27: ip_init:525: NETMASK=24
>> ++ 15:01:27: ip_init:526: echo 'team0 24 10.0.0.255'
>> ++ 15:01:27: ip_init:526: cut '-d ' -f3
>> + 15:01:27: ip_init:526: BRDCAST=10.0.0.255
>> + 15:01:27: ip_init:541: 
>> SENDARPPIDFILE=/run/resource-agents/send_arp-10.0.0.67
>> + 15:01:27: ip_init:543: '[' -n '' ']'
>> + 15:01:27: ip_init:551: '[' 1 -gt 1 ']'
>> + 15:01:27: ip_validate:1189: set_send_arp_program
>> + 15:01:27: set_send_arp_program:1149: ARP_SENDER=send_arp
>> + 15:01:27: set_send_arp_program:1150: '[' -n '' ']'
>> + 15:01:27: set_send_arp_program:1171: is_infiniband
>> + 15:01:27: is_infiniband:767: grep link/infiniband
>> + 15:01:27: is_infiniband:767: ip link show team0
>> + 15:01:27: ip_validate:1191: '[' -n '' ']'
>> + 15:01:27: ip_validate:1202: ocf_is_true false
>> + 15:01:27: ocf_is_true:103: case "$1" in
>> + 15:01:27: ocf_is_true:105: false
>> + 15:01:27: ip_validate:1208: ocf_is_decimal 200
>> + 15:01:27: ocf_is_decimal:94: case "$1" in
>> + 15:01:27: ocf_is_decimal:98: true
>> + 15:01:27: ip_validate:1208: '[' 200 -gt 0 ']'
>> + 15:01:27: ip_validate:1209: :
>> + 15:01:27: ip_validate:1215: ocf_is_decimal 5
>> + 15:01:27: ocf_is_decimal:94: case "$1" in
>> + 15:01:27: ocf_is_decimal:98: true
>> + 15:01:28: ip_validate:1215: '[' 5 -gt 0 ']'
>> + 15:01:28: ip_validate:1216: :
>> + 15:01:28: ip_validate:1222: '[' -z forever ']'
>> + 15:01:28: ip_validate:1227: '[' -n '' ']'
>> + 15:01:28: main:1278: case $__OCF_ACTION in
>> + 15:01:28: main:1292: ip_monitor
>> ++ 15:01:28: ip_monitor:1131: ip_served
>> ++ 15:01:28: ip_served:925: '[' -z team0 ']'
>> +++ 15:01:28: ip_served:930: find_interface 10.0.0.67 24
>> +++ 15:01:28: find_interface:579: local ipaddr=10.0.0.67
>> +++ 15:01:28: find_interface:580: local netmask=24
>> +++ 15:01:28: find_interface:581: local iface=
>> ++++ 15:01:28: find_interface:586: seq 1 1
>> +++ 15:01:28: find_interface:586: for i in $(seq 1 
>> $OCF_RESKEY_monitor_retries)
>> ++++ 15:01:28: find_interface:590: ip -o -f inet addr show
>> ++++ 15:01:28: find_interface:590: cut -d ' ' -f2
>> ++++ 15:01:28: find_interface:590: grep '\ 10.0.0.67/24'
>> ++++ 15:01:28: find_interface:590: grep -v '^ipsec[0-9][0-9]*$'
>> +++ 15:01:28: find_interface:590: iface=team0
>> +++ 15:01:28: find_interface:592: '[' -n team0 ']'
>> +++ 15:01:28: find_interface:593: break
>> +++ 15:01:28: find_interface:601: echo team0
>> +++ 15:01:28: find_interface:602: return 0
>> ++ 15:01:28: ip_served:930: cur_nic=team0
>> ++ 15:01:28: ip_served:932: '[' -z team0 ']'
>> ++ 15:01:28: ip_served:937: '[' -z '' ']'
>> ++ 15:01:28: ip_served:938: for i in $cur_nic
>> ++ 15:01:28: ip_served:940: '[' team0 = team0 ']'
>> ++ 15:01:28: ip_served:941: echo ok
>> ++ 15:01:28: ip_served:942: return 0
>> + 15:01:28: ip_monitor:1131: local ip_status=ok
>> + 15:01:28: ip_monitor:1132: case $ip_status in
>> + 15:01:28: ip_monitor:1134: run_arp_sender refresh
>> + 15:01:28: run_arp_sender:844: '[' xrefresh = xrefresh ']'
>> + 15:01:28: run_arp_sender:845: ARP_COUNT=0
>> + 15:01:28: run_arp_sender:846: LOGLEVEL=debug
>> + 15:01:28: run_arp_sender:851: '[' 0 -eq 0 ']'
>> + 15:01:28: run_arp_sender:852: return
>> + 15:01:28: ip_monitor:1135: return 0
>> 
>> VIP.monitor.2021-07-19.15:03:14 :
>> +++ 15:03:14: ocf_start_trace:999: echo
>> +++ 15:03:14: ocf_start_trace:999: printenv
>> +++ 15:03:14: ocf_start_trace:999: sort
>> ++ 15:03:14: ocf_start_trace:999: env='
>> HA_LOGFACILITY=daemon
>> HA_LOGFILE=/var/log/pacemaker/pacemaker.log
>> HA_cluster_type=corosync
>> HA_debug=0
>> HA_logfacility=daemon
>> HA_logfile=/var/log/pacemaker/pacemaker.log
>> HA_mcp=true
>> HA_quorum_type=corosync
>> INVOCATION_ID=5cd03e610fbf4a9bb3ffe2b30e1fb5d4
>> JOURNAL_STREAM=9:4433035
>> LC_ALL=C
>> OCF_EXIT_REASON_PREFIX=ocf-exit-reason:
>> OCF_RA_VERSION_MAJOR=1
>> OCF_RA_VERSION_MINOR=0
>> OCF_RESKEY_CRM_meta_interval=10000
>> OCF_RESKEY_CRM_meta_name=monitor
>> OCF_RESKEY_CRM_meta_on_node=server07
>> OCF_RESKEY_CRM_meta_on_node_uuid=2
>> OCF_RESKEY_CRM_meta_timeout=20000
>> OCF_RESKEY_crm_feature_set=3.7.1
>> OCF_RESKEY_ip=10.0.0.67
>> OCF_RESKEY_monitor_retries=10
>> OCF_RESKEY_trace_file=/apps/Zabbix_Log/Core
>> OCF_RESKEY_trace_ra=1
>> OCF_RESOURCE_INSTANCE=VIP
>> OCF_RESOURCE_PROVIDER=heartbeat
>> OCF_RESOURCE_TYPE=IPaddr2
>> OCF_ROOT=/usr/lib/ocf
>>
>
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin:
>> /usr/bin:/usr/ucb
>> PCMK_cluster_type=corosync
>> PCMK_debug=0
>> PCMK_logfacility=daemon
>> PCMK_logfile=/var/log/pacemaker/pacemaker.log
>> PCMK_mcp=true
>> PCMK_quorum_type=corosync
>> PCMK_service=pacemaker-execd
>> PCMK_watchdog=false
>> PWD=/var/lib/pacemaker/cores
>> SHLVL=1
>> VALGRIND_OPTS=--leak-check=full --trace-children=no --vgdb=no 
>> --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p 
>> --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions 
>> --gen-suppressions=all
>> _=/usr/bin/printenv
>>
>
__OCF_TRC_DEST=/var/lib/heartbeat/trace_ra/IPaddr2/VIP.monitor.2021-07-19.15
>> :03:14
>> __OCF_TRC_MANAGE=1'
>> ++ 15:03:14: source:1053: ocf_is_true ''
>> ++ 15:03:14: ocf_is_true:103: case "$1" in
>> ++ 15:03:14: ocf_is_true:103: case "$1" in
>> ++ 15:03:14: ocf_is_true:105: false
>> + 15:03:14: main:69: . /usr/lib/ocf/lib/heartbeat/findif.sh
>> + 15:03:14: main:72: OCF_RESKEY_lvs_support_default=false
>> + 15:03:14: main:73: OCF_RESKEY_lvs_ipv6_addrlabel_default=false
>> + 15:03:14: main:74: OCF_RESKEY_lvs_ipv6_addrlabel_value_default=99
>> + 15:03:14: main:75: OCF_RESKEY_clusterip_hash_default=sourceip-sourceport
>> + 15:03:14: main:76: OCF_RESKEY_unique_clone_address_default=false
>> + 15:03:14: main:77: OCF_RESKEY_arp_interval_default=200
>> + 15:03:14: main:78: OCF_RESKEY_arp_count_default=5
>> + 15:03:14: main:79: OCF_RESKEY_arp_count_refresh_default=0
>> + 15:03:14: main:80: OCF_RESKEY_arp_bg_default=true
>> + 15:03:14: main:81: OCF_RESKEY_run_arping_default=false
>> + 15:03:14: main:82: OCF_RESKEY_noprefixroute_default=false
>> + 15:03:14: main:83: OCF_RESKEY_preferred_lft_default=forever
>> + 15:03:14: main:84: OCF_RESKEY_monitor_retries=1
>> + 15:03:14: main:86: : false
>> + 15:03:14: main:87: : false
>> + 15:03:14: main:88: : 99
>> + 15:03:14: main:89: : sourceip-sourceport
>> + 15:03:14: main:90: : false
>> + 15:03:14: main:91: : 200
>> + 15:03:14: main:92: : 5
>> + 15:03:14: main:93: : 0
>> + 15:03:14: main:94: : true
>> + 15:03:14: main:95: : false
>> + 15:03:14: main:96: : false
>> + 15:03:14: main:97: : forever
>> + 15:03:14: main:98: : 1
>> + 15:03:14: main:101: SENDARP=/usr/libexec/heartbeat/send_arp
>> + 15:03:14: main:102: SENDUA=/usr/libexec/heartbeat/send_ua
>> + 15:03:14: main:103: FINDIF=findif
>> + 15:03:14: main:104: VLDIR=/run/resource-agents
>> + 15:03:14: main:105: SENDARPPIDDIR=/run/resource-agents
>> + 15:03:14: main:106: 
>> CIP_lockfile=/run/resource-agents/IPaddr2-CIP-10.0.0.67
>> + 15:03:14: main:108: IPADDR2_CIP_IPTABLES=iptables
>> + 15:03:14: main:1261: ocf_is_true false
>> + 15:03:14: ocf_is_true:103: case "$1" in
>> + 15:03:14: ocf_is_true:105: false
>> + 15:03:14: main:1268: case $__OCF_ACTION in
>> + 15:03:14: main:1276: ip_validate
>> + 15:03:14: ip_validate:1184: check_binary ip
>> + 15:03:14: check_binary:57: have_binary ip
>> + 15:03:14: have_binary:69: '[' '' = 1 ']'
>> ++ 15:03:14: have_binary:72: echo ip
>> ++ 15:03:14: have_binary:72: sed -e 's/ -.*//'
>> + 15:03:14: have_binary:72: local bin=ip
>> ++ 15:03:14: have_binary:73: which ip
>> + 15:03:14: have_binary:73: test -x /usr/sbin/ip
>> + 15:03:14: ip_validate:1185: IP_CIP=
>> + 15:03:14: ip_validate:1187: ip_init
>> + 15:03:14: ip_init:423: local rc
>> ++ 15:03:14: ip_init:425: uname -s
>> + 15:03:14: ip_init:425: '[' XLinux '!=' XLinux ']'
>> + 15:03:14: ip_init:430: '[' X10.0.0.67 = X ']'
>> + 15:03:14: ip_init:436: case $__OCF_ACTION in
>> + 15:03:14: ip_init:438: true
>> + 15:03:14: ip_init:441: : 'YAY!'
>> + 15:03:14: ip_init:447: BASEIP=10.0.0.67
>> + 15:03:14: ip_init:448: BRDCAST=
>> + 15:03:14: ip_init:449: NIC=
>> + 15:03:14: ip_init:453: '[' '!' -z '' -a -z '' ']'
>> + 15:03:14: ip_init:458: NETMASK=
>> + 15:03:14: ip_init:459: IFLABEL=
>> + 15:03:14: ip_init:460: IF_MAC=
>> + 15:03:14: ip_init:462: IP_INC_GLOBAL=1
>> ++ 15:03:14: ip_init:463: expr 0 + 1
>> + 15:03:14: ip_init:463: IP_INC_NO=1
>> + 15:03:14: ip_init:465: ocf_is_true false
>> + 15:03:14: ocf_is_true:103: case "$1" in
>> + 15:03:14: ocf_is_true:105: false
>> + 15:03:14: ip_init:470: ocf_is_decimal 1
>> + 15:03:14: ocf_is_decimal:94: case "$1" in
>> + 15:03:14: ocf_is_decimal:98: true
>> + 15:03:14: ip_init:470: '[' 1 -gt 0 ']'
>> + 15:03:14: ip_init:471: :
>> + 15:03:14: ip_init:477: echo 10.0.0.67
>> + 15:03:14: ip_init:477: grep -qs :
>> + 15:03:14: ip_init:478: '[' 1 -ne 0 ']'
>> + 15:03:14: ip_init:479: FAMILY=inet
>> + 15:03:14: ip_init:480: ocf_is_true false
>> + 15:03:14: ocf_is_true:103: case "$1" in
>> + 15:03:14: ocf_is_true:105: false
>> + 15:03:14: ip_init:507: case $NIC in
>> + 15:03:14: ip_init:507: case $NIC in
>> ++ 15:03:14: ip_init:518: findif
>> ++ 15:03:14: findif:197: local match=10.0.0.67
>> ++ 15:03:14: findif:198: local family
>> ++ 15:03:14: findif:199: local scope
>> ++ 15:03:14: findif:200: local nic=
>> ++ 15:03:14: findif:201: local netmask=
>> ++ 15:03:14: findif:202: local brdcast=
>> ++ 15:03:14: findif:204: echo 10.0.0.67
>> ++ 15:03:14: findif:204: grep -qs :
>> ++ 15:03:14: findif:205: '[' 1 = 0 ']'
>> ++ 15:03:14: findif:208: family=inet
>> ++ 15:03:14: findif:209: scope='scope link'
>> ++ 15:03:14: findif:211: findif_check_params inet
>> ++ 15:03:14: findif_check_params:123: local family=inet
>> ++ 15:03:14: findif_check_params:124: local match=10.0.0.67
>> ++ 15:03:14: findif_check_params:125: local nic=
>> ++ 15:03:14: findif_check_params:127: netmask=
>> ++ 15:03:14: findif_check_params:128: local brdcast=
>> ++ 15:03:14: findif_check_params:129: local errmsg
>> ++ 15:03:14: findif_check_params:131: maybe_convert_dotted_quad_to_cidr
>> ++ 15:03:14: maybe_convert_dotted_quad_to_cidr:55: case $netmask in
>> ++ 15:03:14: maybe_convert_dotted_quad_to_cidr:55: case $netmask in
>> ++ 15:03:14: maybe_convert_dotted_quad_to_cidr:68: return
>> ++ 15:03:14: findif_check_params:135: case $__OCF_ACTION in
>> ++ 15:03:14: findif_check_params:135: case $__OCF_ACTION in
>> ++ 15:03:14: findif_check_params:137: return 0
>> ++ 15:03:14: findif:213: '[' -n '' ']'
>> ++ 15:03:14: findif:216: '[' -n '' ']'
>> +++ 15:03:14: findif:220: ip -o -f inet route list match 10.0.0.67 scope 
>> link
>> +++ 15:03:14: findif:220: awk 'BEGIN{best=0} /\// { mask=$1; sub(".*/",
"",
> 
>> mask); if( int(mask)>=best ) { best=int(mask); best_ln=$0; } } END{print 
>> best_ln}'
>> ++ 15:03:14: findif:220: set -- 10.0.0.0/24 dev team0 proto kernel src 
>> 10.0.0.66 metric 350
>> ++ 15:03:14: findif:222: '[' 9 = 0 ']'
>> ++ 15:03:14: findif:229: '[' -z '' -o -z '' ']'
>> ++ 15:03:14: findif:230: '[' 9 = 0 ']'
>> ++ 15:03:14: findif:234: case $1 in
>> ++ 15:03:14: findif:234: case $1 in
>> ++ 15:03:14: findif:235: : OK
>> ++ 15:03:14: findif:243: '[' -z '' ']'
>> ++ 15:03:14: findif:243: nic=team0
>> ++ 15:03:14: findif:244: '[' -z '' ']'
>> ++ 15:03:14: findif:244: netmask=24
>> ++ 15:03:14: findif:245: '[' inet = inet ']'
>> ++ 15:03:14: findif:246: '[' -z '' ']'
>> ++ 15:03:14: findif:247: '[' -n 10.0.0.66 ']'
>> +++ 15:03:14: findif:248: ip -o -f inet addr show
>> +++ 15:03:14: findif:248: grep 10.0.0.66
>> ++ 15:03:14: findif:248: set -- 5: team0 inet 10.0.0.66/24 brd 10.0.0.255 
>> scope global noprefixroute 'team0\' valid_lft forever preferred_lft
forever
>> ++ 15:03:14: findif:249: '[' brd = brd ']'
>> ++ 15:03:14: findif:249: brdcast=10.0.0.255
>> ++ 15:03:14: findif:258: echo 'team0 netmask 24 broadcast 10.0.0.255'
>> ++ 15:03:14: findif:259: return 0
>> + 15:03:14: ip_init:507: case $NIC in
>> + 15:03:14: ip_init:518: NICINFO='team0 netmask 24 broadcast 10.0.0.255'
>> + 15:03:14: ip_init:519: rc=0
>> + 15:03:14: ip_init:521: '[' 0 -eq 0 ']'
>> ++ 15:03:14: ip_init:523: echo 'team0 netmask 24 broadcast 10.0.0.255'
>> ++ 15:03:14: ip_init:523: sed -e 's/netmask\ //;s/broadcast\ //'
>> + 15:03:14: ip_init:523: NICINFO='team0 24 10.0.0.255'
>> ++ 15:03:14: ip_init:524: echo 'team0 24 10.0.0.255'
>> ++ 15:03:14: ip_init:524: cut '-d ' -f1
>> + 15:03:14: ip_init:524: NIC=team0
>> ++ 15:03:14: ip_init:525: echo 'team0 24 10.0.0.255'
>> ++ 15:03:14: ip_init:525: cut '-d ' -f2
>> + 15:03:14: ip_init:525: NETMASK=24
>> ++ 15:03:14: ip_init:526: echo 'team0 24 10.0.0.255'
>> ++ 15:03:14: ip_init:526: cut '-d ' -f3
>> + 15:03:14: ip_init:526: BRDCAST=10.0.0.255
>> + 15:03:14: ip_init:541: 
>> SENDARPPIDFILE=/run/resource-agents/send_arp-10.0.0.67
>> + 15:03:14: ip_init:543: '[' -n '' ']'
>> + 15:03:14: ip_init:551: '[' 1 -gt 1 ']'
>> + 15:03:14: ip_validate:1189: set_send_arp_program
>> + 15:03:14: set_send_arp_program:1149: ARP_SENDER=send_arp
>> + 15:03:14: set_send_arp_program:1150: '[' -n '' ']'
>> + 15:03:14: set_send_arp_program:1171: is_infiniband
>> + 15:03:14: is_infiniband:767: ip link show team0
>> + 15:03:14: is_infiniband:767: grep link/infiniband
>> + 15:03:14: ip_validate:1191: '[' -n '' ']'
>> + 15:03:14: ip_validate:1202: ocf_is_true false
>> + 15:03:14: ocf_is_true:103: case "$1" in
>> + 15:03:14: ocf_is_true:105: false
>> + 15:03:14: ip_validate:1208: ocf_is_decimal 200
>> + 15:03:14: ocf_is_decimal:94: case "$1" in
>> + 15:03:14: ocf_is_decimal:98: true
>> + 15:03:14: ip_validate:1208: '[' 200 -gt 0 ']'
>> + 15:03:14: ip_validate:1209: :
>> + 15:03:14: ip_validate:1215: ocf_is_decimal 5
>> + 15:03:14: ocf_is_decimal:94: case "$1" in
>> + 15:03:14: ocf_is_decimal:98: true
>> + 15:03:14: ip_validate:1215: '[' 5 -gt 0 ']'
>> + 15:03:14: ip_validate:1216: :
>> + 15:03:14: ip_validate:1222: '[' -z forever ']'
>> + 15:03:14: ip_validate:1227: '[' -n '' ']'
>> + 15:03:14: main:1278: case $__OCF_ACTION in
>> + 15:03:14: main:1292: ip_monitor
>> ++ 15:03:14: ip_monitor:1131: ip_served
>> ++ 15:03:14: ip_served:925: '[' -z team0 ']'
>> +++ 15:03:14: ip_served:930: find_interface 10.0.0.67 24
>> +++ 15:03:14: find_interface:579: local ipaddr=10.0.0.67
>> +++ 15:03:14: find_interface:580: local netmask=24
>> +++ 15:03:14: find_interface:581: local iface=
>> ++++ 15:03:14: find_interface:586: seq 1 1
>> +++ 15:03:14: find_interface:586: for i in $(seq 1 
>> $OCF_RESKEY_monitor_retries)
>> ++++ 15:03:14: find_interface:590: ip -o -f inet addr show
>> ++++ 15:03:14: find_interface:590: grep '\ 10.0.0.67/24'
>> ++++ 15:03:14: find_interface:590: cut -d ' ' -f2
>> ++++ 15:03:14: find_interface:590: grep -v '^ipsec[0-9][0-9]*$'
>> +++ 15:03:14: find_interface:590: iface=team0
>> +++ 15:03:14: find_interface:592: '[' -n team0 ']'
>> +++ 15:03:14: find_interface:593: break
>> +++ 15:03:14: find_interface:601: echo team0
>> +++ 15:03:14: find_interface:602: return 0
>> ++ 15:03:14: ip_served:930: cur_nic=team0
>> ++ 15:03:14: ip_served:932: '[' -z team0 ']'
>> ++ 15:03:14: ip_served:937: '[' -z '' ']'
>> ++ 15:03:14: ip_served:938: for i in $cur_nic
>> ++ 15:03:14: ip_served:940: '[' team0 = team0 ']'
>> ++ 15:03:14: ip_served:941: echo ok
>> ++ 15:03:14: ip_served:942: return 0
>> + 15:03:14: ip_monitor:1131: local ip_status=ok
>> + 15:03:14: ip_monitor:1132: case $ip_status in
>> + 15:03:14: ip_monitor:1134: run_arp_sender refresh
>> + 15:03:14: run_arp_sender:844: '[' xrefresh = xrefresh ']'
>> + 15:03:14: run_arp_sender:845: ARP_COUNT=0
>> + 15:03:14: run_arp_sender:846: LOGLEVEL=debug
>> + 15:03:14: run_arp_sender:851: '[' 0 -eq 0 ']'
>> + 15:03:14: run_arp_sender:852: return
>> + 15:03:14: ip_monitor:1135: return 0
>> 
>> Best regards,
>> 
>> Florent
>> 
>> De : Users
> <users-bounces at clusterlabs.org<mailto:users-bounces at clusterlabs.org>>
> 
>> De la part de Klaus Wenninger
>> Envoyé : lundi 5 juillet 2021 09:14
>> À : Cluster Labs - All topics related to open-source clustering welcomed 
>> <users at clusterlabs.org<mailto:users at clusterlabs.org>>
>> Objet : Re: [ClusterLabs] Antw: [EXT] VIP monitor Timed Out
>> 
>> Using DHCP? Maybe a glitch/issue during renewal ... but elaborate
monitoring
> 
>> as suggested should show that ...
>> 
>> On Mon, Jul 5, 2021 at 9:03 AM Ulrich Windl 
>>
>
<Ulrich.Windl at rz.uni-regensburg.de<mailto:Ulrich.Windl at rz.uni-regensburg.de>>
> 
>> wrote:
>> Hi!
>> 
>> See "ip_served" and "find_interface" (essentially "$IP2UTIL -o -f $FAMILY 
>> addr
>> show") in the RA.
>> Basically it searches _all_ interfaces for $ipaddr/$netmask to locate the
>> interface when it could also examine the interface and look at the
address.
>> For many interfaces it could make a difference performance-wise IMHO.
>> Maybe so a periodic sampling how long the corresponding command takes for 
>> your
>> setup.
>> If it's not a timing issue, the interface may actually be gone
temporarily,
> 
>> or
>> the tools could have bugs.
>> 
>> Regards,
>> Ulrich
>> 
>>>>> PASERO Florent 
>>
>
<florent.pasero at externe.bnpparibas.com<mailto:florent.pasero at externe.bnpparib

> 
>> as.com>> schrieb am
>> 01.07.2021 um
>> 17:29 in Nachricht
>>
>
<PR0P264MB21394030D5C5120BB885E95DB4009 at PR0P264MB2139.FRAP264.PROD.OUTLOOK.CO

> 
>>
>
M<mailto:PR0P264MB21394030D5C5120BB885E95DB4009 at PR0P264MB2139.FRAP264.PROD.OU

> T
> 
>> LOOK.COM>>:
>> 
>>> Hi,
>>>
>>> Once or twice a week, we have a 'Timed out' on our VIP:
>>> ~$ pcs status
>>> Cluster name: zbx_pprod_Web_Core
>>> Cluster Summary:
>>>   * Stack: corosync
>>>   * Current DC: #####(version 2.0.5‑9.el8_4.1‑ba59be7122) ‑ partition
with
>>> quorum
>>>   * Last updated: Mon Jun 28 16:32:09 2021
>>>   * Last change:  Mon Jun 14 12:42:57 2021 by root via cibadmin on ######
>>>   * 2 nodes configured
>>>   * 2 resource instances configured
>>>
>>> Node List:
>>>   * Online: [ ##### #####]
>>>
>>> Full List of Resources:
>>>   * Resource Group: zbx_pprod_Web_Core:
>>>     * VIP       (ocf::heartbeat:IPaddr2):        Started #####
>>>     * ZabbixServer      (systemd:zabbix‑server):         Started ######
>>>
>>> Failed Resource Actions:
>>>   * VIP_monitor_5000 on ##### 'error' (1): call=69, status='Timed Out',
>>> exitreason='', last‑rc‑change='2021‑06‑24 14:41:57 +02:00', queued=0ms,
>> exec=0ms
>>>   * VIP_monitor_5000 on ##### 'error' (1): call=11, status='Timed Out',
>>> exitreason='', last‑rc‑change='2021‑06‑17 14:18:20 +02:00', queued=0ms,
>> exec=0ms
>>>
>>>
>>> We have the same issue on two completely different clusters.
>>>
>>> We can see in the log :
>>> Jun 24 14:41:29 ##### pacemaker‑execd     [1442069]
> (child_timeout_callback)
>> 
>>>    warning: VIP_monitor_5000 process (PID 2752333) timed out
>>> Jun 24 14:41:34 #####pacemaker‑execd     [1442069]
> (child_timeout_callback)
>> 
>>>   crit: VIP_monitor_5000 process (PID 2752333) will not die!
>>> Jun 24 14:41:57 ##### pacemaker‑execd     [1442069] (operation_finished)
>> 
>>>    warning: VIP_monitor_5000[2752333] timed out after 20000ms
>>> Jun 24 14:41:57 ##### pacemaker‑controld  [1442072] (process_lrm_event)
>>> error: Result of monitor operation for VIP on #####: Timed Out | call=69
>>> key=VIP_monitor_5000 timeout=20000ms
>>> Jun 24 14:41:57 ##### pacemaker‑based     [1442067] (cib_process_request)
>> 
>>>    info: Forwarding cib_modify operation for section status to all
>>> (origin=local/crmd/722)
>>> Jun 24 14:41:57 ##### pacemaker‑based     [1442067] (cib_perform_op)
>>> info: Diff: ‑‑‑ 0.54.443 2
>>> Jun 24 14:41:57 ##### pacemaker‑based     [1442067] (cib_perform_op)
>>> info: Diff: +++ 0.54.444 (null)
>>> Jun 24 14:41:57 ##### pacemaker‑based     [1442067] (cib_perform_op)
>>> info: +  /cib:  @num_updates=444
>>>
>>>
>>> Thanks for help
>>>
>>>
>>>
>>> Classification : Internal
>>> This message and any attachments (the "message") is
>>> intended solely for the intended addressees and is confidential.
>>> If you receive this message in error,or are not the intended
recipient(s),
>>> please delete it and any copies from your systems and immediately notify
>>> the sender. Any unauthorized view, use that does not comply with its
>>> purpose,
>>> dissemination or disclosure, either whole or partial, is prohibited.
Since
>>> the internet
>>> cannot guarantee the integrity of this message which may not be reliable,
>>> BNP PARIBAS
>>> (and its subsidiaries) shall not be liable for the message if modified,
>>> changed or falsified.
>>> Do not print this message unless it is necessary, consider the
> environment.
>>>
>>>
>>
>
‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑
>> ‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑
>>>
>>> Ce message et toutes les pieces jointes (ci‑apres le "message")
>>> sont etablis a l'intention exclusive de ses destinataires et sont
>>> confidentiels.
>>> Si vous recevez ce message par erreur ou s'il ne vous est pas destine,
>>> merci de le detruire ainsi que toute copie de votre systeme et d'en
> avertir
>>> immediatement l'expediteur. Toute lecture non autorisee, toute
utilisation
>>> de
>>> ce message qui n'est pas conforme a sa destination, toute diffusion ou
> toute
>> 
>>>
>>> publication, totale ou partielle, est interdite. L'Internet ne permettant
>>> pas d'assurer
>>> l'integrite de ce message electronique susceptible d'alteration, BNP
> Paribas
>> 
>>>
>>> (et ses filiales) decline(nt) toute responsabilite au titre de ce message
>>> dans l'hypothese
>>> ou il aurait ete modifie, deforme ou falsifie.
>>> N'imprimez ce message que si necessaire, pensez a l'environnement.
>> 
>> 
>> 
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> ClusterLabs home: https://www.clusterlabs.org/ 
>> 
>> 
>> Classification : Internal
> 
> 
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 





More information about the Users mailing list