[ClusterLabs] Antw: [EXT] VIP monitor Timed Out

PASERO Florent florent.pasero at externe.bnpparibas.com
Tue Jul 20 05:51:12 EDT 2021


Hi,

Once or twice a week, we have a 'Timed out' on our VIP.

The last :
Cluster Summary:
  * Stack: corosync
  * Current DC: server07 (version 2.0.5-9.el8_4.1-ba59be7122) - partition with quorum
  * Last updated: Tue Jul 20 11:39:22 2021
  * Last change:  Mon Jul  5 09:42:14 2021 by hacluster via cibadmin on server06
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ server06 server07 ]

Active Resources:
  * Resource Group: zbx_prod_Web_Core:
    * VIP       (ocf::heartbeat:IPaddr2):        Started server07
    * ZabbixServer      (systemd:zabbix-server):         Started server07

Failed Resource Actions:
  * VIP_monitor_10000 on server07 'error' (1): call=123, status='Timed Out', exitreason='', last-rc-change='2021-07-19 15:02:27 +02:00', queued=0ms, exec=0ms

Any idea ? because nothing very revealing in the following log files.

Here are the monitoring files just before and just after the time out.

VIP.monitor.2021-07-19.15:01:27 :
+++ 15:01:27: ocf_start_trace:999: echo
+++ 15:01:27: ocf_start_trace:999: printenv
+++ 15:01:27: ocf_start_trace:999: sort
++ 15:01:27: ocf_start_trace:999: env='
HA_LOGFACILITY=daemon
HA_LOGFILE=/var/log/pacemaker/pacemaker.log
HA_cluster_type=corosync
HA_debug=0
HA_logfacility=daemon
HA_logfile=/var/log/pacemaker/pacemaker.log
HA_mcp=true
HA_quorum_type=corosync
INVOCATION_ID=5cd03e610fbf4a9bb3ffe2b30e1fb5d4
JOURNAL_STREAM=9:4433035
LC_ALL=C
OCF_EXIT_REASON_PREFIX=ocf-exit-reason:
OCF_RA_VERSION_MAJOR=1
OCF_RA_VERSION_MINOR=0
OCF_RESKEY_CRM_meta_interval=10000
OCF_RESKEY_CRM_meta_name=monitor
OCF_RESKEY_CRM_meta_on_node=server07
OCF_RESKEY_CRM_meta_on_node_uuid=2
OCF_RESKEY_CRM_meta_timeout=20000
OCF_RESKEY_crm_feature_set=3.7.1
OCF_RESKEY_ip=10.0.0.67
OCF_RESKEY_monitor_retries=10
OCF_RESKEY_trace_file=/apps/Zabbix_Log/Core
OCF_RESKEY_trace_ra=1
OCF_RESOURCE_INSTANCE=VIP
OCF_RESOURCE_PROVIDER=heartbeat
OCF_RESOURCE_TYPE=IPaddr2
OCF_ROOT=/usr/lib/ocf
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/ucb
PCMK_cluster_type=corosync
PCMK_debug=0
PCMK_logfacility=daemon
PCMK_logfile=/var/log/pacemaker/pacemaker.log
PCMK_mcp=true
PCMK_quorum_type=corosync
PCMK_service=pacemaker-execd
PCMK_watchdog=false
PWD=/var/lib/pacemaker/cores
SHLVL=1
VALGRIND_OPTS=--leak-check=full --trace-children=no --vgdb=no --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions --gen-suppressions=all
_=/usr/bin/printenv
__OCF_TRC_DEST=/var/lib/heartbeat/trace_ra/IPaddr2/VIP.monitor.2021-07-19.15:01:27
__OCF_TRC_MANAGE=1'
++ 15:01:27: source:1053: ocf_is_true ''
++ 15:01:27: ocf_is_true:103: case "$1" in
++ 15:01:27: ocf_is_true:103: case "$1" in
++ 15:01:27: ocf_is_true:105: false
+ 15:01:27: main:69: . /usr/lib/ocf/lib/heartbeat/findif.sh
+ 15:01:27: main:72: OCF_RESKEY_lvs_support_default=false
+ 15:01:27: main:73: OCF_RESKEY_lvs_ipv6_addrlabel_default=false
+ 15:01:27: main:74: OCF_RESKEY_lvs_ipv6_addrlabel_value_default=99
+ 15:01:27: main:75: OCF_RESKEY_clusterip_hash_default=sourceip-sourceport
+ 15:01:27: main:76: OCF_RESKEY_unique_clone_address_default=false
+ 15:01:27: main:77: OCF_RESKEY_arp_interval_default=200
+ 15:01:27: main:78: OCF_RESKEY_arp_count_default=5
+ 15:01:27: main:79: OCF_RESKEY_arp_count_refresh_default=0
+ 15:01:27: main:80: OCF_RESKEY_arp_bg_default=true
+ 15:01:27: main:81: OCF_RESKEY_run_arping_default=false
+ 15:01:27: main:82: OCF_RESKEY_noprefixroute_default=false
+ 15:01:27: main:83: OCF_RESKEY_preferred_lft_default=forever
+ 15:01:27: main:84: OCF_RESKEY_monitor_retries=1
+ 15:01:27: main:86: : false
+ 15:01:27: main:87: : false
+ 15:01:27: main:88: : 99
+ 15:01:27: main:89: : sourceip-sourceport
+ 15:01:27: main:90: : false
+ 15:01:27: main:91: : 200
+ 15:01:27: main:92: : 5
+ 15:01:27: main:93: : 0
+ 15:01:27: main:94: : true
+ 15:01:27: main:95: : false
+ 15:01:27: main:96: : false
+ 15:01:27: main:97: : forever
+ 15:01:27: main:98: : 1
+ 15:01:27: main:101: SENDARP=/usr/libexec/heartbeat/send_arp
+ 15:01:27: main:102: SENDUA=/usr/libexec/heartbeat/send_ua
+ 15:01:27: main:103: FINDIF=findif
+ 15:01:27: main:104: VLDIR=/run/resource-agents
+ 15:01:27: main:105: SENDARPPIDDIR=/run/resource-agents
+ 15:01:27: main:106: CIP_lockfile=/run/resource-agents/IPaddr2-CIP-10.0.0.67
+ 15:01:27: main:108: IPADDR2_CIP_IPTABLES=iptables
+ 15:01:27: main:1261: ocf_is_true false
+ 15:01:27: ocf_is_true:103: case "$1" in
+ 15:01:27: ocf_is_true:105: false
+ 15:01:27: main:1268: case $__OCF_ACTION in
+ 15:01:27: main:1276: ip_validate
+ 15:01:27: ip_validate:1184: check_binary ip
+ 15:01:27: check_binary:57: have_binary ip
+ 15:01:27: have_binary:69: '[' '' = 1 ']'
++ 15:01:27: have_binary:72: echo ip
++ 15:01:27: have_binary:72: sed -e 's/ -.*//'
+ 15:01:27: have_binary:72: local bin=ip
++ 15:01:27: have_binary:73: which ip
+ 15:01:27: have_binary:73: test -x /usr/sbin/ip
+ 15:01:27: ip_validate:1185: IP_CIP=
+ 15:01:27: ip_validate:1187: ip_init
+ 15:01:27: ip_init:423: local rc
++ 15:01:27: ip_init:425: uname -s
+ 15:01:27: ip_init:425: '[' XLinux '!=' XLinux ']'
+ 15:01:27: ip_init:430: '[' X10.0.0.67 = X ']'
+ 15:01:27: ip_init:436: case $__OCF_ACTION in
+ 15:01:27: ip_init:438: true
+ 15:01:27: ip_init:441: : 'YAY!'
+ 15:01:27: ip_init:447: BASEIP=10.0.0.67
+ 15:01:27: ip_init:448: BRDCAST=
+ 15:01:27: ip_init:449: NIC=
+ 15:01:27: ip_init:453: '[' '!' -z '' -a -z '' ']'
+ 15:01:27: ip_init:458: NETMASK=
+ 15:01:27: ip_init:459: IFLABEL=
+ 15:01:27: ip_init:460: IF_MAC=
+ 15:01:27: ip_init:462: IP_INC_GLOBAL=1
++ 15:01:27: ip_init:463: expr 0 + 1
+ 15:01:27: ip_init:463: IP_INC_NO=1
+ 15:01:27: ip_init:465: ocf_is_true false
+ 15:01:27: ocf_is_true:103: case "$1" in
+ 15:01:27: ocf_is_true:105: false
+ 15:01:27: ip_init:470: ocf_is_decimal 1
+ 15:01:27: ocf_is_decimal:94: case "$1" in
+ 15:01:27: ocf_is_decimal:98: true
+ 15:01:27: ip_init:470: '[' 1 -gt 0 ']'
+ 15:01:27: ip_init:471: :
+ 15:01:27: ip_init:477: echo 10.0.0.67
+ 15:01:27: ip_init:477: grep -qs :
+ 15:01:27: ip_init:478: '[' 1 -ne 0 ']'
+ 15:01:27: ip_init:479: FAMILY=inet
+ 15:01:27: ip_init:480: ocf_is_true false
+ 15:01:27: ocf_is_true:103: case "$1" in
+ 15:01:27: ocf_is_true:105: false
+ 15:01:27: ip_init:507: case $NIC in
+ 15:01:27: ip_init:507: case $NIC in
++ 15:01:27: ip_init:518: findif
++ 15:01:27: findif:197: local match=10.0.0.67
++ 15:01:27: findif:198: local family
++ 15:01:27: findif:199: local scope
++ 15:01:27: findif:200: local nic=
++ 15:01:27: findif:201: local netmask=
++ 15:01:27: findif:202: local brdcast=
++ 15:01:27: findif:204: echo 10.0.0.67
++ 15:01:27: findif:204: grep -qs :
++ 15:01:27: findif:205: '[' 1 = 0 ']'
++ 15:01:27: findif:208: family=inet
++ 15:01:27: findif:209: scope='scope link'
++ 15:01:27: findif:211: findif_check_params inet
++ 15:01:27: findif_check_params:123: local family=inet
++ 15:01:27: findif_check_params:124: local match=10.0.0.67
++ 15:01:27: findif_check_params:125: local nic=
++ 15:01:27: findif_check_params:127: netmask=
++ 15:01:27: findif_check_params:128: local brdcast=
++ 15:01:27: findif_check_params:129: local errmsg
++ 15:01:27: findif_check_params:131: maybe_convert_dotted_quad_to_cidr
++ 15:01:27: maybe_convert_dotted_quad_to_cidr:55: case $netmask in
++ 15:01:27: maybe_convert_dotted_quad_to_cidr:55: case $netmask in
++ 15:01:27: maybe_convert_dotted_quad_to_cidr:68: return
++ 15:01:27: findif_check_params:135: case $__OCF_ACTION in
++ 15:01:27: findif_check_params:135: case $__OCF_ACTION in
++ 15:01:27: findif_check_params:137: return 0
++ 15:01:27: findif:213: '[' -n '' ']'
++ 15:01:27: findif:216: '[' -n '' ']'
+++ 15:01:27: findif:220: ip -o -f inet route list match 10.0.0.67 scope link
+++ 15:01:27: findif:220: awk 'BEGIN{best=0} /\// { mask=$1; sub(".*/", "", mask); if( int(mask)>=best ) { best=int(mask); best_ln=$0; } } END{print best_ln}'
++ 15:01:27: findif:220: set -- 10.0.0.0/24 dev team0 proto kernel src 10.0.0.66 metric 350
++ 15:01:27: findif:222: '[' 9 = 0 ']'
++ 15:01:27: findif:229: '[' -z '' -o -z '' ']'
++ 15:01:27: findif:230: '[' 9 = 0 ']'
++ 15:01:27: findif:234: case $1 in
++ 15:01:27: findif:234: case $1 in
++ 15:01:27: findif:235: : OK
++ 15:01:27: findif:243: '[' -z '' ']'
++ 15:01:27: findif:243: nic=team0
++ 15:01:27: findif:244: '[' -z '' ']'
++ 15:01:27: findif:244: netmask=24
++ 15:01:27: findif:245: '[' inet = inet ']'
++ 15:01:27: findif:246: '[' -z '' ']'
++ 15:01:27: findif:247: '[' -n 10.0.0.66 ']'
+++ 15:01:27: findif:248: ip -o -f inet addr show
+++ 15:01:27: findif:248: grep 10.0.0.66
++ 15:01:27: findif:248: set -- 5: team0 inet 10.0.0.66/24 brd 10.0.0.255 scope global noprefixroute 'team0\' valid_lft forever preferred_lft forever
++ 15:01:27: findif:249: '[' brd = brd ']'
++ 15:01:27: findif:249: brdcast=10.0.0.255
++ 15:01:27: findif:258: echo 'team0 netmask 24 broadcast 10.0.0.255'
++ 15:01:27: findif:259: return 0
+ 15:01:27: ip_init:507: case $NIC in
+ 15:01:27: ip_init:518: NICINFO='team0 netmask 24 broadcast 10.0.0.255'
+ 15:01:27: ip_init:519: rc=0
+ 15:01:27: ip_init:521: '[' 0 -eq 0 ']'
++ 15:01:27: ip_init:523: echo 'team0 netmask 24 broadcast 10.0.0.255'
++ 15:01:27: ip_init:523: sed -e 's/netmask\ //;s/broadcast\ //'
+ 15:01:27: ip_init:523: NICINFO='team0 24 10.0.0.255'
++ 15:01:27: ip_init:524: echo 'team0 24 10.0.0.255'
++ 15:01:27: ip_init:524: cut '-d ' -f1
+ 15:01:27: ip_init:524: NIC=team0
++ 15:01:27: ip_init:525: echo 'team0 24 10.0.0.255'
++ 15:01:27: ip_init:525: cut '-d ' -f2
+ 15:01:27: ip_init:525: NETMASK=24
++ 15:01:27: ip_init:526: echo 'team0 24 10.0.0.255'
++ 15:01:27: ip_init:526: cut '-d ' -f3
+ 15:01:27: ip_init:526: BRDCAST=10.0.0.255
+ 15:01:27: ip_init:541: SENDARPPIDFILE=/run/resource-agents/send_arp-10.0.0.67
+ 15:01:27: ip_init:543: '[' -n '' ']'
+ 15:01:27: ip_init:551: '[' 1 -gt 1 ']'
+ 15:01:27: ip_validate:1189: set_send_arp_program
+ 15:01:27: set_send_arp_program:1149: ARP_SENDER=send_arp
+ 15:01:27: set_send_arp_program:1150: '[' -n '' ']'
+ 15:01:27: set_send_arp_program:1171: is_infiniband
+ 15:01:27: is_infiniband:767: grep link/infiniband
+ 15:01:27: is_infiniband:767: ip link show team0
+ 15:01:27: ip_validate:1191: '[' -n '' ']'
+ 15:01:27: ip_validate:1202: ocf_is_true false
+ 15:01:27: ocf_is_true:103: case "$1" in
+ 15:01:27: ocf_is_true:105: false
+ 15:01:27: ip_validate:1208: ocf_is_decimal 200
+ 15:01:27: ocf_is_decimal:94: case "$1" in
+ 15:01:27: ocf_is_decimal:98: true
+ 15:01:27: ip_validate:1208: '[' 200 -gt 0 ']'
+ 15:01:27: ip_validate:1209: :
+ 15:01:27: ip_validate:1215: ocf_is_decimal 5
+ 15:01:27: ocf_is_decimal:94: case "$1" in
+ 15:01:27: ocf_is_decimal:98: true
+ 15:01:28: ip_validate:1215: '[' 5 -gt 0 ']'
+ 15:01:28: ip_validate:1216: :
+ 15:01:28: ip_validate:1222: '[' -z forever ']'
+ 15:01:28: ip_validate:1227: '[' -n '' ']'
+ 15:01:28: main:1278: case $__OCF_ACTION in
+ 15:01:28: main:1292: ip_monitor
++ 15:01:28: ip_monitor:1131: ip_served
++ 15:01:28: ip_served:925: '[' -z team0 ']'
+++ 15:01:28: ip_served:930: find_interface 10.0.0.67 24
+++ 15:01:28: find_interface:579: local ipaddr=10.0.0.67
+++ 15:01:28: find_interface:580: local netmask=24
+++ 15:01:28: find_interface:581: local iface=
++++ 15:01:28: find_interface:586: seq 1 1
+++ 15:01:28: find_interface:586: for i in $(seq 1 $OCF_RESKEY_monitor_retries)
++++ 15:01:28: find_interface:590: ip -o -f inet addr show
++++ 15:01:28: find_interface:590: cut -d ' ' -f2
++++ 15:01:28: find_interface:590: grep '\ 10.0.0.67/24'
++++ 15:01:28: find_interface:590: grep -v '^ipsec[0-9][0-9]*$'
+++ 15:01:28: find_interface:590: iface=team0
+++ 15:01:28: find_interface:592: '[' -n team0 ']'
+++ 15:01:28: find_interface:593: break
+++ 15:01:28: find_interface:601: echo team0
+++ 15:01:28: find_interface:602: return 0
++ 15:01:28: ip_served:930: cur_nic=team0
++ 15:01:28: ip_served:932: '[' -z team0 ']'
++ 15:01:28: ip_served:937: '[' -z '' ']'
++ 15:01:28: ip_served:938: for i in $cur_nic
++ 15:01:28: ip_served:940: '[' team0 = team0 ']'
++ 15:01:28: ip_served:941: echo ok
++ 15:01:28: ip_served:942: return 0
+ 15:01:28: ip_monitor:1131: local ip_status=ok
+ 15:01:28: ip_monitor:1132: case $ip_status in
+ 15:01:28: ip_monitor:1134: run_arp_sender refresh
+ 15:01:28: run_arp_sender:844: '[' xrefresh = xrefresh ']'
+ 15:01:28: run_arp_sender:845: ARP_COUNT=0
+ 15:01:28: run_arp_sender:846: LOGLEVEL=debug
+ 15:01:28: run_arp_sender:851: '[' 0 -eq 0 ']'
+ 15:01:28: run_arp_sender:852: return
+ 15:01:28: ip_monitor:1135: return 0

VIP.monitor.2021-07-19.15:03:14 :
+++ 15:03:14: ocf_start_trace:999: echo
+++ 15:03:14: ocf_start_trace:999: printenv
+++ 15:03:14: ocf_start_trace:999: sort
++ 15:03:14: ocf_start_trace:999: env='
HA_LOGFACILITY=daemon
HA_LOGFILE=/var/log/pacemaker/pacemaker.log
HA_cluster_type=corosync
HA_debug=0
HA_logfacility=daemon
HA_logfile=/var/log/pacemaker/pacemaker.log
HA_mcp=true
HA_quorum_type=corosync
INVOCATION_ID=5cd03e610fbf4a9bb3ffe2b30e1fb5d4
JOURNAL_STREAM=9:4433035
LC_ALL=C
OCF_EXIT_REASON_PREFIX=ocf-exit-reason:
OCF_RA_VERSION_MAJOR=1
OCF_RA_VERSION_MINOR=0
OCF_RESKEY_CRM_meta_interval=10000
OCF_RESKEY_CRM_meta_name=monitor
OCF_RESKEY_CRM_meta_on_node=server07
OCF_RESKEY_CRM_meta_on_node_uuid=2
OCF_RESKEY_CRM_meta_timeout=20000
OCF_RESKEY_crm_feature_set=3.7.1
OCF_RESKEY_ip=10.0.0.67
OCF_RESKEY_monitor_retries=10
OCF_RESKEY_trace_file=/apps/Zabbix_Log/Core
OCF_RESKEY_trace_ra=1
OCF_RESOURCE_INSTANCE=VIP
OCF_RESOURCE_PROVIDER=heartbeat
OCF_RESOURCE_TYPE=IPaddr2
OCF_ROOT=/usr/lib/ocf
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/ucb
PCMK_cluster_type=corosync
PCMK_debug=0
PCMK_logfacility=daemon
PCMK_logfile=/var/log/pacemaker/pacemaker.log
PCMK_mcp=true
PCMK_quorum_type=corosync
PCMK_service=pacemaker-execd
PCMK_watchdog=false
PWD=/var/lib/pacemaker/cores
SHLVL=1
VALGRIND_OPTS=--leak-check=full --trace-children=no --vgdb=no --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions --gen-suppressions=all
_=/usr/bin/printenv
__OCF_TRC_DEST=/var/lib/heartbeat/trace_ra/IPaddr2/VIP.monitor.2021-07-19.15:03:14
__OCF_TRC_MANAGE=1'
++ 15:03:14: source:1053: ocf_is_true ''
++ 15:03:14: ocf_is_true:103: case "$1" in
++ 15:03:14: ocf_is_true:103: case "$1" in
++ 15:03:14: ocf_is_true:105: false
+ 15:03:14: main:69: . /usr/lib/ocf/lib/heartbeat/findif.sh
+ 15:03:14: main:72: OCF_RESKEY_lvs_support_default=false
+ 15:03:14: main:73: OCF_RESKEY_lvs_ipv6_addrlabel_default=false
+ 15:03:14: main:74: OCF_RESKEY_lvs_ipv6_addrlabel_value_default=99
+ 15:03:14: main:75: OCF_RESKEY_clusterip_hash_default=sourceip-sourceport
+ 15:03:14: main:76: OCF_RESKEY_unique_clone_address_default=false
+ 15:03:14: main:77: OCF_RESKEY_arp_interval_default=200
+ 15:03:14: main:78: OCF_RESKEY_arp_count_default=5
+ 15:03:14: main:79: OCF_RESKEY_arp_count_refresh_default=0
+ 15:03:14: main:80: OCF_RESKEY_arp_bg_default=true
+ 15:03:14: main:81: OCF_RESKEY_run_arping_default=false
+ 15:03:14: main:82: OCF_RESKEY_noprefixroute_default=false
+ 15:03:14: main:83: OCF_RESKEY_preferred_lft_default=forever
+ 15:03:14: main:84: OCF_RESKEY_monitor_retries=1
+ 15:03:14: main:86: : false
+ 15:03:14: main:87: : false
+ 15:03:14: main:88: : 99
+ 15:03:14: main:89: : sourceip-sourceport
+ 15:03:14: main:90: : false
+ 15:03:14: main:91: : 200
+ 15:03:14: main:92: : 5
+ 15:03:14: main:93: : 0
+ 15:03:14: main:94: : true
+ 15:03:14: main:95: : false
+ 15:03:14: main:96: : false
+ 15:03:14: main:97: : forever
+ 15:03:14: main:98: : 1
+ 15:03:14: main:101: SENDARP=/usr/libexec/heartbeat/send_arp
+ 15:03:14: main:102: SENDUA=/usr/libexec/heartbeat/send_ua
+ 15:03:14: main:103: FINDIF=findif
+ 15:03:14: main:104: VLDIR=/run/resource-agents
+ 15:03:14: main:105: SENDARPPIDDIR=/run/resource-agents
+ 15:03:14: main:106: CIP_lockfile=/run/resource-agents/IPaddr2-CIP-10.0.0.67
+ 15:03:14: main:108: IPADDR2_CIP_IPTABLES=iptables
+ 15:03:14: main:1261: ocf_is_true false
+ 15:03:14: ocf_is_true:103: case "$1" in
+ 15:03:14: ocf_is_true:105: false
+ 15:03:14: main:1268: case $__OCF_ACTION in
+ 15:03:14: main:1276: ip_validate
+ 15:03:14: ip_validate:1184: check_binary ip
+ 15:03:14: check_binary:57: have_binary ip
+ 15:03:14: have_binary:69: '[' '' = 1 ']'
++ 15:03:14: have_binary:72: echo ip
++ 15:03:14: have_binary:72: sed -e 's/ -.*//'
+ 15:03:14: have_binary:72: local bin=ip
++ 15:03:14: have_binary:73: which ip
+ 15:03:14: have_binary:73: test -x /usr/sbin/ip
+ 15:03:14: ip_validate:1185: IP_CIP=
+ 15:03:14: ip_validate:1187: ip_init
+ 15:03:14: ip_init:423: local rc
++ 15:03:14: ip_init:425: uname -s
+ 15:03:14: ip_init:425: '[' XLinux '!=' XLinux ']'
+ 15:03:14: ip_init:430: '[' X10.0.0.67 = X ']'
+ 15:03:14: ip_init:436: case $__OCF_ACTION in
+ 15:03:14: ip_init:438: true
+ 15:03:14: ip_init:441: : 'YAY!'
+ 15:03:14: ip_init:447: BASEIP=10.0.0.67
+ 15:03:14: ip_init:448: BRDCAST=
+ 15:03:14: ip_init:449: NIC=
+ 15:03:14: ip_init:453: '[' '!' -z '' -a -z '' ']'
+ 15:03:14: ip_init:458: NETMASK=
+ 15:03:14: ip_init:459: IFLABEL=
+ 15:03:14: ip_init:460: IF_MAC=
+ 15:03:14: ip_init:462: IP_INC_GLOBAL=1
++ 15:03:14: ip_init:463: expr 0 + 1
+ 15:03:14: ip_init:463: IP_INC_NO=1
+ 15:03:14: ip_init:465: ocf_is_true false
+ 15:03:14: ocf_is_true:103: case "$1" in
+ 15:03:14: ocf_is_true:105: false
+ 15:03:14: ip_init:470: ocf_is_decimal 1
+ 15:03:14: ocf_is_decimal:94: case "$1" in
+ 15:03:14: ocf_is_decimal:98: true
+ 15:03:14: ip_init:470: '[' 1 -gt 0 ']'
+ 15:03:14: ip_init:471: :
+ 15:03:14: ip_init:477: echo 10.0.0.67
+ 15:03:14: ip_init:477: grep -qs :
+ 15:03:14: ip_init:478: '[' 1 -ne 0 ']'
+ 15:03:14: ip_init:479: FAMILY=inet
+ 15:03:14: ip_init:480: ocf_is_true false
+ 15:03:14: ocf_is_true:103: case "$1" in
+ 15:03:14: ocf_is_true:105: false
+ 15:03:14: ip_init:507: case $NIC in
+ 15:03:14: ip_init:507: case $NIC in
++ 15:03:14: ip_init:518: findif
++ 15:03:14: findif:197: local match=10.0.0.67
++ 15:03:14: findif:198: local family
++ 15:03:14: findif:199: local scope
++ 15:03:14: findif:200: local nic=
++ 15:03:14: findif:201: local netmask=
++ 15:03:14: findif:202: local brdcast=
++ 15:03:14: findif:204: echo 10.0.0.67
++ 15:03:14: findif:204: grep -qs :
++ 15:03:14: findif:205: '[' 1 = 0 ']'
++ 15:03:14: findif:208: family=inet
++ 15:03:14: findif:209: scope='scope link'
++ 15:03:14: findif:211: findif_check_params inet
++ 15:03:14: findif_check_params:123: local family=inet
++ 15:03:14: findif_check_params:124: local match=10.0.0.67
++ 15:03:14: findif_check_params:125: local nic=
++ 15:03:14: findif_check_params:127: netmask=
++ 15:03:14: findif_check_params:128: local brdcast=
++ 15:03:14: findif_check_params:129: local errmsg
++ 15:03:14: findif_check_params:131: maybe_convert_dotted_quad_to_cidr
++ 15:03:14: maybe_convert_dotted_quad_to_cidr:55: case $netmask in
++ 15:03:14: maybe_convert_dotted_quad_to_cidr:55: case $netmask in
++ 15:03:14: maybe_convert_dotted_quad_to_cidr:68: return
++ 15:03:14: findif_check_params:135: case $__OCF_ACTION in
++ 15:03:14: findif_check_params:135: case $__OCF_ACTION in
++ 15:03:14: findif_check_params:137: return 0
++ 15:03:14: findif:213: '[' -n '' ']'
++ 15:03:14: findif:216: '[' -n '' ']'
+++ 15:03:14: findif:220: ip -o -f inet route list match 10.0.0.67 scope link
+++ 15:03:14: findif:220: awk 'BEGIN{best=0} /\// { mask=$1; sub(".*/", "", mask); if( int(mask)>=best ) { best=int(mask); best_ln=$0; } } END{print best_ln}'
++ 15:03:14: findif:220: set -- 10.0.0.0/24 dev team0 proto kernel src 10.0.0.66 metric 350
++ 15:03:14: findif:222: '[' 9 = 0 ']'
++ 15:03:14: findif:229: '[' -z '' -o -z '' ']'
++ 15:03:14: findif:230: '[' 9 = 0 ']'
++ 15:03:14: findif:234: case $1 in
++ 15:03:14: findif:234: case $1 in
++ 15:03:14: findif:235: : OK
++ 15:03:14: findif:243: '[' -z '' ']'
++ 15:03:14: findif:243: nic=team0
++ 15:03:14: findif:244: '[' -z '' ']'
++ 15:03:14: findif:244: netmask=24
++ 15:03:14: findif:245: '[' inet = inet ']'
++ 15:03:14: findif:246: '[' -z '' ']'
++ 15:03:14: findif:247: '[' -n 10.0.0.66 ']'
+++ 15:03:14: findif:248: ip -o -f inet addr show
+++ 15:03:14: findif:248: grep 10.0.0.66
++ 15:03:14: findif:248: set -- 5: team0 inet 10.0.0.66/24 brd 10.0.0.255 scope global noprefixroute 'team0\' valid_lft forever preferred_lft forever
++ 15:03:14: findif:249: '[' brd = brd ']'
++ 15:03:14: findif:249: brdcast=10.0.0.255
++ 15:03:14: findif:258: echo 'team0 netmask 24 broadcast 10.0.0.255'
++ 15:03:14: findif:259: return 0
+ 15:03:14: ip_init:507: case $NIC in
+ 15:03:14: ip_init:518: NICINFO='team0 netmask 24 broadcast 10.0.0.255'
+ 15:03:14: ip_init:519: rc=0
+ 15:03:14: ip_init:521: '[' 0 -eq 0 ']'
++ 15:03:14: ip_init:523: echo 'team0 netmask 24 broadcast 10.0.0.255'
++ 15:03:14: ip_init:523: sed -e 's/netmask\ //;s/broadcast\ //'
+ 15:03:14: ip_init:523: NICINFO='team0 24 10.0.0.255'
++ 15:03:14: ip_init:524: echo 'team0 24 10.0.0.255'
++ 15:03:14: ip_init:524: cut '-d ' -f1
+ 15:03:14: ip_init:524: NIC=team0
++ 15:03:14: ip_init:525: echo 'team0 24 10.0.0.255'
++ 15:03:14: ip_init:525: cut '-d ' -f2
+ 15:03:14: ip_init:525: NETMASK=24
++ 15:03:14: ip_init:526: echo 'team0 24 10.0.0.255'
++ 15:03:14: ip_init:526: cut '-d ' -f3
+ 15:03:14: ip_init:526: BRDCAST=10.0.0.255
+ 15:03:14: ip_init:541: SENDARPPIDFILE=/run/resource-agents/send_arp-10.0.0.67
+ 15:03:14: ip_init:543: '[' -n '' ']'
+ 15:03:14: ip_init:551: '[' 1 -gt 1 ']'
+ 15:03:14: ip_validate:1189: set_send_arp_program
+ 15:03:14: set_send_arp_program:1149: ARP_SENDER=send_arp
+ 15:03:14: set_send_arp_program:1150: '[' -n '' ']'
+ 15:03:14: set_send_arp_program:1171: is_infiniband
+ 15:03:14: is_infiniband:767: ip link show team0
+ 15:03:14: is_infiniband:767: grep link/infiniband
+ 15:03:14: ip_validate:1191: '[' -n '' ']'
+ 15:03:14: ip_validate:1202: ocf_is_true false
+ 15:03:14: ocf_is_true:103: case "$1" in
+ 15:03:14: ocf_is_true:105: false
+ 15:03:14: ip_validate:1208: ocf_is_decimal 200
+ 15:03:14: ocf_is_decimal:94: case "$1" in
+ 15:03:14: ocf_is_decimal:98: true
+ 15:03:14: ip_validate:1208: '[' 200 -gt 0 ']'
+ 15:03:14: ip_validate:1209: :
+ 15:03:14: ip_validate:1215: ocf_is_decimal 5
+ 15:03:14: ocf_is_decimal:94: case "$1" in
+ 15:03:14: ocf_is_decimal:98: true
+ 15:03:14: ip_validate:1215: '[' 5 -gt 0 ']'
+ 15:03:14: ip_validate:1216: :
+ 15:03:14: ip_validate:1222: '[' -z forever ']'
+ 15:03:14: ip_validate:1227: '[' -n '' ']'
+ 15:03:14: main:1278: case $__OCF_ACTION in
+ 15:03:14: main:1292: ip_monitor
++ 15:03:14: ip_monitor:1131: ip_served
++ 15:03:14: ip_served:925: '[' -z team0 ']'
+++ 15:03:14: ip_served:930: find_interface 10.0.0.67 24
+++ 15:03:14: find_interface:579: local ipaddr=10.0.0.67
+++ 15:03:14: find_interface:580: local netmask=24
+++ 15:03:14: find_interface:581: local iface=
++++ 15:03:14: find_interface:586: seq 1 1
+++ 15:03:14: find_interface:586: for i in $(seq 1 $OCF_RESKEY_monitor_retries)
++++ 15:03:14: find_interface:590: ip -o -f inet addr show
++++ 15:03:14: find_interface:590: grep '\ 10.0.0.67/24'
++++ 15:03:14: find_interface:590: cut -d ' ' -f2
++++ 15:03:14: find_interface:590: grep -v '^ipsec[0-9][0-9]*$'
+++ 15:03:14: find_interface:590: iface=team0
+++ 15:03:14: find_interface:592: '[' -n team0 ']'
+++ 15:03:14: find_interface:593: break
+++ 15:03:14: find_interface:601: echo team0
+++ 15:03:14: find_interface:602: return 0
++ 15:03:14: ip_served:930: cur_nic=team0
++ 15:03:14: ip_served:932: '[' -z team0 ']'
++ 15:03:14: ip_served:937: '[' -z '' ']'
++ 15:03:14: ip_served:938: for i in $cur_nic
++ 15:03:14: ip_served:940: '[' team0 = team0 ']'
++ 15:03:14: ip_served:941: echo ok
++ 15:03:14: ip_served:942: return 0
+ 15:03:14: ip_monitor:1131: local ip_status=ok
+ 15:03:14: ip_monitor:1132: case $ip_status in
+ 15:03:14: ip_monitor:1134: run_arp_sender refresh
+ 15:03:14: run_arp_sender:844: '[' xrefresh = xrefresh ']'
+ 15:03:14: run_arp_sender:845: ARP_COUNT=0
+ 15:03:14: run_arp_sender:846: LOGLEVEL=debug
+ 15:03:14: run_arp_sender:851: '[' 0 -eq 0 ']'
+ 15:03:14: run_arp_sender:852: return
+ 15:03:14: ip_monitor:1135: return 0

Best regards,

Florent

De : Users <users-bounces at clusterlabs.org<mailto:users-bounces at clusterlabs.org>> De la part de Klaus Wenninger
Envoyé : lundi 5 juillet 2021 09:14
À : Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org<mailto:users at clusterlabs.org>>
Objet : Re: [ClusterLabs] Antw: [EXT] VIP monitor Timed Out

Using DHCP? Maybe a glitch/issue during renewal ... but elaborate monitoring as suggested should show that ...

On Mon, Jul 5, 2021 at 9:03 AM Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de<mailto:Ulrich.Windl at rz.uni-regensburg.de>> wrote:
Hi!

See "ip_served" and "find_interface" (essentially "$IP2UTIL -o -f $FAMILY addr
show") in the RA.
Basically it searches _all_ interfaces for $ipaddr/$netmask to locate the
interface when it could also examine the interface and look at the address.
For many interfaces it could make a difference performance-wise IMHO.
Maybe so a periodic sampling how long the corresponding command takes for your
setup.
If it's not a timing issue, the interface may actually be gone temporarily, or
the tools could have bugs.

Regards,
Ulrich

>>> PASERO Florent <florent.pasero at externe.bnpparibas.com<mailto:florent.pasero at externe.bnpparibas.com>> schrieb am
01.07.2021 um
17:29 in Nachricht
<PR0P264MB21394030D5C5120BB885E95DB4009 at PR0P264MB2139.FRAP264.PROD.OUTLOOK.COM<mailto:PR0P264MB21394030D5C5120BB885E95DB4009 at PR0P264MB2139.FRAP264.PROD.OUTLOOK.COM>>:

> Hi,
>
> Once or twice a week, we have a 'Timed out' on our VIP:
> ~$ pcs status
> Cluster name: zbx_pprod_Web_Core
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: #####(version 2.0.5‑9.el8_4.1‑ba59be7122) ‑ partition with
> quorum
>   * Last updated: Mon Jun 28 16:32:09 2021
>   * Last change:  Mon Jun 14 12:42:57 2021 by root via cibadmin on ######
>   * 2 nodes configured
>   * 2 resource instances configured
>
> Node List:
>   * Online: [ ##### #####]
>
> Full List of Resources:
>   * Resource Group: zbx_pprod_Web_Core:
>     * VIP       (ocf::heartbeat:IPaddr2):        Started #####
>     * ZabbixServer      (systemd:zabbix‑server):         Started ######
>
> Failed Resource Actions:
>   * VIP_monitor_5000 on ##### 'error' (1): call=69, status='Timed Out',
> exitreason='', last‑rc‑change='2021‑06‑24 14:41:57 +02:00', queued=0ms,
exec=0ms
>   * VIP_monitor_5000 on ##### 'error' (1): call=11, status='Timed Out',
> exitreason='', last‑rc‑change='2021‑06‑17 14:18:20 +02:00', queued=0ms,
exec=0ms
>
>
> We have the same issue on two completely different clusters.
>
> We can see in the log :
> Jun 24 14:41:29 ##### pacemaker‑execd     [1442069] (child_timeout_callback)

>    warning: VIP_monitor_5000 process (PID 2752333) timed out
> Jun 24 14:41:34 #####pacemaker‑execd     [1442069] (child_timeout_callback)

>   crit: VIP_monitor_5000 process (PID 2752333) will not die!
> Jun 24 14:41:57 ##### pacemaker‑execd     [1442069] (operation_finished)

>    warning: VIP_monitor_5000[2752333] timed out after 20000ms
> Jun 24 14:41:57 ##### pacemaker‑controld  [1442072] (process_lrm_event)
> error: Result of monitor operation for VIP on #####: Timed Out | call=69
> key=VIP_monitor_5000 timeout=20000ms
> Jun 24 14:41:57 ##### pacemaker‑based     [1442067] (cib_process_request)

>    info: Forwarding cib_modify operation for section status to all
> (origin=local/crmd/722)
> Jun 24 14:41:57 ##### pacemaker‑based     [1442067] (cib_perform_op)
> info: Diff: ‑‑‑ 0.54.443 2
> Jun 24 14:41:57 ##### pacemaker‑based     [1442067] (cib_perform_op)
> info: Diff: +++ 0.54.444 (null)
> Jun 24 14:41:57 ##### pacemaker‑based     [1442067] (cib_perform_op)
> info: +  /cib:  @num_updates=444
>
>
> Thanks for help
>
>
>
> Classification : Internal
> This message and any attachments (the "message") is
> intended solely for the intended addressees and is confidential.
> If you receive this message in error,or are not the intended recipient(s),
> please delete it and any copies from your systems and immediately notify
> the sender. Any unauthorized view, use that does not comply with its
> purpose,
> dissemination or disclosure, either whole or partial, is prohibited. Since
> the internet
> cannot guarantee the integrity of this message which may not be reliable,
> BNP PARIBAS
> (and its subsidiaries) shall not be liable for the message if modified,
> changed or falsified.
> Do not print this message unless it is necessary, consider the environment.
>
>
‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑
>
> Ce message et toutes les pieces jointes (ci‑apres le "message")
> sont etablis a l'intention exclusive de ses destinataires et sont
> confidentiels.
> Si vous recevez ce message par erreur ou s'il ne vous est pas destine,
> merci de le detruire ainsi que toute copie de votre systeme et d'en avertir
> immediatement l'expediteur. Toute lecture non autorisee, toute utilisation
> de
> ce message qui n'est pas conforme a sa destination, toute diffusion ou toute

>
> publication, totale ou partielle, est interdite. L'Internet ne permettant
> pas d'assurer
> l'integrite de ce message electronique susceptible d'alteration, BNP Paribas

>
> (et ses filiales) decline(nt) toute responsabilite au titre de ce message
> dans l'hypothese
> ou il aurait ete modifie, deforme ou falsifie.
> N'imprimez ce message que si necessaire, pensez a l'environnement.



_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Classification : Internal
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20210720/1d4b3206/attachment-0001.htm>


More information about the Users mailing list