From oalbrigt at redhat.com Wed Apr 8 06:25:35 2026
From: oalbrigt at redhat.com (Oyvind Albrigtsen)
Date: Wed, 8 Apr 2026 08:25:35 +0200
Subject: [ClusterLabs] resource-agents v4.18.0
Message-ID:

ClusterLabs is happy to announce resource-agents v4.18.0.

Source code is available at:
https://github.com/ClusterLabs/resource-agents/releases/tag/v4.18.0

The most significant enhancements in this release are:
- bugfixes and enhancements:
  - ocf-shellfuncs: add ocf_log_pipe
  - ocft: fix failing tests in resource-agents v4.17.0
  - Filesystem: improve shell trace (set -x) output
  - Filesystem: modify the return code when a mismatch between mount state and configuration is detected in monitor operation
  - Filesystem: new force_unmount=move option
  - Filesystem: optionally report "xargs ps -f" even when killing many processes
  - Filesystem: signal many processes in parallel
  - Filesystem: don't create systemd drop-in file for tmpfs/overlayfs
  - Filesystem: try umount immediately after signals are sent
  - IPaddr2: made find_interface() output empty when no interface is found
  - IPsrcaddr: fix grep expression, so it doesn't log "stray \ before white space" with newer versions of grep
  - aws-vpc-move-ip: add awscli_timeout parameter
  - db2: set reintegration flag when promotion is successful
  - docker: improve image existence check (#2121)
  - exportfs: fix grep error on stop
  - findif.c: remove unused colonptr variable
  - pgsql: use monitor_user for monitor-calls and use .pgpass when monitor_password is not specified
  - podman-etcd: add -a option to crictl ps (#2112)
  - podman-etcd: enhance etcd data backup with snapshots and retention
  - podman-etcd: fix "Peer URLs already exists" in add_member_as_learner (#2136)
  - podman-etcd: fix learner node attribute not set after etcdctl failure
  - podman-etcd: fix to prevent learner from starting before cluster is ready (#2098)
  - podman-etcd: hardened monitor/stop actions
  - podman-etcd: improve error handling to support retry on start errors (#2105)
  - podman-etcd: preserve standalone voter identity during restart
  - podman-etcd: prevent last active member from leaving the etcd member list
  - podman-etcd: remove test code (#2103)
  - podman-etcd: removed unneeded ETCDCTL_API environment variable
  - podman-etcd: sync environment variables with Pod manifest
  - portblock: check correct binary during validate-all
  - portblock: monitor needs to also check state file of inverse action (#2108)
  - powervs-move-ip/powervs-subnet: fix error logging
  - powervs-subnet: wait until IP is activated before running monitor-check
  - send_arp.linux/tickle_tcp: better alpine compatibility (#2119)
  - sfex_lib: don't discard 'const' qualifier

The full list of changes for resource-agents is available at:
https://github.com/ClusterLabs/resource-agents/blob/v4.18.0/ChangeLog

Everyone is encouraged to download and test the new release. We do many regression tests and simulations, but we can't cover all possible use cases, so your feedback is important and appreciated.

Many thanks to all the contributors to this release.

Best,
The resource-agents maintainers


From Dmytro_Poliarush at epam.com Fri Apr 10 09:38:37 2026
From: Dmytro_Poliarush at epam.com (Dmytro Poliarush)
Date: Fri, 10 Apr 2026 09:38:37 +0000
Subject: [ClusterLabs] pacemaker: 1.1.23 20sec timeout on cluster with disc I/O write delays
In-Reply-To:
References: <32c3f4f6dd434f6abde87e4cd984d892@ukr.de>
Message-ID:

Hi Windl,

Just a reminder about the suspicious pacemaker 1.1.23 behaviour. Would you please find time to check the strace below, and maybe forward me to a knowledgeable person.

Regards,
Dmytro

________________________________
From: Dmytro Poliarush
Sent: 25 March 2026 17:19
To: Windl, Ulrich; Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: pacemaker: 1.1.23 20sec timeout on cluster with disc I/O write delays

Hi Windl,

Thank you very much for your prompt reply and the links about timeouts. I've tried all of those already and they are NOT working.
From my observation there is some kind of hardcoded 20sec timeout in stonithd in pacemaker 1.1.23. In this pacemaker version stonithd is compiled from `commands.c`, `internal.h`, `main.c` and `remote.c`, and we assume the 20sec timeout is hardcoded somewhere in these sources. The most logical candidate so far was:

```
fencing/commands.c
#define DEFAULT_QUERY_TIMEOUT 20
```

But changing that value to 120 did NOT work: strace still shows stonithd closing the socket to stonith_admin after 20sec of polling. This is visible in the attached st_admin_strace.9964.comments.log (the `5<socket:[420078]>` fd annotations come from strace's -y option; they were mangled in my previous mail):

```
05:45:08.680800 socket(AF_UNIX, SOCK_STREAM, 0) = 5 <0.000446>
05:45:08.694409 connect(5, {sa_family=AF_UNIX, sun_path=@"stonith-ng\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 110) = 0 <0.000810>
05:45:08.699454 poll([{fd=5<socket:[420078]>, events=POLLIN}], 1, 0) = 0 (Timeout) <0.000004>
05:45:08.699719 poll([{fd=5<socket:[420078]>, events=POLLIN}], 1, 0) = 0 (Timeout) <0.000005>
... more polling on fd=5 here ...
05:45:08.700324 poll([{fd=5<socket:[420078]>, events=POLLIN}], 1, 0) = 0 (Timeout) <0.000006>
05:45:09.099605 poll([{fd=5<socket:[420078]>, events=POLLIN}], 1, 0) = 1 ([{fd=5, revents=POLLIN}]) <0.000092>
05:45:29.344300 shutdown(5<socket:[420078]>, SHUT_RDWR) = 0 <0.000022>
05:45:29.344391 close(5<socket:[420078]>) = 0 <0.000030>
05:45:29.346138 exit_group(-62) = ?
05:45:29.347107 +++ exited with 194 +++
```

What is MOST important here is that the top of the strace log shows stonith_admin started with `-t 60` (timeout in seconds):
```
05:45:08.659305 execve("/usr/sbin/stonith_admin", ["stonith_admin", "-VVV", "-t", "60", "-B", "node2"], 0x7ffd2b75e158 /* 33 vars */) = 0 <0.000162>
```

But somehow stonithd ignores that, and after 20sec of polling:

```
05:45:29.344300 shutdown(5<socket:[420078]>, SHUT_RDWR) = 0 <0.000022>
```

Regards,
Dmytro

________________________________
From: Windl, Ulrich
Sent: 23 March 2026 12:49
To: Cluster Labs - All topics related to open-source clustering welcomed
Cc: Dmytro Poliarush
Subject: RE: pacemaker: 1.1.23 20sec timeout on cluster with disc I/O write delays

I think you should provide more information, like the SBD configuration, syslog messages, etc. Usually node fencing via SBD works by writing a message to a shared disk slot. Once written, SBD/pacemaker expects the node to suicide soon. However, multiple timeouts are configurable. Asking AI, I got this (treat with some care):

SBD Timeout Parameters in Linux Pacemaker Clusters

In a Pacemaker cluster using SBD (STONITH Block Device) for fencing, there are four primary configurable timeout parameters, and they have strict interdependencies. Understanding these relationships is critical for reliable cluster operation.
________________________________

Timeout Parameters and Their Interdependencies

SBD_WATCHDOG_TIMEOUT
  Configured in: /etc/sysconfig/sbd (SBD daemon config)
  Purpose: hardware watchdog timeout; triggers node self-fence if no kick is received
  Default: 5 seconds

msgwait
  Configured in: SBD device metadata (SBD device level)
  Purpose: time window for message delivery to the node slot on the SBD device
  Default: set during device initialization

stonith-timeout
  Configured in: Pacemaker CIB (global cluster property)
  Purpose: maximum time Pacemaker waits for a STONITH action (reboot/off) to complete
  Default: 60 seconds

stonith-watchdog-timeout
  Configured in: Pacemaker CIB (global cluster property)
  Purpose: time after which Pacemaker assumes fencing has completed via the watchdog (diskless SBD only)
  Default: 0 (disabled)

________________________________

Critical Interdependencies

The timeout parameters have strict mathematical relationships that must be maintained for proper cluster behavior.

For disk-based SBD (with shared storage devices):

  msgwait >= 2 * SBD_WATCHDOG_TIMEOUT
  stonith-timeout >= msgwait + 20%

Example: if the watchdog timeout is 30 seconds:
* msgwait must be at least 60 seconds
* stonith-timeout must be at least 72 seconds (60 + 20%)

For diskless SBD (watchdog-only, no shared storage):

  stonith-watchdog-timeout >= 2 * SBD_WATCHDOG_TIMEOUT
  stonith-timeout >= stonith-watchdog-timeout + 20%

Example: if SBD_WATCHDOG_TIMEOUT is 5 seconds:
* stonith-watchdog-timeout must be at least 10 seconds
* stonith-timeout must be at least 12 seconds (10 + 20%)

________________________________

How These Parameters Interact

Watchdog Timeout (SBD_WATCHDOG_TIMEOUT)
This is the foundation of the timeout hierarchy. It represents how long the hardware watchdog will wait for a "kick" (heartbeat) from the SBD daemon before forcibly resetting the node. If storage latency or system issues prevent the SBD daemon from operating, the node self-fences after this timeout expires.
Message Wait Timeout (msgwait)
This is set in the SBD device metadata during initialization and defines the grace period for a fencing message to be acknowledged as delivered to the target node's slot. It must be at least twice the watchdog timeout to ensure the node has time to detect the fencing message and self-fence gracefully before the watchdog triggers.

STONITH Timeout (stonith-timeout)
This is a Pacemaker cluster property that controls how long the cluster waits for the fencing action to complete. It must exceed msgwait by at least 20% to allow sufficient time for the message to be delivered and processed. If this timeout is too short, the cluster may consider the fencing action failed and retry, causing unnecessary delays.

STONITH Watchdog Timeout (stonith-watchdog-timeout)
This parameter is only used for diskless SBD and tells Pacemaker how long to wait before assuming a node has already self-fenced via the watchdog. It must be at least twice the SBD_WATCHDOG_TIMEOUT to provide a safety margin. Setting this to 0 (the default) disables resource recovery and is appropriate only for disk-based SBD configurations.

________________________________

Critical Warnings

Pay attention to these constraints:

* Do not set stonith-watchdog-timeout until SBD is configured and running on every node, including Pacemaker Remote nodes.
* If stonith-timeout < stonith-watchdog-timeout in diskless SBD, nodes can become stuck in an UNCLEAN state, blocking failover.
* For multipath or iSCSI setups, the watchdog timeout should account for path failure detection and failover time. The max_polling_interval in /etc/multipath.conf must be less than the watchdog timeout.
* Changing the watchdog timeout requires coordinating changes across all dependent timeouts to maintain the mathematical relationships.
* Storage latency is the primary driver of watchdog timeout values; high-latency storage requires longer timeouts, which cascades into longer msgwait and stonith-timeout values.
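The disk-based rules above can be checked mechanically. A minimal POSIX-sh sketch; `check_sbd_timeouts` is a hypothetical helper for illustration, not part of sbd or pcs:

```shell
#!/bin/sh
# Sketch only: validates the disk-based SBD timeout rules quoted above.
#   rule 1: msgwait >= 2 * SBD_WATCHDOG_TIMEOUT
#   rule 2: stonith-timeout >= msgwait + 20%
#           (checked as 5*stonith >= 6*msgwait to stay in integer arithmetic)
check_sbd_timeouts() {
    watchdog=$1; msgwait=$2; stonith=$3
    ok=1
    [ "$msgwait" -ge $((watchdog * 2)) ] ||
        { echo "FAIL: msgwait $msgwait < 2 * watchdog ($((watchdog * 2)))"; ok=0; }
    [ $((stonith * 5)) -ge $((msgwait * 6)) ] ||
        { echo "FAIL: stonith-timeout $stonith < msgwait + 20%"; ok=0; }
    [ "$ok" -eq 1 ] && echo "OK: watchdog=$watchdog msgwait=$msgwait stonith-timeout=$stonith"
}

check_sbd_timeouts 30 60 72           # the example above: prints the OK line
check_sbd_timeouts 30 60 65 || true   # violates the 20% rule: prints a FAIL line
```

Running such a check before changing any of the values is cheaper than discovering a violated relationship through a fencing failure.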
The interdependency structure ensures that each timeout layer provides sufficient time for the layer below it to complete, preventing race conditions and cluster deadlock scenarios.

Kind regards,
Ulrich Windl

From: Users On Behalf Of Dmytro Poliarush via Users
Sent: Tuesday, March 17, 2026 12:32 PM
To: users at clusterlabs.org
Cc: Dmytro Poliarush
Subject: [EXT] [ClusterLabs] pacemaker: 1.1.23 20sec timeout on cluster with disc I/O write delays

Hi all,

I need some guidance on pacemaker 1.1.23. I'm chasing a stubborn issue in a 2-node, 2-disk SBD cluster. When running a manual fencing test with the `pcs stonith fence` command, I observe an error:

```
Error: unable to fence ''
```

The error manifests each time at around a 20-second mark (I assume this is a timeout). The `time` command is used to track how long the execution runs: `time pcs stonith fence`. Here is an example:

```
[root at node1 ~]# time pcs stonith fence --debug node2
Running: /usr/sbin/stonith_admin -B node2
Return Value: 194
--Debug Output Start--
--Debug Output End--
Error: unable to fence 'node2'

real    0m20.791s
user    0m0.063s
sys     0m0.033s
[root at node1 ~]#
```

For the investigation, I've set up a testing cluster with 2 VirtualBox VMs. The behaviour was NOT observed on the testing cluster until I intentionally added disk write delays with the dmsetup tool on one of the nodes.
Here is an example of setting a 22sec write delay:

```
# Create: read delay = 0 ms, write delay = 22000 ms
# dm-delay table format (read triple, then optional write triple):
#   start_sector num_sectors delay read_dev offset read_delay_ms [write_dev offset write_delay_ms]
dmsetup --noudevsync create slow-sdc --table "0 ${SIZE} delay /dev/sdc 0 0 /dev/sdc 0 22000"
dmsetup mknodes
```

NOTE that tests with delays up to (and including) 19sec pass:

```
[root at node1 ~]# ./suspend-resume-slow-sdc-delay-write.sh 20000
[root at node1 ~]# dmsetup table slow-sdc
0 262144 delay 8:32 0 0 8:32 0 20000
[root at node1 ~]# time pcs stonith fence --debug node2
Running: /usr/sbin/stonith_admin -B node2
Return Value: 194
--Debug Output Start--
--Debug Output End--
Error: unable to fence 'node2'

real    0m20.588s
user    0m0.088s
sys     0m0.021s

[root at node1 ~]# ./suspend-resume-slow-sdc-delay-write.sh 19000
++ blockdev --getsize /dev/sdc
+ SIZE=262144
++ lsblk -dn -o MAJ:MIN /dev/sdc
+ MAJMIN=' 8:32 '
+ dmsetup suspend slow-sdc
+ dmsetup reload slow-sdc --table '0 262144 delay /dev/sdc 0 0 /dev/sdc 0 19000'
+ dmsetup resume slow-sdc
+ dmsetup table slow-sdc
0 262144 delay 8:32 0 0 8:32 0 19000
[root at node1 ~]# pcs stonith history cleanup; pcs stonith cleanup # pcs-cleanup-error-cleanup
cleaning up fencing-history for node *
Cleaned up all resources on all nodes
[root at node1 ~]# time pcs stonith fence --debug node2
Running: /usr/sbin/stonith_admin -B node2
Return Value: 0
--Debug Output Start--
--Debug Output End--
Node: node2 fenced

real    0m19.869s
user    0m0.098s
sys     0m0.035s
[root at node1 ~]#
```

So here is my question: I assume there is a 20sec timeout value hardcoded somewhere in the pacemaker 1.1.23 sources. This hardcoded value impacts manual fencing in case of disk I/O delays (and maybe in some other cases). I expect that increasing this timeout can mitigate the problem on clusters with disk I/O issues similar to the ones described above. Please note this timeout is NOT stonith-timeout or stonith-watchdog-timeout.
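One way to enumerate candidate 20-second constants in the source tree is a pair of greps. A sketch only: the checkout directory name is an assumption (adjust SRC to your tree), and the patterns are heuristics, not a confirmed diagnosis:

```shell
#!/bin/sh
# Sketch: hunt for 20-second constants in a pacemaker 1.1.23 checkout that
# could explain the observed cutoff. SRC is an assumed path.
SRC=${SRC:-pacemaker-Pacemaker-1.1.23}
PATTERN='#define[[:space:]]+[A-Z_]*TIMEOUT[A-Z_]*[[:space:]]+20[[:space:]]*$'

# timeout-looking #defines with the value 20 (DEFAULT_QUERY_TIMEOUT is one such hit)
grep -rnE "$PATTERN" "$SRC/fencing" "$SRC/lib" "$SRC/include" 2>/dev/null || true

# literal 20-second values (milliseconds) in the fencing daemon and client library
grep -rn '20 \* 1000\|20000' "$SRC/fencing" "$SRC/lib/fencing" 2>/dev/null || true
```

Any hit outside `fencing/commands.c` would be a candidate for the same change-and-retest experiment described above.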
Could you please comment on whether that is a meaningful assumption, and where the 20sec timeout comes from?

Regards,
Dmytro