From robert.hoo.linux at gmail.com  Sat Jul  5 04:33:01 2025
From: robert.hoo.linux at gmail.com (Robert Hoo)
Date: Sat, 5 Jul 2025 12:33:01 +0800
Subject: [ClusterLabs] Q: GFS2 setup with pacemaker failed at LVM-activate
 resource create
Message-ID: <CA+wubQB0hAXOh+QqE_jsAGirr0b-KLzo=ovrj3SjdwZuH_JQuA@mail.gmail.com>

Hi,

I'm trying to setup GFS2 file system on a shared nvme storage, following
RHEL's manual Chapter 8. GFS2 file systems in a cluster | Configuring GFS2
file systems | Red Hat Enterprise Linux | 8 | Red Hat Documentation
<https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html/configuring_gfs2_file_systems/assembly_configuring-gfs2-in-a-cluster-configuring-gfs2-file-systems>
Chap
8. (
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html/configuring_gfs2_file_systems/assembly_configuring-gfs2-in-a-cluster-configuring-gfs2-file-systems
)

Failed at step 12:
pcs resource create sharedlv1 --group shared_vg1 ocf:heartbeat:LVM-activate
lvname=shared_lv1 vgname=shared_vg1 activation_mode=shared
vg_access_mode=lvmlockd

Is it because previous lvcreate step shouldn't "--activate sy"?

journal log shows:
notice: Transition 7 aborted by operation sharedlv1_monitor_0 'modify' on
node2: Event failed
notice: Transition 7 action 7 (sharedlv1_monitor_0 on node2): expected 'not
running' but got 'ok'
DEBUG: sharedlv1 monitor : 0
notice: Result of probe operation for sharedlv1 on node1: ok
notice: Transition 7 action 6 (sharedlv1_monitor_0 on node1):expected 'not
running' but got 'ok'
notice: Transition 7 (Complete=3, Pending=0, Fired=0, Skipped=0,
Incomplete=3, Source=/var/lib/pacemaker/pengine/pe-input-52.bz2): Complete
error: Resource sharedlv1 is active on 2 nodes (attempting recovery)
notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active
for more information
notice: * Move     sharedlv1 ( node1 -> node2 )
error: ...
[...]
notice: Initiating stop operation sharedlv1_stop_0 on node2
notice: Initiating stop operation sharedlv1_stop_0 locally on node1
notice: Requesting local execution of stop operation for sharedlv1 on node1
[...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250705/0e84aec5/attachment.htm>

From pierrecharles.dussault at outlook.com  Mon Jul  7 19:12:19 2025
From: pierrecharles.dussault at outlook.com (Pierre C. Dussault)
Date: Mon, 7 Jul 2025 19:12:19 +0000
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
Message-ID: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>

Hi all,

I am trying to get a working fencing device on a single Proxmox 8 host (not using the Proxmox tools) with fence_virtd and fence_virt/vxm. I can't get the command
    # fence_xvm -o list
to output anything, it keeps failing via timeout despite many attempts at finding the fault. The exact return message is:
    Timed out waiting for response
    Operation failed

I am trying to configure it using the multicast Listener with the Libvirt backend. All settings were left to defaults except the listening interface which was set to the Linux bridge connecting the host and the guests. The fence_xvm.key file was copied in the /etc/cluster/ directory on the host and on the guests.

I followed this: https://projects.clusterlabs.org/w/fencing/guest_fencing/ which didn't work,
then this: https://kevwells.com/it-knowledge-base/how-to-install-cluster-fencing-using-libvert-on-kvm-virtual-machines/ which also didn't work.

I read the man pages for fence_virt, fence_xvm, fence_virtd and fence_virt.conf.
I read the README and doc files in "agents/virt" and "agent/virt/docs" from the source repository.

I am at a loss here. Is there a better guide out there (or more up to date)?

Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250707/fba2a833/attachment.htm>

From mkelly at alteeve.com  Tue Jul  8 20:26:30 2025
From: mkelly at alteeve.com (Madison Kelly)
Date: Tue, 8 Jul 2025 16:26:30 -0400
Subject: [ClusterLabs] pcs problem
Message-ID: <e06037d3-ab72-4d74-9b6f-89ffab637071@alteeve.com>

Hi all,

 ? I'm trying to delete a resource, but pcs is throwing an odd error;

[root at vm-a01n01 ~]# pcs resource status
 ? * srv01-al10?? ??? (ocf:alteeve:server):???? Started vm-a01n01
 ? * *srv02-win11????? (ocf:alteeve:server):???? Stopped (disabled)*
 ? * srv03-win11????? (ocf:alteeve:server):???? Stopped (disabled)
 ? * srv04-win2025??? (ocf:alteeve:server):???? Started vm-a01n01
[root at vm-a01n01 ~]# *pcs resource delete srv02-win11*
Removing dependant elements:
 ? Location constraints: 'location-srv02-win11-vm-a01n01-200', 
'location-srv02-win11-vm-a01n02-100'
Stopping resource 'srv02-win11' before deleting
Error: Cannot load cluster status, xml does not describe valid cluster 
status: *Resource 'srv02-win11' contains an unknown role 'stopped'*
Error: Errors have occurred, therefore pcs is unable to continue
[root at vm-a01n01 ~]# rpm -q pcs
pcs-0.11.9-2.el9_6.1.x86_64

This is an Alma Linux 9.6 based host.

-- 
wiki -https://alteeve.com/w
cell - 647-471-0951
work - 647-417-7486 x 404
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250708/4568a011/attachment.htm>

From tojeline at redhat.com  Wed Jul  9 08:36:27 2025
From: tojeline at redhat.com (Tomas Jelinek)
Date: Wed, 9 Jul 2025 10:36:27 +0200
Subject: [ClusterLabs] pcs problem
In-Reply-To: <e06037d3-ab72-4d74-9b6f-89ffab637071@alteeve.com>
References: <e06037d3-ab72-4d74-9b6f-89ffab637071@alteeve.com>
Message-ID: <137b4853-2d4c-4108-8f83-f28ad9f70d80@redhat.com>

Hi Madison,

This issue partially originates in pacemaker. Pacemaker documents 
allowed target-role values to be capitalized, e.g. Stopped, Started, 
etc. So pcs expects the status xml produced by pacemaker to follow this 
and the target-role values to be in this form. Unfortunately, the 
reality doesn't match the documentation - pacemaker keeps the 
target-role value in whatever format entered by the user.

This is not an issue as long as 'pcs resource disable' command is used 
to set target-role, as that sets the value in the expected format. The 
problem happens when target-role is set by other means and the expected 
format is not followed.

The issue is tracked in RHEL-92043 
<https://issues.redhat.com/browse/RHEL-92043>. It has been fixed in pcs 
in 1f4db60 
<https://github.com/ClusterLabs/pcs/commit/1f4db601021a291fe229496dfc3a150a53dcc201> 
for pcs-0.12 branch and 945ecd9 
<https://github.com/ClusterLabs/pcs/commit/945ecd9a863d7a95ac6fc8ef4207c52b953b78a9> 
for pcs-0.11 branch. With this fix, pcs is case-insensitive when reading 
target-role from status xml. The fix will be included in the upcoming 
upstream releases 0.12.1 and 0.11.10

The fix has also been released in RHEL packages pcs-0.12.0-3.el10_0.2 
and pcs-0.11.9-2.el9_6.1 and upgrading to those should resolve the 
issue. However, upon checking Alma Linux pcs packages, it looks like 
Alma Linux decided not to include the fix in their pcs-0.11.9-2.el9_6.1 
package.

As a workaround, set the target-role meta attribute of resources in CIB 
to one of Started, Stopped, Promoted, Unpromoted.

Regards,
Tomas


Dne 08. 07. 25 v 22:26 Madison Kelly napsal(a):
>
> Hi all,
>
> ? I'm trying to delete a resource, but pcs is throwing an odd error;
>
> [root at vm-a01n01 ~]# pcs resource status
> ? * srv01-al10?? ??? (ocf:alteeve:server):???? Started vm-a01n01
> ? * *srv02-win11????? (ocf:alteeve:server):???? Stopped (disabled)*
> ? * srv03-win11????? (ocf:alteeve:server):???? Stopped (disabled)
> ? * srv04-win2025??? (ocf:alteeve:server):???? Started vm-a01n01
> [root at vm-a01n01 ~]# *pcs resource delete srv02-win11*
> Removing dependant elements:
> ? Location constraints: 'location-srv02-win11-vm-a01n01-200', 
> 'location-srv02-win11-vm-a01n02-100'
> Stopping resource 'srv02-win11' before deleting
> Error: Cannot load cluster status, xml does not describe valid cluster 
> status: *Resource 'srv02-win11' contains an unknown role 'stopped'*
> Error: Errors have occurred, therefore pcs is unable to continue
> [root at vm-a01n01 ~]# rpm -q pcs
> pcs-0.11.9-2.el9_6.1.x86_64
>
> This is an Alma Linux 9.6 based host.
>
> -- 
> wiki -https://alteeve.com/w
> cell - 647-471-0951
> work - 647-417-7486 x 404
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home:https://www.clusterlabs.org/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250709/ae56f782/attachment.htm>

From mlisik at redhat.com  Wed Jul  9 10:08:06 2025
From: mlisik at redhat.com (Miroslav Lisik)
Date: Wed, 9 Jul 2025 12:08:06 +0200
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
In-Reply-To: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
References: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
Message-ID: <9c493531-d966-4a23-9ea6-811d060eba40@redhat.com>

Hi Pierre,

you could try to double check the /etc/cluster/fence_xvm.key if it is 
really the same on the host and the guests.
Then you could try the omping tool to test that multicast is working 
correctly.

Regards,
Miroslav

On 7/7/25 21:12, Pierre C. Dussault wrote:
> Hi all,
>
> I am trying to get a working fencing device on a single Proxmox 8 host 
> (not using the Proxmox tools) with fence_virtd and fence_virt/vxm. I 
> can't get the command
> ? ? # fence_xvm -o list
> to output anything, it keeps failing via timeout despite many attempts 
> at finding the fault. The exact return message is:
> ? ? Timed out waiting for response
> ? ? Operation failed
>
> I am trying to configure it using the multicast Listener with the 
> Libvirt backend. All settings were left to defaults except the 
> listening interface which was set to the Linux bridge connecting the 
> host and the guests. The fence_xvm.key file was copied in the 
> /etc/cluster/ directory on the host and on the guests.
>
> I followed this: 
> https://projects.clusterlabs.org/w/fencing/guest_fencing/ 
> <https://projects.clusterlabs.org/w/fencing/guest_fencing/>?which 
> didn't work,
> then this: 
> https://kevwells.com/it-knowledge-base/how-to-install-cluster-fencing-using-libvert-on-kvm-virtual-machines/ 
> <https://kevwells.com/it-knowledge-base/how-to-install-cluster-fencing-using-libvert-on-kvm-virtual-machines/>?which 
> also didn't work.
>
> I read the man pages for fence_virt, fence_xvm, fence_virtd and 
> fence_virt.conf.
> I read the README and doc files in "agents/virt" and "agent/virt/docs" 
> from the source repository.
>
> I am at a loss here. Is there a better guide out there (or more up to 
> date)?
>
> Thanks.
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/


From nwahl at redhat.com  Thu Jul 10 20:44:00 2025
From: nwahl at redhat.com (Reid Wahl)
Date: Thu, 10 Jul 2025 13:44:00 -0700
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
In-Reply-To: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
References: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
Message-ID: <CAPiuu99Txcf1_2j-AZ8NrigDvfGDQiWE0rQgHm5VGKjxXJYPTA@mail.gmail.com>

On Mon, Jul 7, 2025 at 12:12?PM Pierre C. Dussault
<pierrecharles.dussault at outlook.com> wrote:
>
> Hi all,
>
> I am trying to get a working fencing device on a single Proxmox 8 host (not using the Proxmox tools) with fence_virtd and fence_virt/vxm. I can't get the command
>     # fence_xvm -o list
> to output anything, it keeps failing via timeout despite many attempts at finding the fault. The exact return message is:
>     Timed out waiting for response
>     Operation failed
>
> I am trying to configure it using the multicast Listener with the Libvirt backend. All settings were left to defaults except the listening interface which was set to the Linux bridge connecting the host and the guests. The fence_xvm.key file was copied in the /etc/cluster/ directory on the host and on the guests.
>
> I followed this: https://projects.clusterlabs.org/w/fencing/guest_fencing/ which didn't work,
> then this: https://kevwells.com/it-knowledge-base/how-to-install-cluster-fencing-using-libvert-on-kvm-virtual-machines/ which also didn't work.
>
> I read the man pages for fence_virt, fence_xvm, fence_virtd and fence_virt.conf.
> I read the README and doc files in "agents/virt" and "agent/virt/docs" from the source repository.
>
> I am at a loss here. Is there a better guide out there (or more up to date)?
>
> Thanks.
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

Can you try with the firewall disabled on both the host and the
guests? If it works, then we know it's a firewall issue. You probably
need to allow traffic through 1229/udp on the host, in addition to
1229/tcp on the guests, if you are not already doing so. (Not sure if
1229/tcp is needed on the host.)

You can also try with SELinux (or AppArmor or whatever) disabled or
not-enforcing.

I haven't configured or troubleshot fence_xvm or fence_virt in a long
time. Firewall issues have been the most common problem for me though.

-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker


From clumens at redhat.com  Thu Jul 10 20:51:57 2025
From: clumens at redhat.com (Chris Lumens)
Date: Thu, 10 Jul 2025 16:51:57 -0400
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
In-Reply-To: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
References: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
Message-ID: <aHAn7WAlsq4Ws1_H@spire>

>I am trying to get a working fencing device on a single Proxmox 8 host (not using the Proxmox tools) with fence_virtd and fence_virt/vxm. I can't get the command
>    # fence_xvm -o list
>to output anything, it keeps failing via timeout despite many attempts at finding the fault. The exact return message is:
>    Timed out waiting for response
>    Operation failed

If it makes you feel any better, I work on pacemaker and I somewhat
frequently have this problem too.  These are what my notes in my own
personal wiki (vim wiki syntax here) have to say:

= Debugging Fencing =

Fencing with all the VMs and their networking sure is annoying.  The easiest way to make sure
fencing is working is by running the following on both the cluster nodes and the host:

     {{{
     [root at spire audit]# fence_xvm -o list -a 239.255.100.100
     cluster01                        a9e4b6ca-120f-42d5-b5a2-53212d256791 off
     cluster02                        6e0069cd-d7fb-4efc-8473-379095fbdd5f off
     cluster03                        ff3ff2bd-0e3a-4207-9be8-5323cc3f7a87 off
     ctslab-exec                      275f000c-c83f-4b85-a9fb-4b7587836ceb on
     ctslab1                          da19d332-c92b-4d88-8af2-2d8e38fba667 on
     ctslab2                          0eb3f87c-1205-4304-ba33-c0149aac8e3d on
     }}}

If you don't get any output and it takes forever before timing out, here's some things to check:

* Disable firewalld on the host and on the VMs.  It can be made to work, but it's obnoxious and I
   hate it.

* Check `/var/log/audit/audit.log` to make sure there's no selinux problems.  If there are, use
   `audit2allow` to generate policy.  The man pages are helpful here.

* Make sure `/etc/cluster/fence_xvm.key` is the same on the host and all the VMs.  If not, copy it
   over.  Make sure to run `restorecon -Rv /etc/cluster` afterwards.

* Restart `fence_virtd` on the host after installing a new VM or reinstalling an old one.


- Chris


From xliang at suse.com  Tue Jul 15 05:52:19 2025
From: xliang at suse.com (Xin Liang)
Date: Tue, 15 Jul 2025 13:52:19 +0800
Subject: [ClusterLabs] Release crmsh 5.0.0-rc2
Message-ID: <CAAfvqY4P5q-+SHo20ocjMQf3QeK5gDGo2pBrHTUMLF6TiQ=c1A@mail.gmail.com>

Hello everyone,

I am pleased to announce that crmsh 5.0.0-rc2 is now available!

The new features since crmsh 5.0.0-rc1 include:

   - Dev: migration: use atomic write to modify corosync.conf on remote
   nodes
   - Dev: migration: allow to run migration locally

The major changes since 5.0.0-rc1 include:

   - Dev: Dockerfile: Install pacemaker-remote package
   - Dev: bootstrap: Improve configuration for admin IP
   - Fix: bootstrap: add sleeps to avoid triggering sshd PerSourcePenalties
   (bsc#1243141)
   - Dev: bootstrap: Improve node removal handling and messaging
   - corosync set command improvement
   - Fix: report.collect: Detect log existence before using it (bsc#1244515)
   - Fix: bootstrap: setup_passwordless_with_other_nodes does not update
   the authorized_keys on localhost (bsc#1244314)
   - Dev: cibconfig: Prevent adding Pacemaker remote resources to groups,
   orders, or colocations
   - Fix: bootstrap: Reload corosync after sync corosync.conf (bsc#1244437)
   - Dev: provide a friendly message when passwordless ssh does not work
   (bsc#1244525)
   - Dev: run-functional-tests: Fetch container's IP address correctly
   - Fix: crash_test: Correctly retrieve fence event information
   (bsc#1243786)
   - Dev: bootstrap: Remove user at host item from /root/.config/crm/crm.conf
   when removing node
   - Dev: Prevent actions when offline nodes are unreachable
   - Fix: log: Improve function confirm's logic (bsc#1245386)
   - Fix: bootstrap: Refine qnetd passwordless configuration logic
   (bsc#1245387)
   - Fix: bootstrap: should fallback to default user when core.hosts is not
   availabe from the seed node (bsc#1245343)
   - Dev: corosync: Get value from runtime.config prefix and update default
   token value
   - Dev: ui_cluster: Enhance membership validation


You can find more in the release notes here:
https://github.com/ClusterLabs/crmsh/releases/tag/5.0.0-rc2

Thank you to everyone who contributed to this release.
Your feedback and suggestions are always welcome!

Best regards,
xin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250715/10f5df23/attachment.htm>

From mpospisi at redhat.com  Thu Jul 17 16:35:04 2025
From: mpospisi at redhat.com (=?UTF-8?B?TWljaGFsIFBvc3DDrcWhaWw=?=)
Date: Thu, 17 Jul 2025 18:35:04 +0200
Subject: [ClusterLabs] pcs 0.12.1 released
Message-ID: <CAGd5o6RSMrBy8hXXVJy_uyNcPM-hiYJOy3EF54kL7Gz7tDQqGg@mail.gmail.com>

I am happy to announce the latest release of pcs, version 0.12.1.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.12.1.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.12.1.zip

This release contains a regression fix for commands that delete
resources, booth resources, stonith devices and remote nodes. The
commands would fail when capitalization of `target-role` meta attribute
value didn't follow the Pacemaker specification. The booth command has
been broken since 0.12.0 and the remaining commands since 0.12.0b1.

On the news functionality front, pcs now allows you to rename clusters
with the new `cluster rename` command. Also, exporting cluster
configuration was further expanded to alerts, node utilization and
attributes.

Configuration errors not caught by the CIB schema are now easier to spot
as pcs runs additional CIB checks in `cluster cib-push` and `cluster
edit` commands. These warnings are now also part of the `pcs status`
output.

The `resource meta` and `stonith meta` commands were fully
reimplemented. The `resource meta` command also now supports bundle
resources. Additional checks were added to prevent inadvertent updates
of `remote-node` and `remote-addr` meta attributes.

Complete change log for this release:
### Added
- Commands `pcs cluster cib-push` and `pcs cluster edit` now print more
  info when new CIB does not conform to the CIB schema ([RHEL-63186])
- Commands `pcs cluster cib-push` and `pcs cluster edit` now print info
  about problems in pushed CIB even if it conforms to the CIB schema
  ([RHEL-7681])
- Command `pcs cluster rename` for changing cluster name ([RHEL-22423])
- Command `pcs stonith sbd watchdog list` now prints watchdogs' identity
  and driver ([RHEL-76176])
- Prevent removing or disabling stonith devices or disabling SBD if the
  cluster would be left with disabled SBD and no stonith devices
  ([RHEL-66607])
- Support for exporting alerts in `json` and `cmd` formats ([RHEL-44347])
- Output of `pcs status` now contains messages about CIB
  misconfiguration provided by `crm_verify` pacemaker tool ([RHEL-7681])
- Support for bundle resources in `pcs resource meta`, disallow updating
  `remote-node` and `remote-addr` without `--force`, add lib command
  `resource.update_meta` to API v2 ([RHEL-35407])
- Support for exporting node attributes and utilization in `json` and
  `cmd` formats ([RHEL-21050])
- Support for reading sate and logs directories from systemd environment
  variables `STATE_DIRECTORY` and `LOGS_DIRECTORY` ([RHEL-96074])

### Fixed
- Fixed a traceback when removing a resource fails in web UI
- It is now possible to override errors when editing cluster properties
  in web UI
- Display node-attribute in colocation constraints configuration
  ([RHEL-81938])
- Fixed cluster status parsing when the `target-role` meta attribute is
  not properly capitalized as defined by pacemaker specification.
  Affected commands:
    - `pcs resource|stonith delete|remove`, `pcs booth delete|remove`,
      and `pcs cluster node delete-remote|remove-remote` (broken since
      0.12.0) ([RHEL-92043])
    - `pcs status query resource` (broken since 0.12.0)
- Handle query limit errors coming from rubygem-rack ([RHEL-90151])


Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, Michal Posp??il, Miroslav Lisik, Peter Romancik
and Tomas Jelinek.

Cheers,
Michal


[RHEL-7681]: https://issues.redhat.com/browse/RHEL-7681
[RHEL-21050]: https://issues.redhat.com/browse/RHEL-21050
[RHEL-22423]: https://issues.redhat.com/browse/RHEL-22423
[RHEL-35420]: https://issues.redhat.com/browse/RHEL-35407
[RHEL-44347]: https://issues.redhat.com/browse/RHEL-44347
[RHEL-63186]: https://issues.redhat.com/browse/RHEL-63186
[RHEL-66607]: https://issues.redhat.com/browse/RHEL-66607
[RHEL-76176]: https://issues.redhat.com/browse/RHEL-76176
[RHEL-81938]: https://issues.redhat.com/browse/RHEL-81938
[RHEL-90151]: https://issues.redhat.com/browse/RHEL-90151
[RHEL-92043]: https://issues.redhat.com/browse/RHEL-92043
[RHEL-96074]: https://issues.redhat.com/browse/RHEL-96074


From mpospisi at redhat.com  Thu Jul 17 16:56:43 2025
From: mpospisi at redhat.com (=?UTF-8?B?TWljaGFsIFBvc3DDrcWhaWw=?=)
Date: Thu, 17 Jul 2025 18:56:43 +0200
Subject: [ClusterLabs] pcs 0.11.10 released
Message-ID: <CAGd5o6RVJNWp4QUW+82k-4fvAHXUnBPhhdYG6Jqpem6OhwHsMg@mail.gmail.com>

I am happy to announce the latest release of pcs, version 0.11.10.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.11.10.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.11.10.zip

This is a backport of the 0.12.1 release [1].

Complete change log for this release:
### Added
- Commands `pcs cluster cib-push` and `pcs cluster edit` now print more
  info when new CIB does not conform to the CIB schema ([RHEL-76059])
- Commands `pcs cluster cib-push` and `pcs cluster edit` now print info
  about problems in pushed CIB even if it conforms to the CIB schema
  ([RHEL-76060])
- Command `pcs stonith sbd watchdog list` now prints watchdogs' identity
  and driver ([RHEL-76177])
- Command `pcs cluster rename` for changing cluster name ([RHEL-76055])
- Prevent removing or disabling stonith devices or disabling SBD if the
  cluster would be left with disabled SBD and no stonith devices
  ([RHEL-76170])
- Support for exporting alerts in `json` and `cmd` formats
  ([RHEL-76153])
- Output of `pcs status` now contains messages about CIB
  misconfiguration provided by `crm_verify` pacemaker tool
  ([RHEL-76060])
- Support for bundle resources in `pcs resource meta`, disallow updating
  `remote-node` and `remote-addr` without `--force`, add lib command
  `resource.update_meta` to API v2 ([RHEL-35420])
- Support for exporting node attributes and utilization in `json` and
  `cmd` formats ([RHEL-76154])
- Support for reading sate and logs directories from systemd environment
  variables `STATE_DIRECTORY` and `LOGS_DIRECTORY` ([RHEL-97220])

### Fixed
- Fixed a traceback when removing a resource fails in web UI
- It is now possible to override errors when editing cluster properties
  in web UI
- Display node-attribute in colocation constraints configuration
  ([RHEL-82894])
- Fixed cluster status parsing when the `target-role` meta attribute is
  not properly capitalized as defined by pacemaker specification.
  Affected commands:
    - `pcs resource|stonith delete|remove`, `pcs booth delete|remove`,
      and `pcs cluster node delete-remote|remove-remote` (broken since
      0.11.9) ([RHEL-92044])
    - `pcs status query resource` (broken since 0.11.8)
- Handle query limit errors coming from rubygem-rack ([RHEL-90151])

Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, Michal Posp??il, Miroslav Lisik, Peter Romancik
and Tomas Jelinek.

Cheers,
Michal


[1] https://lists.clusterlabs.org/pipermail/users/2025-July/036604.html
[RHEL-35420]: https://issues.redhat.com/browse/RHEL-35420
[RHEL-76055]: https://issues.redhat.com/browse/RHEL-76055
[RHEL-76059]: https://issues.redhat.com/browse/RHEL-76059
[RHEL-76060]: https://issues.redhat.com/browse/RHEL-76060
[RHEL-76153]: https://issues.redhat.com/browse/RHEL-76153
[RHEL-76154]: https://issues.redhat.com/browse/RHEL-76154
[RHEL-76170]: https://issues.redhat.com/browse/RHEL-76170
[RHEL-76177]: https://issues.redhat.com/browse/RHEL-76177
[RHEL-82894]: https://issues.redhat.com/browse/RHEL-82894
[RHEL-90151]: https://issues.redhat.com/browse/RHEL-90151
[RHEL-92044]: https://issues.redhat.com/browse/RHEL-92044
[RHEL-97220]: https://issues.redhat.com/browse/RHEL-97220


From clumens at redhat.com  Mon Jul 21 13:37:10 2025
From: clumens at redhat.com (Chris Lumens)
Date: Mon, 21 Jul 2025 09:37:10 -0400
Subject: [ClusterLabs] Pacemaker 2.1.10 now available
Message-ID: <aH5ChtIjmc0eXP2H@spire>

Hi all,

I am happy to announce that the source code for the final release of
Pacemaker version 2.1.10 is now available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.10

There is only one extremely minor difference from -rc1: a fix for
building without gnutls is included.  For more details about changes in
this release, please see the changelog:

https://github.com/ClusterLabs/pacemaker/blob/2.1/ChangeLog

Everyone is encouraged to download, compile, and test the new release.
We do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all contributors of source code to this release,
including Athos Ribeiro, Gao,Yan, Ken Gaillot, Klaus Wenninger, and Reid
Wahl.

3.0.1-rc2 will also be released soon.

- Chris


From clumens at redhat.com  Tue Jul 22 13:30:35 2025
From: clumens at redhat.com (Chris Lumens)
Date: Tue, 22 Jul 2025 09:30:35 -0400
Subject: [ClusterLabs] Pacemaker 3.0.1-rc2 now available
Message-ID: <aH-SezvhOAKAZR68@spire>

Hi all,

I am happy to announce that the source code for the second release
candidate for Pacemaker version 3.0.1 is now available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-3.0.1-rc2

Aside from documentation improvements, there are no user-facing changes
in this release candidate.  For more details about changes in this
release, please see the changelog:

https://github.com/ClusterLabs/pacemaker/blob/3.0/ChangeLog.md

Everyone is encouraged to download, compile, and test the new release.
We do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

I expect to do the final release in about one week.

Many thanks to all contributors of source code to this release,
including Aleksei Burlakov, Athos Ribeiro, Gao,Yan, Hideo Yamauchi, Ken
Gaillot, Klaus Wenninger, Reid Wahl, Renan Rodrigo, Satomi OSAWA, Thomas
Jones, WhiteWLf-dev, and xin liang

- Chris


From pierrecharles.dussault at outlook.com  Thu Jul 24 04:47:16 2025
From: pierrecharles.dussault at outlook.com (Pierre C. Dussault)
Date: Thu, 24 Jul 2025 04:47:16 +0000
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
In-Reply-To: <CAPiuu99Txcf1_2j-AZ8NrigDvfGDQiWE0rQgHm5VGKjxXJYPTA@mail.gmail.com>
References: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CAPiuu99Txcf1_2j-AZ8NrigDvfGDQiWE0rQgHm5VGKjxXJYPTA@mail.gmail.com>
Message-ID: <BY5PR06MB664260C8B4F105ADFC60D287945EA@BY5PR06MB6642.namprd06.prod.outlook.com>

Hi Reid,

Thanks for the feedback and suggestions. Sorry for delayed answer, I was away on vacation.

For now I'll try to use SBD fencing since I haven't been able to figure out the issue. It definitely wasn't nftables since I disabled it prior to doing configurations. I haven't touched any of the configuration in Proxmox for Apparmor, so that's running in its default settings. It seems some programs have an active profile in enforcing mode, but it doesn't seem like they are programs that would interact with KVM (although I may be wrong). I'll try to focus my efforts on SBD fencing and I'll circle back to fence_vxm once I get it working with SBD.

Thanks again,
Pierre

________________________________
From: Reid Wahl <nwahl at redhat.com>
Sent: Thursday, July 10, 2025 4:44 PM
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>; pierrecharles.dussault at outlook.com <pierrecharles.dussault at outlook.com>
Subject: Re: [ClusterLabs] Fencing agent fence_xvm using multicast

On Mon, Jul 7, 2025 at 12:12?PM Pierre C. Dussault
<pierrecharles.dussault at outlook.com> wrote:
>
> Hi all,
>
> I am trying to get a working fencing device on a single Proxmox 8 host (not using the Proxmox tools) with fence_virtd and fence_virt/vxm. I can't get the command
>     # fence_xvm -o list
> to output anything, it keeps failing via timeout despite many attempts at finding the fault. The exact return message is:
>     Timed out waiting for response
>     Operation failed
>
> I am trying to configure it using the multicast Listener with the Libvirt backend. All settings were left to defaults except the listening interface which was set to the Linux bridge connecting the host and the guests. The fence_xvm.key file was copied in the /etc/cluster/ directory on the host and on the guests.
>
> I followed this: https://projects.clusterlabs.org/w/fencing/guest_fencing/ which didn't work,
> then this: https://kevwells.com/it-knowledge-base/how-to-install-cluster-fencing-using-libvert-on-kvm-virtual-machines/ which also didn't work.
>
> I read the man pages for fence_virt, fence_xvm, fence_virtd and fence_virt.conf.
> I read the README and doc files in "agents/virt" and "agent/virt/docs" from the source repository.
>
> I am at a loss here. Is there a better guide out there (or more up to date)?
>
> Thanks.
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

Can you try with the firewall disabled on both the host and the
guests? If it works, then we know it's a firewall issue. You probably
need to allow traffic through 1229/udp on the host, in addition to
1229/tcp on the guests, if you are not already doing so. (Not sure if
1229/tcp is needed on the host.)

You can also try with SELinux (or AppArmor or whatever) disabled or
not-enforcing.

I haven't configured or troubleshot fence_xvm or fence_virt in a long
time. Firewall issues have been the most common problem for me though.

--
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250724/5e4a985b/attachment.htm>

From kwenning at redhat.com  Thu Jul 24 15:56:31 2025
From: kwenning at redhat.com (Klaus Wenninger)
Date: Thu, 24 Jul 2025 17:56:31 +0200
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
In-Reply-To: <BY5PR06MB664260C8B4F105ADFC60D287945EA@BY5PR06MB6642.namprd06.prod.outlook.com>
References: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CAPiuu99Txcf1_2j-AZ8NrigDvfGDQiWE0rQgHm5VGKjxXJYPTA@mail.gmail.com>
 <BY5PR06MB664260C8B4F105ADFC60D287945EA@BY5PR06MB6642.namprd06.prod.outlook.com>
Message-ID: <CALrDAo1DCnORMM1Q-jEZ3r2Fud8_GOcWF7o4SZd7cPco5g7PDQ@mail.gmail.com>

On Thu, Jul 24, 2025 at 6:47?AM Pierre C. Dussault <
pierrecharles.dussault at outlook.com> wrote:

> Hi Reid,
>
> Thanks for the feedback and suggestions. Sorry for delayed answer, I was
> away on vacation.
>
> For now I'll try to use SBD fencing since I haven't been able to figure
> out the issue. It definitely wasn't nftables since I disabled it prior to
> doing configurations. I haven't touched any of the configuration in Proxmox
> for Apparmor, so that's running in its default settings. It seems some
> programs have an active profile in enforcing mode, but it doesn't seem like
> they are programs that would interact with KVM (although I may be wrong).
> I'll try to focus my efforts on SBD fencing and I'll circle back to
> fence_vxm once I get it working with SBD.
>

If you have a single hypervisor where you have access to - some sort of at
least - going with SBD will
probably give you more issues than it will help you.
I'm using SBD on qemu-kvm either with i6300 or q35 virtual watchdog both
for watchdog-fencing
and for poison-pill-fencing. Good thing is that with qemu-kvm (guess
proxmox is still running
on top of qemu-kvm) you have at least a proper virtual watchdog (e.g. <watchdog
model='i6300esb' action='reset'/>
in the <devices>) For the case you wanna go with poison-pill-fencing that
works well using a
shared disk-image - no need for setting up an iSCSI-target or something -
using something like
<disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='SHARED_IMAGE_FILE_A'/>
      <target dev='sdb' bus='scsi'/>
      <serial>SBD-A</serial>
      <shareable/>
 </disk>
But again I would encourage you to try something different unless you need
any of the
points where SBD shines. I'm using it in this kind of environment as I'm
working
on SBD development.

Regards,
Klaus

> Thanks again,
> Pierre
>
> ------------------------------
> *From:* Reid Wahl <nwahl at redhat.com>
> *Sent:* Thursday, July 10, 2025 4:44 PM
> *To:* Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org>; pierrecharles.dussault at outlook.com <
> pierrecharles.dussault at outlook.com>
> *Subject:* Re: [ClusterLabs] Fencing agent fence_xvm using multicast
>
> On Mon, Jul 7, 2025 at 12:12?PM Pierre C. Dussault
> <pierrecharles.dussault at outlook.com> wrote:
> >
> > Hi all,
> >
> > I am trying to get a working fencing device on a single Proxmox 8 host
> (not using the Proxmox tools) with fence_virtd and fence_virt/vxm. I can't
> get the command
> >     # fence_xvm -o list
> > to output anything, it keeps failing via timeout despite many attempts
> at finding the fault. The exact return message is:
> >     Timed out waiting for response
> >     Operation failed
> >
> > I am trying to configure it using the multicast Listener with the
> Libvirt backend. All settings were left to defaults except the listening
> interface which was set to the Linux bridge connecting the host and the
> guests. The fence_xvm.key file was copied in the /etc/cluster/ directory on
> the host and on the guests.
> >
> > I followed this:
> https://projects.clusterlabs.org/w/fencing/guest_fencing/ which didn't
> work,
> > then this:
> https://kevwells.com/it-knowledge-base/how-to-install-cluster-fencing-using-libvert-on-kvm-virtual-machines/
> which also didn't work.
> >
> > I read the man pages for fence_virt, fence_xvm, fence_virtd and
> fence_virt.conf.
> > I read the README and doc files in "agents/virt" and "agent/virt/docs"
> from the source repository.
> >
> > I am at a loss here. Is there a better guide out there (or more up to
> date)?
> >
> > Thanks.
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
> Can you try with the firewall disabled on both the host and the
> guests? If it works, then we know it's a firewall issue. You probably
> need to allow traffic through 1229/udp on the host, in addition to
> 1229/tcp on the guests, if you are not already doing so. (Not sure if
> 1229/tcp is needed on the host.)
>
> You can also try with SELinux (or AppArmor or whatever) disabled or
> not-enforcing.
>
> I haven't configured or troubleshot fence_xvm or fence_virt in a long
> time. Firewall issues have been the most common problem for me though.
>
> --
> Regards,
>
> Reid Wahl (He/Him)
> Senior Software Engineer, Red Hat
> RHEL High Availability - Pacemaker
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250724/8287152f/attachment.htm>

From jgdr at dalibo.com  Mon Jul 28 10:46:36 2025
From: jgdr at dalibo.com (Jehan-Guillaume de Rorthais)
Date: Mon, 28 Jul 2025 12:46:36 +0200
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
In-Reply-To: <CALrDAo1DCnORMM1Q-jEZ3r2Fud8_GOcWF7o4SZd7cPco5g7PDQ@mail.gmail.com>
References: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CAPiuu99Txcf1_2j-AZ8NrigDvfGDQiWE0rQgHm5VGKjxXJYPTA@mail.gmail.com>
 <BY5PR06MB664260C8B4F105ADFC60D287945EA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CALrDAo1DCnORMM1Q-jEZ3r2Fud8_GOcWF7o4SZd7cPco5g7PDQ@mail.gmail.com>
Message-ID: <20250728124636.53eb926c@karst>

Hi Klaus,

On Thu, 24 Jul 2025 17:56:31 +0200
Klaus Wenninger <kwenning at redhat.com> wrote:

> [?]
> If you have a single hypervisor where you have access to - some sort of at
> least - going with SBD will probably give you more issues than it will help
> you.
> [?]
> But again I would encourage you to try something different unless you need
> any of the points where SBD shines.

I would be interested if you could you elaborate a bit on that?

Is it that SBD for watchdog self-fencing only architecture is considered
instable or insecure? How would it be?

And - appart from the single hv node - what's wrong with SBD on
"virtualized" raw shared storage? Any bad field experience?

Thanks!

From kwenning at redhat.com  Mon Jul 28 12:10:04 2025
From: kwenning at redhat.com (Klaus Wenninger)
Date: Mon, 28 Jul 2025 14:10:04 +0200
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
In-Reply-To: <20250728124636.53eb926c@karst>
References: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CAPiuu99Txcf1_2j-AZ8NrigDvfGDQiWE0rQgHm5VGKjxXJYPTA@mail.gmail.com>
 <BY5PR06MB664260C8B4F105ADFC60D287945EA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CALrDAo1DCnORMM1Q-jEZ3r2Fud8_GOcWF7o4SZd7cPco5g7PDQ@mail.gmail.com>
 <20250728124636.53eb926c@karst>
Message-ID: <CALrDAo1osMAQdsn=Ttc_i1MRBJJAgq0yXkN3b=Jch0wF3-LOqA@mail.gmail.com>

On Mon, Jul 28, 2025 at 12:56?PM Jehan-Guillaume de Rorthais <
jgdr at dalibo.com> wrote:

> Hi Klaus,
>
> On Thu, 24 Jul 2025 17:56:31 +0200
> Klaus Wenninger <kwenning at redhat.com> wrote:
>
> > [?]
> > If you have a single hypervisor where you have access to - some sort of
> at
> > least - going with SBD will probably give you more issues than it will
> help
> > you.
> > [?]
> > But again I would encourage you to try something different unless you
> need
> > any of the points where SBD shines.
>
> I would be interested if you could you elaborate a bit on that?
>
> Is it that SBD for watchdog self-fencing only architecture is considered
> instable or insecure? How would it be?
>

No, given you have a reliable watchdog and the timeouts are configured
properly SBD should be safe - both with and without shared disks.
The shared disks don't add additional safety actually because SBD
anyway has to rely on the watchdog taking the node down reliably
shouldn't it be able to access the disk(s) anymore.


>
> And - appart from the single hv node - what's wrong with SBD on
> "virtualized" raw shared storage? Any bad field experience?
>

Nothing is basically wrong. Of course a reliable watchdog might be
an issue in virtual environments and a fallback to softdog will never
give you the reliability of a piece of hardware ticking down independently
from CPU and everything.
What I meant was that if you are running all your VMs on a single
hypervisor there is really no need to be able to cope with a split-network
szenario or anything like this. So why add something additional that needs
careful arrangement of timeouts, possibly disk(s), ... if your
hypervisor already offers an interface that allows you to control a
VM and that gives you reliable feedback of the status and which is
probably roughly as available as the hypervisor itself.

Regards,
Klaus


>
> Thanks!
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250728/36f97e42/attachment.htm>

From jgdr at dalibo.com  Mon Jul 28 12:40:20 2025
From: jgdr at dalibo.com (Jehan-Guillaume de Rorthais)
Date: Mon, 28 Jul 2025 14:40:20 +0200
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
In-Reply-To: <CALrDAo1osMAQdsn=Ttc_i1MRBJJAgq0yXkN3b=Jch0wF3-LOqA@mail.gmail.com>
References: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CAPiuu99Txcf1_2j-AZ8NrigDvfGDQiWE0rQgHm5VGKjxXJYPTA@mail.gmail.com>
 <BY5PR06MB664260C8B4F105ADFC60D287945EA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CALrDAo1DCnORMM1Q-jEZ3r2Fud8_GOcWF7o4SZd7cPco5g7PDQ@mail.gmail.com>
 <20250728124636.53eb926c@karst>
 <CALrDAo1osMAQdsn=Ttc_i1MRBJJAgq0yXkN3b=Jch0wF3-LOqA@mail.gmail.com>
Message-ID: <20250728144020.48ea2a86@karst>

On Mon, 28 Jul 2025 14:10:04 +0200
Klaus Wenninger <kwenning at redhat.com> wrote:

> On Mon, Jul 28, 2025 at 12:56?PM Jehan-Guillaume de Rorthais <
> jgdr at dalibo.com> wrote:  
> 
> > Hi Klaus,
> >
> > On Thu, 24 Jul 2025 17:56:31 +0200
> > Klaus Wenninger <kwenning at redhat.com> wrote:
> >  
> > > [?]
> > > If you have a single hypervisor where you have access to - some sort of  
> > > at least - going with SBD will probably give you more issues than it will
> > > help you.
> > > [?]
> > > But again I would encourage you to try something different unless you  
> > > need any of the points where SBD shines.  
> >
> > I would be interested if you could you elaborate a bit on that?
> >
> > Is it that SBD for watchdog self-fencing only architecture is considered
> > instable or insecure? How would it be?
> 
> No, given you have a reliable watchdog and the timeouts are configured
> properly SBD should be safe - both with and without shared disks.

Ok, thank.

> The shared disks don't add additional safety actually because SBD
> anyway has to rely on the watchdog taking the node down reliably
> shouldn't it be able to access the disk(s) anymore.

My understanding is that SBD with shared disk is interesting:

* in shared disk cluster scenario
* to have faster cluster reactions in some circumstances

I should probably get back to the second point though, as I'm not really sure
about it.

> > And - appart from the single hv node - what's wrong with SBD on
> > "virtualized" raw shared storage? Any bad field experience?
> 
> Nothing is basically wrong. Of course a reliable watchdog might be
> an issue in virtual environments and a fallback to softdog will never
> give you the reliability of a piece of hardware ticking down independently
> from CPU and everything.

Check.

> What I meant was that if you are running all your VMs on a single
> hypervisor there is really no need to be able to cope with a split-network
> szenario or anything like this. So why add something additional that needs
> careful arrangement of timeouts, possibly disk(s), ... if your
> hypervisor already offers an interface that allows you to control a
> VM and that gives you reliable feedback of the status and which is
> probably roughly as available as the hypervisor itself.

Well, OK, that's was my understanding as well. I was curious I was missing
something else ?

Thank you for the details!

Have a good day,

From kwenning at redhat.com  Mon Jul 28 13:29:06 2025
From: kwenning at redhat.com (Klaus Wenninger)
Date: Mon, 28 Jul 2025 15:29:06 +0200
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
In-Reply-To: <20250728144020.48ea2a86@karst>
References: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CAPiuu99Txcf1_2j-AZ8NrigDvfGDQiWE0rQgHm5VGKjxXJYPTA@mail.gmail.com>
 <BY5PR06MB664260C8B4F105ADFC60D287945EA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CALrDAo1DCnORMM1Q-jEZ3r2Fud8_GOcWF7o4SZd7cPco5g7PDQ@mail.gmail.com>
 <20250728124636.53eb926c@karst>
 <CALrDAo1osMAQdsn=Ttc_i1MRBJJAgq0yXkN3b=Jch0wF3-LOqA@mail.gmail.com>
 <20250728144020.48ea2a86@karst>
Message-ID: <CALrDAo1Lg0M8QqGj0=NT033Ekya5FteJCRsy4D+a1sWD89eEcg@mail.gmail.com>

On Mon, Jul 28, 2025 at 2:40?PM Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
wrote:

> On Mon, 28 Jul 2025 14:10:04 +0200
> Klaus Wenninger <kwenning at redhat.com> wrote:
>
> > On Mon, Jul 28, 2025 at 12:56?PM Jehan-Guillaume de Rorthais <
> > jgdr at dalibo.com> wrote:
> >
> > > Hi Klaus,
> > >
> > > On Thu, 24 Jul 2025 17:56:31 +0200
> > > Klaus Wenninger <kwenning at redhat.com> wrote:
> > >
> > > > [?]
> > > > If you have a single hypervisor where you have access to - some sort
> of
> > > > at least - going with SBD will probably give you more issues than it
> will
> > > > help you.
> > > > [?]
> > > > But again I would encourage you to try something different unless
> you
> > > > need any of the points where SBD shines.
> > >
> > > I would be interested if you could you elaborate a bit on that?
> > >
> > > Is it that SBD for watchdog self-fencing only architecture is
> considered
> > > instable or insecure? How would it be?
> >
> > No, given you have a reliable watchdog and the timeouts are configured
> > properly SBD should be safe - both with and without shared disks.
>
> Ok, thank.
>
> > The shared disks don't add additional safety actually because SBD
> > anyway has to rely on the watchdog taking the node down reliably
> > shouldn't it be able to access the disk(s) anymore.
>
> My understanding is that SBD with shared disk is interesting:
>
> * in shared disk cluster scenario
> * to have faster cluster reactions in some circumstances
>

In some circumstances is true ;-)
In general the fencing side will have to wait because it might fall back to
the
device being taken down by the watchdog and that isn't any faster as with
watchdog-fencing. If the target is able to read the poison-pill it will
probably
reboot kind of instantaneously. But the fencing side will still have to
wait.
Probably not even the node coming back will speed up things as fencing
will still be pending. But of course the time in between can be used for
startup of the fenced node and it will be available to run services - if a
reboot recovers it.

Regards,
Klaus

>
> I should probably get back to the second point though, as I'm not really
> sure
> about it.
>
> > > And - appart from the single hv node - what's wrong with SBD on
> > > "virtualized" raw shared storage? Any bad field experience?
> >
> > Nothing is basically wrong. Of course a reliable watchdog might be
> > an issue in virtual environments and a fallback to softdog will never
> > give you the reliability of a piece of hardware ticking down
> independently
> > from CPU and everything.
>
> Check.
>
> > What I meant was that if you are running all your VMs on a single
> > hypervisor there is really no need to be able to cope with a
> split-network
> > szenario or anything like this. So why add something additional that
> needs
> > careful arrangement of timeouts, possibly disk(s), ... if your
> > hypervisor already offers an interface that allows you to control a
> > VM and that gives you reliable feedback of the status and which is
> > probably roughly as available as the hypervisor itself.
>
> Well, OK, that's was my understanding as well. I was curious I was missing
> something else ?
>
> Thank you for the details!
>
> Have a good day,
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250728/70664136/attachment.htm>

From jgdr at dalibo.com  Mon Jul 28 14:31:41 2025
From: jgdr at dalibo.com (Jehan-Guillaume de Rorthais)
Date: Mon, 28 Jul 2025 16:31:41 +0200
Subject: [ClusterLabs] Fencing agent fence_xvm using multicast
In-Reply-To: <CALrDAo1Lg0M8QqGj0=NT033Ekya5FteJCRsy4D+a1sWD89eEcg@mail.gmail.com>
References: <BY5PR06MB66426BAD4633B3C8C3473B8F944FA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CAPiuu99Txcf1_2j-AZ8NrigDvfGDQiWE0rQgHm5VGKjxXJYPTA@mail.gmail.com>
 <BY5PR06MB664260C8B4F105ADFC60D287945EA@BY5PR06MB6642.namprd06.prod.outlook.com>
 <CALrDAo1DCnORMM1Q-jEZ3r2Fud8_GOcWF7o4SZd7cPco5g7PDQ@mail.gmail.com>
 <20250728124636.53eb926c@karst>
 <CALrDAo1osMAQdsn=Ttc_i1MRBJJAgq0yXkN3b=Jch0wF3-LOqA@mail.gmail.com>
 <20250728144020.48ea2a86@karst>
 <CALrDAo1Lg0M8QqGj0=NT033Ekya5FteJCRsy4D+a1sWD89eEcg@mail.gmail.com>
Message-ID: <20250728163141.382b957c@karst>

On Mon, 28 Jul 2025 15:29:06 +0200
Klaus Wenninger <kwenning at redhat.com> wrote:

> On Mon, Jul 28, 2025 at 2:40?PM Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> wrote:
> 
> > On Mon, 28 Jul 2025 14:10:04 +0200
> > Klaus Wenninger <kwenning at redhat.com> wrote:
> >  
> > > On Mon, Jul 28, 2025 at 12:56?PM Jehan-Guillaume de Rorthais <  
> > > jgdr at dalibo.com> wrote:  
> > >  
> > > > [?]
> > * to have faster cluster reactions in some circumstances
> >  
> 
> In some circumstances is true ;-)

;-)

> In general the fencing side will have to wait because it might fall back to
> the device being taken down by the watchdog and that isn't any faster as with
> watchdog-fencing. 

Yes

> If the target is able to read the poison-pill it will probably reboot kind of
> instantaneously. But the fencing side will still have to wait.

OK, that's where I was kind of suspicious about my memories. Thanks.

> Probably not even the node coming back will speed up things as fencing
> will still be pending. But of course the time in between can be used for
> startup of the fenced node and it will be available to run services - if a
> reboot recovers it.

OK, not what I was thinking about, but a good take away. Thanks!

And thank you for your effort trying to find some bit of truth in my vaguely
formulated point :-)

Regards,

From tbean74 at gmail.com  Mon Jul 28 16:49:54 2025
From: tbean74 at gmail.com (Travis Bean)
Date: Mon, 28 Jul 2025 09:49:54 -0700
Subject: [ClusterLabs] how to active Clustered Logical Volume Manager with
 CRM
Message-ID: <CAFk47JhrAWf7+wX+7BUosfr-hDvNYAt+e_V7ab6sjzB=1_=REg@mail.gmail.com>

In the past, I always used the following to activate the Distributed
Lock Manager (DLM) and Clustered Logical Volume Manager (CLVM):

crm<<EOF.base
  configure
    property no-quorum-policy="ignore"
    rsc_defaults resource-stickiness="100"
    primitive controld ocf:pacemaker:controld \
      op monitor interval="60s" timeout="30s" \
      op start interval="0" timeout="90s" \
      op stop interval="0" timeout="100s"
    primitive clvmd ocf:lvm2:clvmd \
      op monitor interval="60s" timeout="30s" \
      op start interval="0" timeout="90s" \
      op stop interval="0" timeout="100s"
    group gr_base controld clvmd
    clone cl_base gr_base \
      meta interleave="true"
    commit
EOF.base

Now I get an error when attempting to use this code. The error is as follows:

ERROR: ocf:lvm2:clvmd: got no meta-data, does this RA exist?
ERROR: ocf:lvm2:clvmd: no such resource agent

Kind regards,

Travis Bean

From mpospisi at redhat.com  Mon Jul 28 17:07:00 2025
From: mpospisi at redhat.com (=?UTF-8?B?TWljaGFsIFBvc3DDrcWhaWw=?=)
Date: Mon, 28 Jul 2025 19:07:00 +0200
Subject: [ClusterLabs] PCS Web UI 0.1.23 released
Message-ID: <CAGd5o6T0VkYYcS-ZFSfLfNiNNLfP7corB5WgCneuEk4Hh3LsVw@mail.gmail.com>

I am happy to announce a new version of PCS Web UI - 0.1.23. This
version is special because it is now easier to compile and install
thanks to autotools. If you haven't heard about PCS Web UI, it provides
a web interface for managing clusters that uses PCS in the background.
It's been in active development since 2019 and is based on PatternFly
[1] which sets it apart from the old web interface that used to be
included in PCS. A picture speaks a thousand words, so feel free to
check out the screenshots on GitHub [2].

Source code is available at:
https://github.com/ClusterLabs/pcs-web-ui/archive/refs/tags/0.1.23.tar.gz
or
https://github.com/ClusterLabs/pcs-web-ui/archive/refs/tags/0.1.23.zip

The web UI can run in standalone mode (as a web page running on the
cluster node) but also as a Cockpit application. If Cockpit [3] doesn't
ring a bell, it is a web interface for managing servers. With PCS Web
UI, it is now also possible to manage the server and the cluster from
one place.

This version of PCS Web UI is compatible with pcs-0.11.10 and
pcs-0.12.1. PCS Web UI is developed in lockstep with PCS, therefore each
version is designed to work with a particular PCS version, compatibility
with other PCS versions is not guaranteed.

To install PCS Web UI, make sure to first install PCS according to the
instructions from its README [4]. If you would like to run the web
interface in standalone mode, make sure to include the `--enable-webui`
option when running `./configure` for PCS. After that, download the PCS
Web UI sources and follow the steps in its README [5].

Complete changelog for this release:
### Changed

* Small modal dialogs are placed on top of page in cockpit mode
  ([RHEL-30695])
* Layout of boxes inside cluster overview has been improved
  ([RHEL-30698])
* Resource status table has been simplified and it's behavior improved
  for lower browser resolutions ([RHEL-30671])

### Fixed

* Clones can be filtered by the agent of the cloned primitive
  ([RHEL-30693])
* Metadata for resource agents without provider are loaded correctly in
  the detail o primitive resource ([RHEL-79314])
* Delete action of a resource or a fence device can be forced
  ([RHEL-85195], [RHEL-84139])
* Disable action of SBD can be forced ([RHEL-84143])


Cheers,
Michal

[1] https://www.patternfly.org/
[2] https://github.com/ClusterLabs/pcs-web-ui/issues/81
[3] https://cockpit-project.org/
[4] https://github.com/ClusterLabs/pcs/blob/v0.12.1/README.md
[5] https://github.com/ClusterLabs/pcs-web-ui/blob/0.1.23/README.md
[RHEL-30671]: https://issues.redhat.com/browse/RHEL-30671
[RHEL-30693]: https://issues.redhat.com/browse/RHEL-30693
[RHEL-30695]: https://issues.redhat.com/browse/RHEL-30695
[RHEL-30698]: https://issues.redhat.com/browse/RHEL-30698
[RHEL-79314]: https://issues.redhat.com/browse/RHEL-79314
[RHEL-84139]: https://issues.redhat.com/browse/RHEL-84139
[RHEL-84143]: https://issues.redhat.com/browse/RHEL-84143
[RHEL-85195]: https://issues.redhat.com/browse/RHEL-85195


From tbean74 at gmail.com  Tue Jul 29 14:47:03 2025
From: tbean74 at gmail.com (Travis Bean)
Date: Tue, 29 Jul 2025 07:47:03 -0700
Subject: [ClusterLabs] how to active Clustered Logical Volume Manager
 with CRM
In-Reply-To: <CAFk47JhrAWf7+wX+7BUosfr-hDvNYAt+e_V7ab6sjzB=1_=REg@mail.gmail.com>
References: <CAFk47JhrAWf7+wX+7BUosfr-hDvNYAt+e_V7ab6sjzB=1_=REg@mail.gmail.com>
Message-ID: <CAFk47JhdfbtCdisrBjdON8zLw0GWdf7-M9EnBJ_cVSyoYgo7Cg@mail.gmail.com>

On Mon, Jul 28, 2025 at 9:49?AM Travis Bean <tbean74 at gmail.com> wrote:
>
> In the past, I always used the following to activate the Distributed
> Lock Manager (DLM) and Clustered Logical Volume Manager (CLVM):
>
> crm<<EOF.base
>   configure
>     property no-quorum-policy="ignore"
>     rsc_defaults resource-stickiness="100"
>     primitive controld ocf:pacemaker:controld \
>       op monitor interval="60s" timeout="30s" \
>       op start interval="0" timeout="90s" \
>       op stop interval="0" timeout="100s"
>     primitive clvmd ocf:lvm2:clvmd \
>       op monitor interval="60s" timeout="30s" \
>       op start interval="0" timeout="90s" \
>       op stop interval="0" timeout="100s"
>     group gr_base controld clvmd
>     clone cl_base gr_base \
>       meta interleave="true"
>     commit
> EOF.base
>
> Now I get an error when attempting to use this code. The error is as follows:
>
> ERROR: ocf:lvm2:clvmd: got no meta-data, does this RA exist?
> ERROR: ocf:lvm2:clvmd: no such resource agent

Is the following a suitable replacement for the above-mentioned
deprecated code, or am I missing something?:
crm<<EOF.base
  configure
    property no-quorum-policy="ignore"
    rsc_defaults resource-stickiness="100"
    primitive lvmlockd lvmlockd \
      op monitor interval="60s" timeout="30s" \
      op start interval="0" timeout="90s" \
      op stop interval="0" timeout="100s"
    primitive rsc-LVM LVM-activate \
      params vgname=vg_system vg_access_mode=lvmlockd \
      op monitor interval="60s" timeout="30s" \
      op start interval="0" timeout="90s" \
      op stop interval="0" timeout="100s"
    group gr_base lvmlockd rsc-LVM
    clone cl_base gr_base \
      meta interleave="true"
    commit
EOF.base

From tbean74 at gmail.com  Wed Jul 30 01:23:10 2025
From: tbean74 at gmail.com (Travis Bean)
Date: Tue, 29 Jul 2025 18:23:10 -0700
Subject: [ClusterLabs] how to active Clustered Logical Volume Manager
 with CRM
In-Reply-To: <CAFk47JhdfbtCdisrBjdON8zLw0GWdf7-M9EnBJ_cVSyoYgo7Cg@mail.gmail.com>
References: <CAFk47JhrAWf7+wX+7BUosfr-hDvNYAt+e_V7ab6sjzB=1_=REg@mail.gmail.com>
 <CAFk47JhdfbtCdisrBjdON8zLw0GWdf7-M9EnBJ_cVSyoYgo7Cg@mail.gmail.com>
Message-ID: <CAFk47JhUWXKcUow719JqU4m_z1cs=hRyTMTT-0XbOkLaxR2A-Q@mail.gmail.com>

On Tue, Jul 29, 2025 at 7:47?AM Travis Bean <tbean74 at gmail.com> wrote:
>
> On Mon, Jul 28, 2025 at 9:49?AM Travis Bean <tbean74 at gmail.com> wrote:
> >
> > In the past, I always used the following to activate the Distributed
> > Lock Manager (DLM) and Clustered Logical Volume Manager (CLVM):
> >
> > crm<<EOF.base
> >   configure
> >     property no-quorum-policy="ignore"
> >     rsc_defaults resource-stickiness="100"
> >     primitive controld ocf:pacemaker:controld \
> >       op monitor interval="60s" timeout="30s" \
> >       op start interval="0" timeout="90s" \
> >       op stop interval="0" timeout="100s"
> >     primitive clvmd ocf:lvm2:clvmd \
> >       op monitor interval="60s" timeout="30s" \
> >       op start interval="0" timeout="90s" \
> >       op stop interval="0" timeout="100s"
> >     group gr_base controld clvmd
> >     clone cl_base gr_base \
> >       meta interleave="true"
> >     commit
> > EOF.base
> >
> > Now I get an error when attempting to use this code. The error is as follows:
> >
> > ERROR: ocf:lvm2:clvmd: got no meta-data, does this RA exist?
> > ERROR: ocf:lvm2:clvmd: no such resource agent
>
> Is the following a suitable replacement for the above-mentioned
> deprecated code, or am I missing something?:
> crm<<EOF.base
>   configure
>     property no-quorum-policy="ignore"
>     rsc_defaults resource-stickiness="100"
>     primitive lvmlockd lvmlockd \
>       op monitor interval="60s" timeout="30s" \
>       op start interval="0" timeout="90s" \
>       op stop interval="0" timeout="100s"
>     primitive rsc-LVM LVM-activate \
>       params vgname=vg_system vg_access_mode=lvmlockd \
>       op monitor interval="60s" timeout="30s" \
>       op start interval="0" timeout="90s" \
>       op stop interval="0" timeout="100s"
>     group gr_base lvmlockd rsc-LVM
>     clone cl_base gr_base \
>       meta interleave="true"
>     commit
> EOF.base

The following code is what I was looking for. (I figured this out by
referencing the SUSE Linux Enterprise Server documentation, which has
the best examples using CRM):

# Activate Pacemaker's Distributed Lock Manager (DLM) and LVM locking
daemon (LVMLOCKD).
# DLM and LVMLOCKD must be activated together.
crm<<EOF.base
  configure
    property no-quorum-policy="ignore"
    rsc_defaults resource-stickiness="100"
    primitive controld ocf:pacemaker:controld \
      op monitor interval="60s" timeout="30s" \
      op start interval="0" timeout="90s" \
      op stop interval="0" timeout="100s"
    primitive lvmlockd lvmlockd \
      op monitor interval="60s" timeout="30s" \
      op start interval="0" timeout="90s" \
      op stop interval="0" timeout="100s"
    primitive LVM LVM-activate \
      params vgname=$(echo $LV_DRBD_PATH | cut -d "/" -f3)
vg_access_mode=lvmlockd activation_mode=shared \
      op monitor interval="60s" timeout="30s" \
      op start interval="0" timeout="90s" \
      op stop interval="0" timeout="100s"
    group gr_base controld lvmlockd LVM
    clone cl_base gr_base \
      meta interleave="true"
    commit
EOF.base

From andreas.mock at drumedar.de  Thu Jul 31 06:46:08 2025
From: andreas.mock at drumedar.de (Andreas Mock)
Date: Thu, 31 Jul 2025 06:46:08 +0000
Subject: [ClusterLabs] Creating pacemaker cluster starting with one node
Message-ID: <AS8PR06MB8412C17C1912694ABF8D8EFE8327A@AS8PR06MB8412.eurprd06.prod.outlook.com>

Hi all,

I have to upgrade an old pacemaker cluster consisting of two nodes. (There is no chance of rolling update).

Is it possible to create a pacemaker ?cluster? starting with only one node and afterwards attach the second node?

Best regards
Andreas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250731/41130276/attachment.htm>