[ClusterLabs] Problem with a new cluster with drbd on AlmaLinux 9

Testuser SST fatcharly at gmx.de
Tue Oct 22 12:18:44 UTC 2024


Hi,
I'm running a 2-node web cluster on AlmaLinux 9 with pacemaker 2.1.7, drbd9 and corosync 3.1.
I have trouble with promoting and mounting the drbd device. After activating the cluster,
the drbd device does not get mounted, and an error message shows up almost immediately:

pacemaker-schedulerd[4879]: warning: Unexpected result (error: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs) was recorded for start of Webcontent_FS on ...
pacemaker-schedulerd[4879]: warning: Webcontent_FS cannot run on kathie3 due to reaching migration threshold (clean up resource to allow again)

It looks like the cluster is trying to mount the device before the device is ready.
The device is drbd1 and I'm trying to mount it on /mnt/clusterfs. After the error has occurred and I run "pcs resource cleanup", the cluster is able to mount it.
The drbd resource is named Webcontent_DRBD.
The filesystem resource is named Webcontent_FS.
All other resources, like httpd and the HA-IPs, are working like a charm.

This is the log from the start of the cluster:

Oct 22 11:48:12 kathie3 pacemaker-controld[4880]: notice: State transition S_ELECTION -> S_INTEGRATION
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      HA-IP_1               (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      HA-IP_2               (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      HA-IP_3               (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      Webcontent_DRBD:0     (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      Webcontent_FS         (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start      ping_fw:0             (                        kathie3 )
Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Calculated transition 1106, saving inputs in /var/lib/pacemaker/pengine/pe-input-336.bz2
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation HA-IP_1_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_FS_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation ping_fw_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_DRBD_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for HA-IP_1 on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for ping_fw on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for Webcontent_DRBD on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for Webcontent_FS on kathie3
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682892]: INFO: Adding inet address 192.168.16.75/24 with broadcast address 192.168.16.255 to device ens3
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682912]: INFO: Bringing device ens3 up
Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1682923]: INFO: Running start for /dev/drbd1 on /mnt/clusterfs
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682929]: INFO: /usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p /run/resource-agents/send_arp-192.168.16.75 ens3 192.168.16.75 auto not_used not_used
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Starting worker thread (node-id 0)
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for HA-IP_1 on kathie3: ok
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating monitor operation HA-IP_1_monitor_30000 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of monitor operation for HA-IP_1 on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation HA-IP_2_start_0 locally on kathie3
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for HA-IP_2 on kathie3
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Auto-promote failed: Need access to UpToDate data (-2)
Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: meta-data IO uses: blk-bio
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Diskless -> Attaching ) [attach]
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Maximum number of peer devices = 1
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Method to ensure write ordering: flush
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: drbd_bm_resize called with capacity == 104854328
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: resync bitmap: bits=13106791 words=204794 pages=400
Oct 22 11:48:13 kathie3 kernel: drbd1: detected capacity change from 0 to 104854328
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: size = 50 GB (52427164 KB)
Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1683017]: ERROR: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_FS on kathie3: error (Couldn't mount device [/dev/drbd1] as /mnt/clusterfs)
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Webcontent_FS_start_0 at kathie3 output [ blockdev: cannot open /dev/drbd1: No data available\nmount: /mnt/clusterfs: mount(2) system call failed: No data available.\nocf-exit-reason:Couldn't mount device [/dev/drbd1] as /mnt/clusterfs\n ]
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 aborted by operation Webcontent_FS_start_0 'modify' on kathie3: Event failed
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 action 37 (Webcontent_FS_start_0 on kathie3): expected 'ok' but got 'error'
Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting last-failure-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) -> 1729590493
Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting fail-count-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) -> INFINITY
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 aborted by status-1-last-failure-Webcontent_FS.start_0 doing create last-failure-Webcontent_FS#start_0=1729590493: Transient attribute change
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: bitmap READ of 400 pages took 34 ms
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Attaching -> UpToDate ) [attach]
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: attached to current UUID: 826E8850CF10C812
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Setting exposed data uuid: 826E8850CF10C812
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of monitor operation for HA-IP_1 on kathie3: ok
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting sender thread (peer-node-id 1)
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( StandAlone -> Unconnected ) [connect]
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting receiver thread (peer-node-id 1)
Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( Unconnected -> Connecting ) [connecting]
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683100]: INFO: Adding inet address 192.168.16.76/24 with broadcast address 192.168.16.255 to device ens3
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683106]: INFO: Bringing device ens3 up
Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683112]: INFO: /usr/libexec/heartbeat/send_arp  -i 200 -r 5 -p /run/resource-agents/send_arp-192.168.16.76 ens3 192.168.16.76 auto not_used not_used
Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for HA-IP_2 on kathie3: ok
Oct 22 11:48:15 kathie3 pacemaker-attrd[4878]: notice: Setting pingd[kathie3] in instance_attributes: (unset) -> 1000
Oct 22 11:48:15 kathie3 pacemaker-controld[4880]: notice: Result of start operation for ping_fw on kathie3: ok
Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_1)[1683126]: INFO: ARPING 192.168.16.75 from 192.168.16.75 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0 response(s)
Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_2)[1683130]: INFO: ARPING 192.168.16.76 from 192.168.16.76 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0 response(s)
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683138]: INFO: webcontent_data: Called drbdsetup wait-connect-resource webcontent_data --wfc-timeout=5 --degr-wfc-timeout=5 --outdated-wfc-timeout=5
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683142]: INFO: webcontent_data: Exit code 5
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683146]: INFO: webcontent_data: Command output:
Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683150]: INFO: webcontent_data: Command stderr:
Oct 22 11:48:19 kathie3 pacemaker-attrd[4878]: notice: Setting master-Webcontent_DRBD[kathie3] in instance_attributes: (unset) -> 1000
Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_DRBD on kathie3: ok
Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Initiating notify operation Webcontent_DRBD_post_notify_start_0 locally on kathie3
...
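
To make the ordering I suspect easier to see, here is a minimal Python sketch that only checks the order of two events in an excerpt of the log above: the failed mount of /dev/drbd1 and the drbd disk reaching UpToDate. The function names (first_index, mount_raced_drbd) are made up for this illustration, and the excerpt has the timestamp and host prefix trimmed. It does not talk to pacemaker or drbd at all.

```python
import re

# A few lines copied from the log above (timestamp and host prefix trimmed).
LOG = """\
Filesystem(Webcontent_FS)[1682923]: INFO: Running start for /dev/drbd1 on /mnt/clusterfs
kernel: drbd webcontent_data: Auto-promote failed: Need access to UpToDate data (-2)
kernel: drbd webcontent_data/0 drbd1: disk( Diskless -> Attaching ) [attach]
Filesystem(Webcontent_FS)[1683017]: ERROR: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs
kernel: drbd webcontent_data/0 drbd1: disk( Attaching -> UpToDate ) [attach]
pacemaker-controld[4880]: notice: Result of start operation for Webcontent_DRBD on kathie3: ok
"""


def first_index(lines, pattern):
    """Return the index of the first line matching pattern, or None."""
    rx = re.compile(pattern)
    for i, line in enumerate(lines):
        if rx.search(line):
            return i
    return None


def mount_raced_drbd(log_text):
    """True if the mount error is logged before the disk becomes UpToDate."""
    lines = log_text.splitlines()
    mount_failed = first_index(lines, r"ERROR: Couldn't mount device")
    disk_ready = first_index(lines, r"disk\( \w+ -> UpToDate \)")
    if mount_failed is None or disk_ready is None:
        return False
    return mount_failed < disk_ready


print(mount_raced_drbd(LOG))  # prints True
```

For this excerpt the check reports that the mount failure is logged before the disk is UpToDate, which matches the "Initiating start operation" lines for Webcontent_FS and Webcontent_DRBD appearing in the same second.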

Is some timeout set wrong, or what am I missing?

Any suggestions are welcome

Kind regards

fatcharly



