[ClusterLabs] DRBD + VDO HowTo?

Andrei Borzenkov arvidjaar at gmail.com
Tue May 18 04:55:54 EDT 2021


On Tue, May 18, 2021 at 8:20 AM Eric Robinson <eric.robinson at psmnv.com> wrote:
>
> Okay, here is a test, starting with the initial cluster status...
>
>
> [root at ha09a ~]# pcs status
> Cluster name: ha09ab
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: ha09a (version 2.0.4-6.el8_3.2-2deceaa3ae) - partition with quorum
>   * Last updated: Mon May 17 22:14:11 2021
>   * Last change:  Mon May 17 21:58:18 2021 by hacluster via crmd on ha09b
>   * 2 nodes configured
>   * 8 resource instances configured
>
> Node List:
>   * Online: [ ha09a ha09b ]
>
> Full List of Resources:
>   * Clone Set: p_drbd0-clone [p_drbd0] (promotable):
>     * Masters: [ ha09a ]
>     * Slaves: [ ha09b ]
>   * Clone Set: p_drbd1-clone [p_drbd1] (promotable):
>     * Masters: [ ha09a ]
>     * Slaves: [ ha09b ]
>   * p_vdo0      (lsb:vdo0):      Started ha09a
>   * p_vdo1      (lsb:vdo1):      Started ha09a
>   * p_fs_clust08        (ocf::heartbeat:Filesystem):     Started ha09a
>   * p_fs_clust09        (ocf::heartbeat:Filesystem):     Started ha09a
>
> Failed Resource Actions:
>   * p_vdo0_monitor_15000 on ha09a 'not running' (7): call=35, status='complete', exitreason='', last-rc-change='2021-05-17 21:01:28 -07:00', queued=0ms, exec=157ms
>   * p_vdo1_monitor_15000 on ha09a 'not running' (7): call=91, status='complete', exitreason='', last-rc-change='2021-05-17 21:56:57 -07:00', queued=0ms, exec=164ms
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
>
>
> Here are the constraints...
>
> [root at ha09a ~]# pcs constraint --full
> Location Constraints:
> Ordering Constraints:
>   promote p_drbd0-clone then start p_vdo0 (kind:Mandatory) (id:order-p_drbd0-clone-p_vdo0-mandatory)
>   promote p_drbd1-clone then start p_vdo1 (kind:Mandatory) (id:order-p_drbd1-clone-p_vdo1-mandatory)
>   start p_vdo0 then start p_fs_clust08 (kind:Mandatory) (id:order-p_vdo0-p_fs_clust08-mandatory)
>   start p_vdo1 then start p_fs_clust09 (kind:Mandatory) (id:order-p_vdo1-p_fs_clust09-mandatory)
> Colocation Constraints:
>   p_vdo0 with p_drbd0-clone (score:INFINITY) (id:colocation-p_vdo0-p_drbd0-clone-INFINITY)
>   p_vdo1 with p_drbd1-clone (score:INFINITY) (id:colocation-p_vdo1-p_drbd1-clone-INFINITY)

This is wrong. It says vdo can be active on any node where a clone
instance is active - which includes the DRBD Secondary. You need to
colocate vdo with the master (promoted) role of the clone.
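A sketch of that fix, assuming the pcs 0.10 syntax shipped on these EL8
nodes (constraint ids taken from the listing above; verify against
`pcs constraint colocation add --help` on your version):

```shell
# Drop the role-less colocations...
pcs constraint remove colocation-p_vdo0-p_drbd0-clone-INFINITY
pcs constraint remove colocation-p_vdo1-p_drbd1-clone-INFINITY
# ...and colocate each vdo with the Master (promoted) instance instead.
pcs constraint colocation add p_vdo0 with master p_drbd0-clone INFINITY
pcs constraint colocation add p_vdo1 with master p_drbd1-clone INFINITY
```

The existing ordering constraints ("promote ... then start ...") can stay
as they are; only the colocations lack the role.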

>   p_fs_clust08 with p_vdo0 (score:INFINITY) (id:colocation-p_fs_clust08-p_vdo0-INFINITY)
>   p_fs_clust09 with p_vdo1 (score:INFINITY) (id:colocation-p_fs_clust09-p_vdo1-INFINITY)
> Ticket Constraints:
>
> I will now try to move resource p_fs_clust08...
>
> [root at ha09a ~]# pcs resource move p_fs_clust08
> Warning: Creating location constraint 'cli-ban-p_fs_clust08-on-ha09a' with a score of -INFINITY for resource p_fs_clust08 on ha09a.
>         This will prevent p_fs_clust08 from running on ha09a until the constraint is removed
>         This will be the case even if ha09a is the last node in the cluster
> [root at ha09a ~]#
> [root at ha09a ~]#
>
> The resource fails to move and is now in a stopped state...
>
> [root at ha09a ~]# pcs status
> Cluster name: ha09ab
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: ha09a (version 2.0.4-6.el8_3.2-2deceaa3ae) - partition with quorum
>   * Last updated: Mon May 17 22:17:16 2021
>   * Last change:  Mon May 17 22:16:51 2021 by root via crm_resource on ha09a
>   * 2 nodes configured
>   * 8 resource instances configured
>
> Node List:
>   * Online: [ ha09a ha09b ]
>
> Full List of Resources:
>   * Clone Set: p_drbd0-clone [p_drbd0] (promotable):
>     * Masters: [ ha09b ]
>     * Slaves: [ ha09a ]
>   * Clone Set: p_drbd1-clone [p_drbd1] (promotable):
>     * Masters: [ ha09a ]
>     * Slaves: [ ha09b ]
>   * p_vdo0      (lsb:vdo0):      Started ha09b
>   * p_vdo1      (lsb:vdo1):      Started ha09a
>   * p_fs_clust08        (ocf::heartbeat:Filesystem):     Stopped
>   * p_fs_clust09        (ocf::heartbeat:Filesystem):     Started ha09a
>
> Failed Resource Actions:
>   * p_vdo0_monitor_15000 on ha09a 'not running' (7): call=35, status='complete', exitreason='', last-rc-change='2021-05-17 21:01:28 -07:00', queued=0ms, exec=157ms
>   * p_vdo1_monitor_15000 on ha09a 'not running' (7): call=91, status='complete', exitreason='', last-rc-change='2021-05-17 21:56:57 -07:00', queued=0ms, exec=164ms
>   * p_vdo0_monitor_15000 on ha09b 'not running' (7): call=35, status='complete', exitreason='', last-rc-change='2021-05-17 22:16:53 -07:00', queued=0ms, exec=170ms
>   * p_fs_clust08_start_0 on ha09b 'not installed' (5): call=36, status='complete', exitreason='Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist', last-rc-change='2021-05-17 22:16:53 -07:00', queued=0ms, exec=330ms
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
>
> Here are the logs from ha09a...
>
> May 17 22:16:51 ha09a pacemaker-controld[2657]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: notice: On loss of quorum: Ignore
> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not running) was recorded for monitor of p_vdo0 on ha09a at May 17 21:01:28 2021
> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not running) was recorded for monitor of p_vdo1 on ha09a at May 17 21:56:57 2021
> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: notice:  * Move       p_vdo0           (        ha09a -> ha09b )
> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: notice:  * Move       p_fs_clust08     (        ha09a -> ha09b )

As you can see, pacemaker tries to move the resources to the node with
the secondary DRBD instance instead of promoting DRBD there first.
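If it helps, the scheduler's decision can be reproduced offline from the
pe-input file named in the log above, without touching the cluster (a
diagnostic sketch; the file path is the one from the log):

```shell
# Show what the scheduler would do, plus allocation scores, from the live CIB:
crm_simulate -sL
# Or replay the exact saved transition:
crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-459.bz2
```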

> May 17 22:16:51 ha09a pacemaker-schedulerd[2656]: notice: Calculated transition 24, saving inputs in /var/lib/pacemaker/pengine/pe-input-459.bz2
> May 17 22:16:51 ha09a pacemaker-controld[2657]: notice: Initiating stop operation p_fs_clust08_stop_0 locally on ha09a
> May 17 22:16:51 ha09a Filesystem(p_fs_clust08)[50520]: INFO: Running stop for /dev/mapper/vdo0 on /ha01_mysql
> May 17 22:16:51 ha09a Filesystem(p_fs_clust08)[50520]: INFO: Trying to unmount /ha01_mysql
> May 17 22:16:51 ha09a systemd[1611]: ha01_mysql.mount: Succeeded.
> May 17 22:16:51 ha09a systemd[2582]: ha01_mysql.mount: Succeeded.
> May 17 22:16:51 ha09a systemd[1]: ha01_mysql.mount: Succeeded.
> May 17 22:16:51 ha09a kernel: XFS (dm-5): Unmounting Filesystem
> May 17 22:16:51 ha09a Filesystem(p_fs_clust08)[50520]: INFO: unmounted /ha01_mysql successfully
> May 17 22:16:51 ha09a pacemaker-controld[2657]: notice: Result of stop operation for p_fs_clust08 on ha09a: ok
> May 17 22:16:51 ha09a pacemaker-controld[2657]: notice: Initiating stop operation p_vdo0_stop_0 locally on ha09a
> May 17 22:16:52 ha09a lvm[4241]: No longer monitoring VDO pool vdo0.
> May 17 22:16:52 ha09a UDS/vdodmeventd[50696]: INFO   (vdodmeventd/50696) VDO device vdo0 is now unregistered from dmeventd
> May 17 22:16:52 ha09a kernel: kvdo3:dmsetup: suspending device 'vdo0'
> May 17 22:16:52 ha09a kernel: kvdo3:packerQ: compression is disabled
> May 17 22:16:52 ha09a kernel: kvdo3:packerQ: compression is enabled
> May 17 22:16:52 ha09a kernel: uds: dmsetup: beginning save (vcn 85)
> May 17 22:16:52 ha09a kernel: uds: dmsetup: finished save (vcn 85)
> May 17 22:16:52 ha09a kernel: kvdo3:dmsetup: device 'vdo0' suspended
> May 17 22:16:52 ha09a kernel: kvdo3:dmsetup: stopping device 'vdo0'
> May 17 22:16:52 ha09a kernel: kvdo3:dmsetup: device 'vdo0' stopped
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Result of stop operation for p_vdo0 on ha09a: ok
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating start operation p_vdo0_start_0 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating monitor operation p_vdo0_monitor_15000 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating start operation p_fs_clust08_start_0 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Transition 24 aborted by operation p_vdo0_monitor_15000 'create' on ha09b: Event failed
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Transition 24 action 69 (p_vdo0_monitor_15000 on ha09b): expected 'ok' but got 'not running'
> May 17 22:16:53 ha09a pacemaker-attrd[2655]: notice: Setting fail-count-p_vdo0#monitor_15000[ha09b]: (unset) -> 1
> May 17 22:16:53 ha09a pacemaker-attrd[2655]: notice: Setting last-failure-p_vdo0#monitor_15000[ha09b]: (unset) -> 1621315013
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Transition 24 aborted by status-2-fail-count-p_vdo0.monitor_15000 doing create fail-count-p_vdo0#monitor_15000=1: Transient attribute change
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Transition 24 action 73 (p_fs_clust08_start_0 on ha09b): expected 'ok' but got 'not installed'
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Transition 24 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-459.bz2): Complete
> May 17 22:16:53 ha09a pacemaker-attrd[2655]: notice: Setting fail-count-p_fs_clust08#start_0[ha09b]: (unset) -> INFINITY
> May 17 22:16:53 ha09a pacemaker-attrd[2655]: notice: Setting last-failure-p_fs_clust08#start_0[ha09b]: (unset) -> 1621315013
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: On loss of quorum: Ignore
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not running) was recorded for monitor of p_vdo0 on ha09a at May 17 21:01:28 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not running) was recorded for monitor of p_vdo1 on ha09a at May 17 21:56:57 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not running) was recorded for monitor of p_vdo0 on ha09b at May 17 22:16:53 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist) was recorded for start of p_fs_clust08 on ha09b at May 17 22:16:53 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Preventing p_fs_clust08 from restarting on ha09b because of hard failure (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist)
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist) was recorded for start of p_fs_clust08 on ha09b at May 17 22:16:53 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Preventing p_fs_clust08 from restarting on ha09b because of hard failure (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist)
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Demote     p_drbd0:0        ( Master -> Slave ha09a )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Promote    p_drbd0:1        ( Slave -> Master ha09b )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Recover    p_vdo0           (                 ha09b )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Stop       p_fs_clust08     (                 ha09b )   due to node availability
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Calculated transition 25, saving inputs in /var/lib/pacemaker/pengine/pe-input-460.bz2
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: On loss of quorum: Ignore
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not running) was recorded for monitor of p_vdo0 on ha09a at May 17 21:01:28 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not running) was recorded for monitor of p_vdo1 on ha09a at May 17 21:56:57 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not running) was recorded for monitor of p_vdo0 on ha09b at May 17 22:16:53 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist) was recorded for start of p_fs_clust08 on ha09b at May 17 22:16:53 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Preventing p_fs_clust08 from restarting on ha09b because of hard failure (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist)
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Unexpected result (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist) was recorded for start of p_fs_clust08 on ha09b at May 17 22:16:53 2021
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Preventing p_fs_clust08 from restarting on ha09b because of hard failure (not installed: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist)
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: warning: Forcing p_fs_clust08 away from ha09b after 1000000 failures (max=1000000)
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Demote     p_drbd0:0        ( Master -> Slave ha09a )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Promote    p_drbd0:1        ( Slave -> Master ha09b )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Recover    p_vdo0           (                 ha09b )
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice:  * Stop       p_fs_clust08     (                 ha09b )   due to node availability
> May 17 22:16:53 ha09a pacemaker-schedulerd[2656]: notice: Calculated transition 26, saving inputs in /var/lib/pacemaker/pengine/pe-input-461.bz2
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating cancel operation p_drbd0_monitor_60000 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating stop operation p_fs_clust08_stop_0 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating notify operation p_drbd0_pre_notify_demote_0 locally on ha09a
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating notify operation p_drbd0_pre_notify_demote_0 on ha09b
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Result of notify operation for p_drbd0 on ha09a: ok
> May 17 22:16:53 ha09a pacemaker-controld[2657]: notice: Initiating stop operation p_vdo0_stop_0 on ha09b
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating demote operation p_drbd0_demote_0 locally on ha09a
> May 17 22:16:54 ha09a kernel: drbd ha01_mysql: role( Primary -> Secondary )
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Result of demote operation for p_drbd0 on ha09a: ok
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify operation p_drbd0_post_notify_demote_0 locally on ha09a
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify operation p_drbd0_post_notify_demote_0 on ha09b
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Result of notify operation for p_drbd0 on ha09a: ok
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify operation p_drbd0_pre_notify_promote_0 locally on ha09a
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify operation p_drbd0_pre_notify_promote_0 on ha09b
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Result of notify operation for p_drbd0 on ha09a: ok
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating promote operation p_drbd0_promote_0 on ha09b
> May 17 22:16:54 ha09a kernel: drbd ha01_mysql ha09b: Preparing remote state change 610633182
> May 17 22:16:54 ha09a kernel: drbd ha01_mysql ha09b: Committing remote state change 610633182 (primary_nodes=1)
> May 17 22:16:54 ha09a kernel: drbd ha01_mysql ha09b: peer( Secondary -> Primary )
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify operation p_drbd0_post_notify_promote_0 locally on ha09a
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating notify operation p_drbd0_post_notify_promote_0 on ha09b
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Result of notify operation for p_drbd0 on ha09a: ok
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating start operation p_vdo0_start_0 on ha09b
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Initiating monitor operation p_drbd0_monitor_60000 locally on ha09a
> May 17 22:16:54 ha09a pacemaker-controld[2657]: notice: Result of monitor operation for p_drbd0 on ha09a: ok
> May 17 22:16:56 ha09a pacemaker-controld[2657]: notice: Initiating monitor operation p_vdo0_monitor_15000 on ha09b
> May 17 22:16:57 ha09a pacemaker-controld[2657]: notice: Transition 26 (Complete=28, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-461.bz2): Complete
> May 17 22:16:57 ha09a pacemaker-controld[2657]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>
> Here are the logs from ha09b...
>
> May 17 22:16:53 ha09b UDS/vdodumpconfig[3494]: ERROR  (vdodumpconfig/3494) openFile(): failed opening /dev/drbd0 with file access: 4: Wrong medium type (124)
> May 17 22:16:53 ha09b vdo[3486]: ERROR - vdodumpconfig: Failed to make FileLayer from '/dev/drbd0' with Wrong medium type
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: Result of start operation for p_vdo0 on ha09b: ok
> May 17 22:16:53 ha09b Filesystem(p_fs_clust08)[3496]: INFO: Running start for /dev/mapper/vdo0 on /ha01_mysql
> May 17 22:16:53 ha09b UDS/vdodumpconfig[3577]: ERROR  (vdodumpconfig/3577) openFile(): failed opening /dev/drbd0 with file access: 4: Wrong medium type (124)
> May 17 22:16:53 ha09b vdo[3503]: ERROR - vdodumpconfig: Failed to make FileLayer from '/dev/drbd0' with Wrong medium type
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: Result of monitor operation for p_vdo0 on ha09b: not running
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: ha09b-p_vdo0_monitor_15000:35 [ error occurred checking vdo0 status on ha09b\n ]
> May 17 22:16:53 ha09b pacemaker-attrd[2708]: notice: Setting fail-count-p_vdo0#monitor_15000[ha09b]: (unset) -> 1
> May 17 22:16:53 ha09b pacemaker-attrd[2708]: notice: Setting last-failure-p_vdo0#monitor_15000[ha09b]: (unset) -> 1621315013
> May 17 22:16:53 ha09b Filesystem(p_fs_clust08)[3496]: ERROR: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist
> May 17 22:16:53 ha09b pacemaker-execd[2707]: notice: p_fs_clust08_start_0[3496] error output [ ocf-exit-reason:Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist ]
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: Result of start operation for p_fs_clust08 on ha09b: not installed
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: ha09b-p_fs_clust08_start_0:36 [ ocf-exit-reason:Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist\n ]
> May 17 22:16:53 ha09b pacemaker-attrd[2708]: notice: Setting fail-count-p_fs_clust08#start_0[ha09b]: (unset) -> INFINITY
> May 17 22:16:53 ha09b pacemaker-attrd[2708]: notice: Setting last-failure-p_fs_clust08#start_0[ha09b]: (unset) -> 1621315013
> May 17 22:16:53 ha09b Filesystem(p_fs_clust08)[3609]: WARNING: Couldn't find device [/dev/mapper/vdo0]. Expected /dev/??? to exist
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: Result of notify operation for p_drbd0 on ha09b: ok
> May 17 22:16:53 ha09b Filesystem(p_fs_clust08)[3609]: INFO: Running stop for /dev/mapper/vdo0 on /ha01_mysql
> May 17 22:16:53 ha09b pacemaker-execd[2707]: notice: p_fs_clust08_stop_0[3609] error output [ blockdev: cannot open /dev/mapper/vdo0: No such file or directory ]
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: Result of stop operation for p_fs_clust08 on ha09b: ok
> May 17 22:16:53 ha09b pacemaker-controld[2710]: notice: ha09b-p_vdo0_monitor_15000:35 [ error occurred checking vdo0 status on ha09b\n ]
> May 17 22:16:54 ha09b UDS/vdodumpconfig[3705]: ERROR  (vdodumpconfig/3705) openFile(): failed opening /dev/drbd0 with file access: 4: Wrong medium type (124)
> May 17 22:16:54 ha09b vdo[3697]: ERROR - vdodumpconfig: Failed to make FileLayer from '/dev/drbd0' with Wrong medium type
> May 17 22:16:54 ha09b pacemaker-controld[2710]: notice: Result of stop operation for p_vdo0 on ha09b: ok
> May 17 22:16:54 ha09b kernel: drbd ha01_mysql ha09a: peer( Primary -> Secondary )
> May 17 22:16:54 ha09b pacemaker-controld[2710]: notice: Result of notify operation for p_drbd0 on ha09b: ok
> May 17 22:16:54 ha09b pacemaker-controld[2710]: notice: Result of notify operation for p_drbd0 on ha09b: ok
> May 17 22:16:54 ha09b kernel: drbd ha01_mysql: Preparing cluster-wide state change 610633182 (0->-1 3/1)
> May 17 22:16:54 ha09b kernel: drbd ha01_mysql: State change 610633182: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFFC
> May 17 22:16:54 ha09b kernel: drbd ha01_mysql: Committing cluster-wide state change 610633182 (1ms)
> May 17 22:16:54 ha09b kernel: drbd ha01_mysql: role( Secondary -> Primary )
> May 17 22:16:54 ha09b pacemaker-controld[2710]: notice: Result of promote operation for p_drbd0 on ha09b: ok
> May 17 22:16:54 ha09b pacemaker-controld[2710]: notice: Result of notify operation for p_drbd0 on ha09b: ok
> May 17 22:16:55 ha09b kernel: uds: modprobe: loaded version 8.0.1.6
> May 17 22:16:55 ha09b kernel: kvdo: modprobe: loaded version 6.2.3.114
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: underlying device, REQ_FLUSH: supported, REQ_FUA: supported
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: Using write policy async automatically.
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: loading device 'vdo0'
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: zones: 1 logical, 1 physical, 1 hash; base threads: 5
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: starting device 'vdo0'
> May 17 22:16:55 ha09b kernel: kvdo0:journalQ: VDO commencing normal operation
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: Setting UDS index target state to online
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: device 'vdo0' started
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: resuming device 'vdo0'
> May 17 22:16:55 ha09b kernel: kvdo0:dmsetup: device 'vdo0' resumed
> May 17 22:16:55 ha09b kernel: uds: kvdo0:dedupeQ: loading or rebuilding index: dev=/dev/drbd0 offset=4096 size=2781704192
> May 17 22:16:55 ha09b kernel: uds: kvdo0:dedupeQ: Using 6 indexing zones for concurrency.
> May 17 22:16:55 ha09b kernel: kvdo0:packerQ: compression is enabled
> May 17 22:16:55 ha09b systemd[1]: Started Device-mapper event daemon.
> May 17 22:16:55 ha09b dmeventd[3931]: dmeventd ready for processing.
> May 17 22:16:55 ha09b UDS/vdodmeventd[3930]: INFO   (vdodmeventd/3930) VDO device vdo0 is now registered with dmeventd for monitoring
> May 17 22:16:55 ha09b lvm[3931]: Monitoring VDO pool vdo0.
> May 17 22:16:56 ha09b kernel: uds: kvdo0:dedupeQ: loaded index from chapter 0 through chapter 85
> May 17 22:16:56 ha09b pacemaker-controld[2710]: notice: Result of start operation for p_vdo0 on ha09b: ok
> May 17 22:16:57 ha09b pacemaker-controld[2710]: notice: Result of monitor operation for p_vdo0 on ha09b: ok
>
>
>
> > -----Original Message-----
> > From: Users <users-bounces at clusterlabs.org> On Behalf Of Eric Robinson
> > Sent: Monday, May 17, 2021 9:49 PM
> > To: Cluster Labs - All topics related to open-source clustering welcomed
> > <users at clusterlabs.org>
> > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> >
> > Notice in that 'pcs status' shows errors for resource p_vdo0 on node ha09b,
> > even after doing 'pcs resource cleanup p_vdo0'.
> >
> > [root at ha09a ~]# pcs status
> > Cluster name: ha09ab
> > Cluster Summary:
> >   * Stack: corosync
> >   * Current DC: ha09a (version 2.0.4-6.el8_3.2-2deceaa3ae) - partition with
> > quorum
> >   * Last updated: Mon May 17 19:45:41 2021
> >   * Last change:  Mon May 17 19:45:37 2021 by hacluster via crmd on ha09b
> >   * 2 nodes configured
> >   * 6 resource instances configured
> >
> > Node List:
> >   * Online: [ ha09a ha09b ]
> >
> > Full List of Resources:
> >   * Clone Set: p_drbd0-clone [p_drbd0] (promotable):
> >     * Masters: [ ha09a ]
> >     * Slaves: [ ha09b ]
> >   * Clone Set: p_drbd1-clone [p_drbd1] (promotable):
> >     * Masters: [ ha09b ]
> >     * Slaves: [ ha09a ]
> >   * p_vdo0      (lsb:vdo0):      Starting ha09a
> >   * p_vdo1      (lsb:vdo1):      Started ha09b
> >
> > Failed Resource Actions:
> >   * p_vdo0_monitor_0 on ha09b 'error' (1): call=83, status='complete',
> > exitreason='', last-rc-change='2021-05-17 19:45:38 -07:00', queued=0ms,
> > exec=175ms
> >
> > Daemon Status:
> >   corosync: active/disabled
> >   pacemaker: active/disabled
> >   pcsd: active/enabled
> >
> >
> > If I debug the monitor action on ha09b, it reports 'not installed,' which makes
> > sense because the drbd disk is in standby.
> >
> > [root at ha09b drbd.d]# pcs resource debug-monitor p_vdo0
> > Operation monitor for p_vdo0 (lsb::vdo0) returned: 'not installed' (5)
> >  >  stdout: error occurred checking vdo0 status on ha09b
> >
> > Should it report something else?
> >
> > > -----Original Message-----
> > > From: Users <users-bounces at clusterlabs.org> On Behalf Of Eric Robinson
> > > Sent: Monday, May 17, 2021 1:37 PM
> > > To: Cluster Labs - All topics related to open-source clustering
> > > welcomed <users at clusterlabs.org>
> > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > >
> > > Andrei --
> > >
> > > To follow up, here is the Pacemaker config. Let's not talk about
> > > fencing or quorum right now. I want to focus on the vdo issue at hand.
> > >
> > > [root at ha09a ~]# pcs config
> > > Cluster Name: ha09ab
> > > Corosync Nodes:
> > >  ha09a ha09b
> > > Pacemaker Nodes:
> > >  ha09a ha09b
> > >
> > > Resources:
> > >  Clone: p_drbd0-clone
> > >   Meta Attrs: clone-max=2 clone-node-max=1 notify=true promotable=true
> > > promoted-max=1 promoted-node-max=1
> > >   Resource: p_drbd0 (class=ocf provider=linbit type=drbd)
> > >    Attributes: drbd_resource=ha01_mysql
> > >    Operations: demote interval=0s timeout=90 (p_drbd0-demote-interval-0s)
> > >                monitor interval=60s (p_drbd0-monitor-interval-60s)
> > >                notify interval=0s timeout=90 (p_drbd0-notify-interval-0s)
> > >                promote interval=0s timeout=90 (p_drbd0-promote-interval-0s)
> > >                reload interval=0s timeout=30 (p_drbd0-reload-interval-0s)
> > >                start interval=0s timeout=240 (p_drbd0-start-interval-0s)
> > >                stop interval=0s timeout=100 (p_drbd0-stop-interval-0s)
> > >  Clone: p_drbd1-clone
> > >   Meta Attrs: clone-max=2 clone-node-max=1 notify=true promotable=true
> > > promoted-max=1 promoted-node-max=1
> > >   Resource: p_drbd1 (class=ocf provider=linbit type=drbd)
> > >    Attributes: drbd_resource=ha02_mysql
> > >    Operations: demote interval=0s timeout=90 (p_drbd1-demote-interval-0s)
> > >                monitor interval=60s (p_drbd1-monitor-interval-60s)
> > >                notify interval=0s timeout=90 (p_drbd1-notify-interval-0s)
> > >                promote interval=0s timeout=90 (p_drbd1-promote-interval-0s)
> > >                reload interval=0s timeout=30 (p_drbd1-reload-interval-0s)
> > >                start interval=0s timeout=240 (p_drbd1-start-interval-0s)
> > >                stop interval=0s timeout=100 (p_drbd1-stop-interval-0s)
> > >  Resource: p_vdo0 (class=lsb type=vdo0)
> > >   Operations: force-reload interval=0s timeout=15 (p_vdo0-force-reload-interval-0s)
> > >               monitor interval=15 timeout=15 (p_vdo0-monitor-interval-15)
> > >               restart interval=0s timeout=15 (p_vdo0-restart-interval-0s)
> > >               start interval=0s timeout=15 (p_vdo0-start-interval-0s)
> > >               stop interval=0s timeout=15 (p_vdo0-stop-interval-0s)
> > >  Resource: p_vdo1 (class=lsb type=vdo1)
> > >   Operations: force-reload interval=0s timeout=15 (p_vdo1-force-reload-interval-0s)
> > >               monitor interval=15 timeout=15 (p_vdo1-monitor-interval-15)
> > >               restart interval=0s timeout=15 (p_vdo1-restart-interval-0s)
> > >               start interval=0s timeout=15 (p_vdo1-start-interval-0s)
> > >               stop interval=0s timeout=15 (p_vdo1-stop-interval-0s)
> > >
> > > Stonith Devices:
> > > Fencing Levels:
> > >
> > > Location Constraints:
> > > Ordering Constraints:
> > >   promote p_drbd0-clone then start p_vdo0 (kind:Mandatory) (id:order-p_drbd0-clone-p_vdo0-mandatory)
> > >   promote p_drbd1-clone then start p_vdo1 (kind:Mandatory) (id:order-p_drbd1-clone-p_vdo1-mandatory)
> > > Colocation Constraints:
> > >   p_vdo0 with p_drbd0-clone (score:INFINITY) (id:colocation-p_vdo0-p_drbd0-clone-INFINITY)
> > >   p_vdo1 with p_drbd1-clone (score:INFINITY) (id:colocation-p_vdo1-p_drbd1-clone-INFINITY)
> > > Ticket Constraints:
> > >
> > > Alerts:
> > >  No alerts defined
> > >
> > > Resources Defaults:
> > >   Meta Attrs: rsc_defaults-meta_attributes
> > >     resource-stickiness=100
> > > Operations Defaults:
> > >   Meta Attrs: op_defaults-meta_attributes
> > >     timeout=30s
> > >
> > > Cluster Properties:
> > >  cluster-infrastructure: corosync
> > >  cluster-name: ha09ab
> > >  dc-version: 2.0.4-6.el8_3.2-2deceaa3ae
> > >  have-watchdog: false
> > >  last-lrm-refresh: 1621198059
> > >  maintenance-mode: false
> > >  no-quorum-policy: ignore
> > >  stonith-enabled: false
> > >
> > > Tags:
> > >  No tags defined
> > >
> > > Quorum:
> > >   Options:
> > >
> > > Here is the cluster status. Right now, node ha09a is primary for both
> > > drbd disks.
> > >
> > > [root at ha09a ~]# pcs status
> > > Cluster name: ha09ab
> > > Cluster Summary:
> > >   * Stack: corosync
> > >   * Current DC: ha09a (version 2.0.4-6.el8_3.2-2deceaa3ae) - partition
> > > with quorum
> > >   * Last updated: Mon May 17 11:35:34 2021
> > >   * Last change:  Mon May 17 11:34:24 2021 by hacluster via crmd on ha09a
> > >   * 2 nodes configured
> > >   * 6 resource instances configured (2 BLOCKED from further action due to failure)
> > >
> > > Node List:
> > >   * Online: [ ha09a ha09b ]
> > >
> > > Full List of Resources:
> > >   * Clone Set: p_drbd0-clone [p_drbd0] (promotable):
> > >     * Masters: [ ha09a ]
> > >     * Slaves: [ ha09b ]
> > >   * Clone Set: p_drbd1-clone [p_drbd1] (promotable):
> > >     * Masters: [ ha09a ]
> > >     * Slaves: [ ha09b ]
> > >   * p_vdo0      (lsb:vdo0):      FAILED ha09a (blocked)
> > >   * p_vdo1      (lsb:vdo1):      FAILED ha09a (blocked)
> > >
> > > Failed Resource Actions:
> > >   * p_vdo1_stop_0 on ha09a 'error' (1): call=21, status='Timed Out',
> > > exitreason='', last-rc-change='2021-05-17 11:29:09 -07:00',
> > > queued=0ms, exec=15001ms
> > >   * p_vdo0_stop_0 on ha09a 'error' (1): call=27, status='Timed Out',
> > > exitreason='', last-rc-change='2021-05-17 11:34:26 -07:00',
> > > queued=0ms, exec=15001ms
> > >   * p_vdo1_monitor_0 on ha09b 'error' (1): call=21, status='complete',
> > > exitreason='', last-rc-change='2021-05-17 11:29:08 -07:00',
> > > queued=0ms, exec=217ms
> > >   * p_vdo0_monitor_0 on ha09b 'error' (1): call=28, status='complete',
> > > exitreason='', last-rc-change='2021-05-17 11:34:25 -07:00',
> > > queued=0ms, exec=182ms
> > >
> > > Daemon Status:
> > >   corosync: active/disabled
> > >   pacemaker: active/disabled
> > >   pcsd: active/enabled
> > >
> > > The vdo devices are available...
> > >
> > > [root at ha09a ~]# vdo list
> > > vdo0
> > > vdo1
> > >
> > >
> > > > -----Original Message-----
> > > > From: Users <users-bounces at clusterlabs.org> On Behalf Of Eric
> > > > Robinson
> > > > Sent: Monday, May 17, 2021 1:28 PM
> > > > To: Cluster Labs - All topics related to open-source clustering
> > > > welcomed <users at clusterlabs.org>
> > > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > > >
> > > > Andrei --
> > > >
> > > > Sorry for the novels. Sometimes it is hard to tell whether people
> > > > want all the configs, logs, and scripts first, or if they want a
> > > > description of the problem and what one is trying to accomplish first.
> > > > I'll send whatever you want. I am very eager to get to the bottom of this.
> > > >
> > > > I'll start with my custom LSB RA. I can send the Pacemaker config a bit later.
> > > >
> > > > [root at ha09a init.d]# ll|grep vdo
> > > > lrwxrwxrwx. 1 root root     9 May 16 10:28 vdo0 -> vdo_multi
> > > > lrwxrwxrwx. 1 root root     9 May 16 10:28 vdo1 -> vdo_multi
> > > > -rwx------. 1 root root  3623 May 16 13:21 vdo_multi
> > > >
> > > > [root at ha09a init.d]#  cat vdo_multi
> > > > #!/bin/bash
> > > >
> > > > #--custom script for managing vdo volumes
> > > >
> > > > #--functions
> > > > function isActivated() {
> > > >         R=$(/usr/bin/vdo status -n $VOL 2>&1)
> > > >         if [ $? -ne 0 ]; then
> > > >                 #--error occurred checking vdo status
> > > >                 echo "$VOL: an error occurred checking activation status on $MY_HOSTNAME"
> > > >                 return 1
> > > >         fi
> > > >         R=$(/usr/bin/vdo status -n $VOL|grep Activate|awk '{$1=$1};1'|cut -d" " -f2)
> > > >         echo "$R"
> > > >         return 0
> > > > }
> > > >
> > > > function isOnline() {
> > > >         R=$(/usr/bin/vdo status -n $VOL 2>&1)
> > > >         if [ $? -ne 0 ]; then
> > > >                 #--error occurred checking vdo status
> > > >                 echo "$VOL: an error occurred checking activation status on $MY_HOSTNAME"
> > > >                 return 1
> > > >         fi
> > > >         R=$(/usr/bin/vdo status -n $VOL|grep "Index status"|awk '{$1=$1};1'|cut -d" " -f3)
> > > >         echo "$R"
> > > >         return 0
> > > > }
> > > >
> > > > #--vars
> > > > MY_HOSTNAME=$(hostname -s)
> > > >
> > > > #--get the volume name
> > > > VOL=$(basename $0)
> > > >
> > > > #--get the action
> > > > ACTION=$1
> > > >
> > > > #--take the requested action
> > > > case $ACTION in
> > > >
> > > >         start)
> > > >
> > > >                 #--check current status
> > > >                 R=$(isOnline "$VOL")
> > > >                 if [ $? -ne 0 ]; then
> > > >                         echo "error occurred checking $VOL status on $MY_HOSTNAME"
> > > >                         exit 0
> > > >                 fi
> > > >                 if [ "$R"  == "online" ]; then
> > > >                         echo "running on $MY_HOSTNAME"
> > > >                         exit 0 #--lsb: success
> > > >                 fi
> > > >
> > > >                 #--enter activation loop
> > > >                 ACTIVATED=no
> > > >                 TIMER=15
> > > >                 while [ $TIMER -ge 0 ]; do
> > > >                         R=$(isActivated "$VOL")
> > > >                         if [ "$R" == "enabled" ]; then
> > > >                                 ACTIVATED=yes
> > > >                                 break
> > > >                         fi
> > > >                         sleep 1
> > > >                         TIMER=$(( TIMER-1 ))
> > > >                 done
> > > >                 if [ "$ACTIVATED" == "no" ]; then
> > > >                         echo "$VOL: not activated on $MY_HOSTNAME"
> > > >                         exit 5 #--lsb: not running
> > > >                 fi
> > > >
> > > >                 #--enter start loop
> > > >                 /usr/bin/vdo start -n $VOL
> > > >                 ONLINE=no
> > > >                 TIMER=15
> > > >                 while [ $TIMER -ge 0 ]; do
> > > >                         R=$(isOnline "$VOL")
> > > >                         if [ "$R" == "online" ]; then
> > > >                                 ONLINE=yes
> > > >                                 break
> > > >                         fi
> > > >                         sleep 1
> > > >                         TIMER=$(( TIMER-1 ))
> > > >                 done
> > > >                 if [ "$ONLINE" == "yes" ]; then
> > > >                         echo "$VOL: started on $MY_HOSTNAME"
> > > >                         exit 0 #--lsb: success
> > > >                 else
> > > >                         echo "$VOL: not started on $MY_HOSTNAME (unknown problem)"
> > > >                         exit 0 #--lsb: unknown problem
> > > >                 fi
> > > >                 ;;
> > > >         stop)
> > > >
> > > >                 #--check current status
> > > >                 R=$(isOnline "$VOL")
> > > >                 if [ $? -ne 0 ]; then
> > > >                         echo "error occurred checking $VOL status on $MY_HOSTNAME"
> > > >                         exit 0
> > > >                 fi
> > > >
> > > >                 if [ "$R" == "not" ]; then
> > > >                         echo "not started on $MY_HOSTNAME"
> > > >                         exit 0 #--lsb: success
> > > >                 fi
> > > >
> > > >                 #--enter stop loop
> > > >                 /usr/bin/vdo stop -n $VOL
> > > >                 ONLINE=yes
> > > >                 TIMER=15
> > > >                 while [ $TIMER -ge 0 ]; do
> > > >                         R=$(isOnline "$VOL")
> > > >                         if [ "$R" == "not" ]; then
> > > >                                 ONLINE=no
> > > >                                 break
> > > >                         fi
> > > >                         sleep 1
> > > >                         TIMER=$(( TIMER-1 ))
> > > >                 done
> > > >                 if [ "$ONLINE" == "no" ]; then
> > > >                         echo "$VOL: stopped on $MY_HOSTNAME"
> > > >                         exit 0 #--lsb:success
> > > >                 else
> > > >                         echo "$VOL: failed to stop on $MY_HOSTNAME (unknown problem)"
> > > >                         exit 0
> > > >                 fi
> > > >                 ;;
> > > >         status)
> > > >                 R=$(isOnline "$VOL")
> > > >                 if [ $? -ne 0 ]; then
> > > >                         echo "error occurred checking $VOL status on $MY_HOSTNAME"
> > > >                         exit 5
> > > >                 fi
> > > >                 if [ "$R"  == "online" ]; then
> > > >                         echo "$VOL started on $MY_HOSTNAME"
> > > >                         exit 0 #--lsb: success
> > > >                 else
> > > >                         echo "$VOL not started on $MY_HOSTNAME"
> > > >                         exit 3 #--lsb: not running
> > > >                 fi
> > > >                 ;;
> > > >
> > > > esac
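Several failure branches in the script above exit 0 (for example the "unknown problem" branches of start and stop), and Pacemaker judges success purely by the exit code, so those paths report success even when the volume did not actually start or stop. As an editorial sketch of the generic LSB init-script contract Pacemaker relies on (the branch names here are illustrative, not taken from the posted script):

```shell
#!/bin/bash
# Sketch of the LSB exit codes Pacemaker acts on (assumed generic LSB
# semantics; "lsb_rc" and its branch names are illustrative only).
lsb_rc() {
    case "$1" in
        start_ok|stop_ok|status_running) echo 0 ;;  # success / running
        start_failed|stop_failed)        echo 1 ;;  # generic error: triggers recovery
        status_stopped)                  echo 3 ;;  # cleanly not running (NOT an error)
        *)                               echo 4 ;;  # status unknown
    esac
}
# A failed stop must not exit 0, or Pacemaker assumes the stop worked:
lsb_rc stop_failed     # prints 1
lsb_rc status_stopped  # prints 3
```

In the posted script, returning a nonzero code from the failure branches instead of 0 would let Pacemaker detect and recover those failures rather than silently accepting them.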
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Users <users-bounces at clusterlabs.org> On Behalf Of Andrei
> > > > > Borzenkov
> > > > > Sent: Monday, May 17, 2021 12:49 PM
> > > > > To: users at clusterlabs.org
> > > > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > > > >
> > > > > On 17.05.2021 18:18, Eric Robinson wrote:
> > > > > > To Strahil and Klaus –
> > > > > >
> > > > > > I created the vdo devices using default parameters, so ‘auto’
> > > > > > mode was
> > > > > selected by default. vdostatus shows that the current mode is async.
> > > > > The underlying drbd devices are running protocol C, so I assume
> > > > > that vdo should be changed to sync mode?
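For reference, the VDO manager can report and change a volume's write policy at runtime. A command sketch, using the volume name from this thread; verify the flags against the vdo version actually installed:

```shell
# Show the current write policy of the volume (run on the node holding it)
vdo status --name=vdo0 | grep -i "write policy"

# Switch to synchronous writes, matching DRBD protocol C's synchronous
# replication (assumed appropriate here; confirm for your workload)
vdo changeWritePolicy --name=vdo0 --writePolicy=sync
```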
> > > > > >
> > > > > > The VDO service is disabled and is solely under the control of
> > > > > > Pacemaker,
> > > > > but I have been unable to get a resource agent to work reliably. I
> > > > > have two nodes. Under normal operation, Node A is primary for disk
> > > > > drbd0, and device
> > > > > vdo0 rides on top of that. Node B is primary for disk drbd1 and
> > > > > device
> > > > > vdo1 rides on top of that. In the event of a node failure, the vdo
> > > > > device and the underlying drbd disk should migrate to the other
> > > > > node, and then that node will be primary for both drbd disks and
> > > > > both vdo
> > > > devices.
> > > > > >
> > > > > > The default systemd vdo service does not work because it uses
> > > > > > the --all flag
> > > > > and starts/stops all vdo devices. I noticed that there is also a
> > > > > vdo-start-by-dev.service, but there is no documentation on how to
> > > > > use it. I wrote my own vdo-by-dev system service, but that did not
> > > > > work reliably either. Then I noticed that there is already an OCF
> > > > > resource agent named vdo-vol, but that did not work either. I
> > > > > finally tried writing my own OCF-compliant RA, and then I tried
> > > > > writing an LSB-compliant script, but none of those worked very well.
> > > > > >
> > > > >
> > > > > You continue to write novels instead of simply showing your
> > > > > resource agent, your configuration and logs.
> > > > >
> > > > > > My big problem is that I don’t understand how Pacemaker uses the
> > > > > monitor action. Pacemaker would often fail vdo resources because
> > > > > the monitor action received an error when it ran on the standby node.
> > > > > For example, when Node A is primary for disk drbd1 and device
> > > > > vdo1, Pacemaker would fail device vdo1 because when it ran the
> > > > > monitor action on Node B, the RA reported an error. But OF COURSE
> > > > > it would report an error, because disk drbd1 is secondary on that
> > > > > node, and is therefore inaccessible to the vdo driver. I DON’T UNDERSTAND.
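The behavior described here is the probe: at startup (and after cleanup) Pacemaker runs a one-time monitor on every node to discover where each resource is running, including nodes where it is expected to be stopped. On those nodes the agent must answer "not running" (LSB status exit 3); most other nonzero codes are recorded as failures. A sketch of the mapping an agent needs (an illustrative helper, not code from the posted script):

```shell
#!/bin/bash
# Translate a raw volume state into the exit code a Pacemaker probe expects.
# "absent" stands for the standby node, where the backing DRBD device is
# Secondary and the vdo driver cannot see the volume at all.
probe_rc() {
    case "$1" in
        online)     return 0 ;;  # running here
        not|absent) return 3 ;;  # cleanly not running: the EXPECTED probe answer
        *)          return 4 ;;  # genuine error (status could not be determined)
    esac
}
probe_rc absent; echo "standby probe exits $?"   # prints 3, not an error
```

With this mapping, a probe on the standby node returns 3 and Pacemaker records "not running" instead of a failed monitor action.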
> > > > > >
> > > > >
> > > > > May be your definition of "error" does not match pacemaker
> > > > > definition of "error". It is hard to comment without seeing code.
> > > > >
> > > > > > -Eric
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Strahil Nikolov <hunter86_bg at yahoo.com>
> > > > > > Sent: Monday, May 17, 2021 5:09 AM
> > > > > > To: kwenning at redhat.com; Klaus Wenninger <kwenning at redhat.com>;
> > > > > > Cluster Labs - All topics related to open-source clustering
> > > > > > welcomed <users at clusterlabs.org>; Eric Robinson
> > > > > > <eric.robinson at psmnv.com>
> > > > > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > > > > >
> > > > > > Have you tried to set VDO in async mode ?
> > > > > >
> > > > > > Best Regards,
> > > > > > Strahil Nikolov
> > > > > > On Mon, May 17, 2021 at 8:57, Klaus Wenninger
> > > > > > <kwenning at redhat.com> wrote:
> > > > > > Did you try VDO in sync-mode for the case the flush-fua stuff
> > > > > > isn't working through the layers?
> > > > > > Did you check that VDO-service is disabled and solely under
> > > > > > pacemaker-control and that the dependencies are set correctly?
> > > > > >
> > > > > > Klaus
> > > > > >
> > > > > > On 5/17/21 6:17 AM, Eric Robinson wrote:
> > > > > >
> > > > > > Yes, DRBD is working fine.
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Strahil Nikolov <hunter86_bg at yahoo.com>
> > > > > > Sent: Sunday, May 16, 2021 6:06 PM
> > > > > > To: Eric Robinson <eric.robinson at psmnv.com>; Cluster
> > > > > > Labs - All topics related to open-source clustering welcomed
> > > > > > <users at clusterlabs.org>
> > > > > > Subject: RE: [ClusterLabs] DRBD + VDO HowTo?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Are you sure that the DRBD is working properly ?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Strahil Nikolov
> > > > > >
> > > > > > On Mon, May 17, 2021 at 0:32, Eric Robinson
> > > > > >
> > > > > > <eric.robinson at psmnv.com> wrote:
> > > > > >
> > > > > > Okay, it turns out I was wrong. I thought I had it working, but
> > > > > > I keep running
> > > > > into problems. Sometimes when I demote a DRBD resource on Node A and
> > > > > promote it on Node B, and I try to mount the filesystem, the
> > > > > system complains that it cannot read the superblock. But when I
> > > > > move the DRBD primary back to Node A, the file system is mountable again.
> > > > > Also, I have problems with filesystems not mounting because the
> > > > > vdo devices are not present. All kinds of issues.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Users <users-bounces at clusterlabs.org>
> > > > > > On Behalf Of Eric Robinson
> > > > > > Sent: Friday, May 14, 2021 3:55 PM
> > > > > > To: Strahil Nikolov <hunter86_bg at yahoo.com>; Cluster Labs -
> > > > > > All topics related to open-source clustering welcomed
> > > > > > <users at clusterlabs.org>
> > > > > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Okay, I have it working now. The default systemd service
> > > > > > definitions did
> > > > > not work, so I created my own.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Strahil Nikolov <hunter86_bg at yahoo.com>
> > > > > > Sent: Friday, May 14, 2021 3:41 AM
> > > > > > To: Eric Robinson <eric.robinson at psmnv.com>; Cluster
> > > > > > Labs - All topics related to open-source clustering welcomed
> > > > > > <users at clusterlabs.org>
> > > > > > Subject: RE: [ClusterLabs] DRBD + VDO HowTo?
> > > > > >
> > > > > >
> > > > > >
> > > > > > There is no VDO RA according to my knowledge, but you can use
> > > > > > systemd
> > > > > service as a resource.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Yet, the VDO service that comes with the OS is a generic one and
> > > > > > controls
> > > > > all VDOs - so you need to create your own vdo service.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Strahil Nikolov
> > > > > >
> > > > > > On Fri, May 14, 2021 at 6:55, Eric Robinson
> > > > > >
> > > > > > <eric.robinson at psmnv.com> wrote:
> > > > > >
> > > > > > I created the VDO volumes fine on the drbd devices, formatted
> > > > > > them as xfs
> > > > > filesystems, created cluster filesystem resources, and the cluster
> > > > > is using them. But the cluster won’t fail over. Is there a VDO
> > > > > cluster RA out there somewhere already?
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > From: Strahil Nikolov <hunter86_bg at yahoo.com>
> > > > > > Sent: Thursday, May 13, 2021 10:07 PM
> > > > > > To: Cluster Labs - All topics related to open-source clustering
> > > > > > welcomed <users at clusterlabs.org>; Eric Robinson
> > > > > > <eric.robinson at psmnv.com>
> > > > > > Subject: Re: [ClusterLabs] DRBD + VDO HowTo?
> > > > > >
> > > > > >
> > > > > >
> > > > > > For DRBD there is enough info, so let's focus on VDO.
> > > > > >
> > > > > > There is a systemd service that starts all VDOs on the system.
> > > > > > You can
> > > > > create the VDO once drbd is open for writes and then you can
> > > > > create your own systemd '.service' file which can be used as a cluster resource.
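A per-volume unit along the lines Strahil describes might look like the sketch below. The unit name, paths, and dependency are assumptions for illustration; the key point is that it stays disabled so only Pacemaker starts and stops it:

```ini
# /etc/systemd/system/vdo-vdo0.service  (hypothetical example)
[Unit]
Description=VDO volume vdo0
# Start only after the backing device stack can exist; adjust to your setup
After=drbd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/vdo start --name=vdo0
ExecStop=/usr/bin/vdo stop --name=vdo0

[Install]
# Deliberately no WantedBy: the unit stays disabled and is driven by
# Pacemaker (e.g. a systemd:vdo-vdo0 resource), not by systemd at boot.
```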
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Strahil Nikolov
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, May 14, 2021 at 2:33, Eric Robinson
> > > > > >
> > > > > > <eric.robinson at psmnv.com> wrote:
> > > > > >
> > > > > > Can anyone point to a document on how to use VDO de-duplication
> > > > > > with
> > > > > DRBD? Linbit has a blog page about it, but it was last updated 6
> > > > > years ago and the embedded links are dead.
> > > > > >
> > > > > >
> > > > > >
> > > > > > https://linbit.com/blog/albireo-virtual-data-optimizer-vdo-on-drbd/
> > > > > >
> > > > > >
> > > > > >
> > > > > > -Eric
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Disclaimer : This email and any files transmitted with it are
> > > > > > confidential and
> > > > > intended solely for intended recipients. If you are not the named
> > > > > addressee you should not disseminate, distribute, copy or alter
> > > > > this email. Any views or opinions presented in this email are
> > > > > solely those of the author and might not represent those of
> > > > > Physician Select Management. Warning: Although Physician Select
> > > > > Management
> > > has
> > > > > taken reasonable precautions to ensure no viruses are present in
> > > > > this email, the company cannot accept responsibility for any loss
> > > > > or damage arising
> > > > from the use of this email or attachments.
> > > > > >
> > > > > > _______________________________________________
> > > > > > Manage your subscription:
> > > > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > > > >
> > > > > > ClusterLabs home: https://www.clusterlabs.org/
> > > > > >
> > > > > >


More information about the Users mailing list