[ClusterLabs] volume group won't start in a nested DRBD setup

Jean-Francois Malouin Jean-Francois.Malouin at bic.mni.mcgill.ca
Mon Oct 28 15:44:50 EDT 2019


Hi,

Is there any new magic that I'm unaware of that needs to be added to a
pacemaker cluster using a nested DRBD setup? This is pacemaker 2.0.x and DRBD
8.4.10 on Debian/Buster, on a 2-node cluster with stonith.
Eventually this will host a bunch of Xen VMs.

I had this sort of thing running for years with pacemaker 1.x and DRBD 8.4.x
without a hitch, and now with pacemaker 2.0 and DRBD 8.4.10 I get errors
when trying to start the volume group vg0 on this chain:

 (VG)           (LV)         (PV)       (VG) 
vmspace ----> xen_lv0 ----> drbd0 ----> vg0 

Only drbd0 and everything after it in the chain are managed by pacemaker.

Here's what I have configured so far (stonith is configured but is not shown below):

---
primitive p_lvm_vg0 ocf:heartbeat:LVM \
    params volgrpname=vg0 \
    op monitor timeout=30s interval=10s \
    op_params interval=10s

primitive resDRBDr0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op start interval=0 timeout=240s \
    op stop interval=0 timeout=100s \
    op monitor interval=29s role=Master timeout=240s \
    op monitor interval=31s role=Slave timeout=240s \
    meta migration-threshold=3 failure-timeout=120s

ms ms_drbd_r0 resDRBDr0 \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

colocation c_lvm_vg0_on_drbd_r0 inf: p_lvm_vg0 ms_drbd_r0:Master

order o_drbd_r0_before_lvm_vg0 Mandatory: ms_drbd_r0:promote p_lvm_vg0:start
---

/etc/lvm/lvm.conf has global_filter set to:
global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]

But I'm not sure if it's sufficient. I seem to be missing some crucial ingredient.
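
To make it easier to reason about what that global_filter lets through, here is a minimal Python sketch of how I understand LVM to evaluate it: the entries are tried in order, the first regex that matches a device path decides (a = accept, r = reject), and a path matching nothing is accepted. The function names and the sample device paths are mine, for illustration only. Python's re module stands in for LVM's own regex engine, and the sketch looks at a single path per device and ignores aliases, so treat it as an approximation rather than a statement of what LVM does internally.

```python
import re

# The global_filter value quoted above, copied verbatim.
GLOBAL_FILTER = ["a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|"]


def parse_pattern(entry):
    """Split an entry such as 'a|/dev/drbd.*|' into (accept, compiled_regex)."""
    action, delimiter = entry[0], entry[1]
    if action not in ("a", "r"):
        raise ValueError(f"unknown action in {entry!r}")
    # The regex sits between the first and the last delimiter character.
    body = entry[2:entry.rindex(delimiter)]
    return action == "a", re.compile(body)


def is_accepted(path, filter_entries=GLOBAL_FILTER):
    """Return True if the first matching entry accepts the path."""
    for entry in filter_entries:
        accept, regex = parse_pattern(entry)
        # Unanchored search: the pattern may match anywhere in the path.
        if regex.search(path):
            return accept
    # Assumption: a path that matches no entry is accepted.
    return True


if __name__ == "__main__":
    # Hypothetical device paths based on the chain described above.
    for dev in ["/dev/drbd0", "/dev/md0", "/dev/vmspace/xen_lv0", "/dev/sda1"]:
        print(dev, "accepted" if is_accepted(dev) else "rejected")
```

Under these assumptions /dev/drbd0 and the md devices are accepted, while the backing LV /dev/vmspace/xen_lv0 and plain disks are rejected. The third entry looks redundant, since "/dev/md.*" already matches anything under /dev/md/.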

syslog on the DC shows the following when trying to start vg0:

Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group vg0
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  Reading all physical volumes. This may take a while... Found volume group "vmspace" using metadata type lvm2 Found volume group "freespace" using metadata type lvm2 Found volume group "vg0" using metadata type lvm2 
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  0 logical volume(s) in volume group "vg0" now active 
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not available (stopped)
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not activate correctly
Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: p_lvm_vg0_start_0:8775:stderr [   Configuration node global/use_lvmetad not found ]
Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not activate correctly ]
Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Result of start operation for p_lvm_vg0 on node2: 7 (not running) 
Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: node2-p_lvm_vg0_start_0:77 [   Configuration node global/use_lvmetad not found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
Oct 28 14:42:56 node2 pacemaker-controld[27057]:  warning: Action 42 (p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Transition 602 aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed 
Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Transition 602 (Complete=28, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-39.bz2): Complete
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  notice: On loss of quorum: Ignore
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Processing failed start of p_lvm_vg0 on node2: not running 
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Processing failed start of p_lvm_vg0 on node2: not running 
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Processing failed start of p_lvm_vg0 on node1: not running 
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Forcing p_lvm_vg0 away from node1 after 1000000 failures (max=1000000)
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  notice:  * Recover    p_lvm_vg0           (                 node2 )  
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  notice: Calculated transition 603, saving inputs in /var/lib/pacemaker/pengine/pe-input-40.bz2
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  notice: On loss of quorum: Ignore
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Processing failed start of p_lvm_vg0 on node2: not running 
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Processing failed start of p_lvm_vg0 on node2: not running 
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Processing failed start of p_lvm_vg0 on node1: not running 
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Forcing p_lvm_vg0 away from node2 after 1000000 failures (max=1000000)
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Forcing p_lvm_vg0 away from node1 after 1000000 failures (max=1000000)
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  notice:  * Stop       p_lvm_vg0           (                 node2 )   due to node availability
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  notice: Calculated transition 604, saving inputs in /var/lib/pacemaker/pengine/pe-input-41.bz2
Oct 28 14:42:57 node2 pacemaker-controld[27057]:  notice: Initiating stop operation p_lvm_vg0_stop_0 locally on node2 
Oct 28 14:42:57 node2 LVM(p_lvm_vg0)[8881]: INFO: Deactivating volume group vg0
Oct 28 14:42:57 node2 LVM(p_lvm_vg0)[8881]: INFO:  0 logical volume(s) in volume group "vg0" now active 
Oct 28 14:42:57 node2 LVM(p_lvm_vg0)[8881]: INFO: LVM Volume vg0 is not available (stopped)
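
In case it helps anyone sift through a longer log, here is a rough Python sketch that pulls the "Result of ... operation" lines out of syslog and keeps the ones with a non-zero return code. The regex is modelled only on the pacemaker-controld lines shown above, and the function name is made up for illustration. Treating every non-zero code as a problem is an assumption that fits start and stop operations but not necessarily monitors.

```python
import re

# Modelled on the pacemaker-controld "Result of ..." lines in the log above.
RESULT_RE = re.compile(
    r"Result of (?P<op>\w+) operation for (?P<rsc>\S+) on (?P<node>\S+): "
    r"(?P<rc>\d+) \((?P<text>[^)]*)\)"
)


def nonzero_results(lines):
    """Return (resource, operation, node, rc, text) for each non-zero result."""
    found = []
    for line in lines:
        match = RESULT_RE.search(line)
        # Assumption: rc 0 means success; anything else is worth a look.
        if match and int(match.group("rc")) != 0:
            found.append((
                match.group("rsc"),
                match.group("op"),
                match.group("node"),
                int(match.group("rc")),
                match.group("text"),
            ))
    return found


if __name__ == "__main__":
    sample = [
        "Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: "
        "Result of start operation for p_lvm_vg0 on node2: 7 (not running) ",
    ]
    for result in nonzero_results(sample):
        print(result)
```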

Any help gratefully accepted!
jf

