[Pacemaker] Trouble with DRBD mount

senrabdet at aol.com
Fri Mar 1 16:40:53 UTC 2013


Hi Andreas:

Thanks so much for the response!  Is it OK to reply to you like this?  We're new to the list and unsure of the protocol, so apologies if we should have posted this differently.

Per your suggestions, we made the following adjustments but are still stuck:

- set the lvm.conf filter back to what it had been (filter = [ "a/.*/" ]) and cleared the LVM cache
- adjusted our Pacemaker config to the following (we tried adding some "start-delay" statements, but get the same results with or without them):

a) crm configure show
node server1
node server2
primitive app_ip ocf:heartbeat:IPaddr \
    params ip="192.168.1.152" \
    op monitor interval="30s"
primitive drbd ocf:linbit:drbd \
    params drbd_resource="r1" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" \
    op monitor interval="59s" role="Master" timeout="30s" start-delay="1m" \
    op monitor interval="60s" role="Slave" timeout="30s" start-delay="1m"
primitive fs_vservers ocf:heartbeat:Filesystem \
    params device="/dev/vg2/vserverLV" directory="/vservers" fstype="ext4" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="120" \
    meta target-role="Started"
ms ms_drbd drbd \
    meta master-node-max="1" clone-max="2" clone-node-max="1" globally-unique="false" notify="true" target-role="Started"
location cli-prefer-app_ip app_ip \
    rule $id="cli-prefer-rule-app_ip" inf: #uname eq server2
location drbd_on_node1 ms_drbd \
    rule $id="drbd_on_node1-rule" $role="master" 100: #uname eq server1
property $id="cib-bootstrap-options" \
    dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
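
As far as we can tell, nothing in the above ties fs_vservers to the DRBD master: we have no colocation or order constraints for it, so Pacemaker may attempt the mount on either node before the promote. Our best guess at the missing pieces, reusing the names above and assuming the filesystem should live on /dev/drbd1 rather than on the backing LV (the ids fs_on_master and fs_after_promote are just placeholders):

        colocation fs_on_master inf: fs_vservers ms_drbd:Master
        order fs_after_promote inf: ms_drbd:promote fs_vservers:start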

b) Our drbd.conf is:

global { usage-count no; }
common { syncer { rate 100M; } }
# original
resource r1 {
        protocol C;
        startup {
                wfc-timeout      15;
                degr-wfc-timeout 60;
        }
        device    /dev/drbd1 minor 1;
        disk      /dev/vg2/vserverLV;
        meta-disk internal;

        # the following 2 definitions are equivalent
        on server1 {
                address 192.168.1.129:7801;
                disk    /dev/vg2/vserverLV;
        }
        on server2 {
                address 192.168.1.128:7801;
                disk    /dev/vg2/vserverLV;
        }

        # floating 192.168.5.41:7801;
        # floating 192.168.5.42:7801;
        net {
                cram-hmac-alg sha1;
                shared-secret "secret";
                after-sb-0pri discard-younger-primary; # discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri call-pri-lost-after-sb;
        }
}
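
With the cluster stopped, we can sanity-check the DRBD layer by hand; roughly, on both nodes:

        drbdadm up r1     # should attach and connect without a "can not open device" error
        cat /proc/drbd    # both nodes should show cs:Connected and ds:UpToDate/UpToDate

("drbdadm up" does the attach and connect steps in one go on DRBD 8.3.)
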
c) A few times the "fs_vservers" resource seems to have started, but generally after a reboot we get:

============
Last updated: Fri Mar  1 11:07:48 2013
Stack: openais
Current DC: server1 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ server2 server1 ]

 app_ip    (ocf::heartbeat:IPaddr):    Started server2
 Master/Slave Set: ms_drbd
     Masters: [ server1 ]
     Slaves: [ server2 ]

Failed actions:
    fs_vservers_start_0 (node=server2, call=5, rc=1, status=complete): unknown error
    fs_vservers_start_0 (node=server1, call=8, rc=1, status=complete): unknown error


Our understanding from your last note is that we want our LVM to start on its own, before DRBD.  We don't see that happening in the boot messages we can watch (Ctrl-Alt-F1), but we do see the LV after boot if we run "lvdisplay".  We can "mount /dev/drbd1 /vservers" by hand, but assume that is a symptom, not a solution.  Likewise, we have to keep the /vservers line in fstab commented out (#/dev/drbd1 /vservers ext4    defaults        0       2), as we can't boot with it un-commented without dropping to the "Control-D" maintenance prompt.
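
One way to test the "LVM must come up first" theory right after a reboot, before the cluster starts anything (vg2 and vserverLV are the names from our config):

        lvdisplay /dev/vg2/vserverLV    # does it report "LV Status available"?
        vgchange -ay vg2                # if not, activate the VG by hand
        drbdadm up r1                   # the attach should now succeed

If that sequence works, it points at boot-time VG activation rather than at the Pacemaker config.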

Any thoughts would be great!  Thanks,

Ted


-----Original Message-----
From: Andreas Kurz <andreas at hastexo.com>
To: pacemaker <pacemaker at oss.clusterlabs.org>
Sent: Thu, Feb 28, 2013 6:15 pm
Subject: Re: [Pacemaker] Trouble with DRBD mount


On 2013-02-28 13:19, senrabdet at aol.com wrote:
> Hi All:
> 
> We are stuck trying to get pacemaker to work with DRBD, and having tried
> various alternatives can't get our "drbd1" to mount and get some errors.
> 
> NOTE:  we are trying to get pacemaker to work with an existing Encrypted
> RAID1 LVM setup - is this impossible or a "just plain bad idea"?   We
> were thinking we'd like the potential advantages of local RAID on each
> box as well as the Internet RAID & failover provided by DRBD/pacemaker.
>  We're using Debian Squeeze.  Per various instructions, we've disabled
> the DRBD boot init (update-rc.d -f drbd remove) and set the LVM filter
> to filter = [ "a|drbd.*|", "r|.*|" ].

so you only allow scanning for LVM signatures on DRBD devices ... which
would have to be in Primary mode before any scan can succeed ...

> 
> FYI - we've commented out the LVM mount "/dev/vg2/vserverLV" in our
> fstab, and consistently seem to need to do this to avoid a boot error.
> 
> We think DRBD works until we add in the pacemaker steps (i.e.,
> "dev/drbd1" mounts at boot; we can move related data from server1 to
> server2 back and forth, though need to use the command line to
> accomplish this).  We've seen various statements on the net that suggest
> it is viable to use a "mapper" disk choice in drbd.conf.  Also, if we
> start by configuring Pacemaker for a simple IP failover, that works
> (i.e., no errors, we can ping via the fail over address) but stops
> working when we add in the DRBD primitives and related statements.  Our
> suspicion (other than maybe "you can't do this with existing RAID") is
> that we're using the wrong "disk" statement in our drbd.conf or maybe in
> our "primitive fs_vservers" statement, though we've tried lots of
> alternatives and this is the same drbd.conf we use before adding in
> Pacemaker and it seems to work at that point.
> 
> Lastly, while various config statements refer to "vservers", we have not
> gotten to the point of trying to add any data to the DRBD devices other
> than a few text files that have disappeared since doing our "crm" work.
> 
> Any help appreciated!  Thanks, Ted
> 
> CONFIGS/LOGS
> 
> A) drbd.conf
> 
> global { usage-count no; }
> common { syncer { rate 100M; } }
> #original
> resource r1 {
>         protocol C;
> startup {
>                 wfc-timeout  15;
>                 degr-wfc-timeout 60;
>         }
>         device /dev/drbd1 minor 1;
>           disk /dev/vg2/vserverLV;

so vg2/vserverLV is the lower-level device for DRBD. Simply let vg2 be
activated automatically and forget that LVM filter change you made; it is
only needed for VGs sitting _on_ DRBD, not below it.
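
In lvm.conf terms: the default filter, which scans everything, is the right
one while LVM sits below DRBD,

        filter = [ "a/.*/" ]

whereas the DRBD-only filter is for the opposite stacking, a VG created on
top of /dev/drbdX:

        filter = [ "a|drbd.*|", "r|.*|" ]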

>         meta-disk internal;
> 
> # following 2 definition are equivalent
>         on server1 {
>                 address 192.168.1.129:7801;
>                  disk /dev/vg2/vserverLV;
>         }
>         on server2 {
>                 address 192.168.1.128:7801;
>                  disk /dev/vg2/vserverLV;
> #disk /dev/mapper/md2_crypt;
>         }
> 
> #       floating 192.168.5.41:7801;
> #       floating 192.168.5.42:7801;
>          net {
> cram-hmac-alg sha1;
>                 shared-secret "secret";
>                   after-sb-0pri discard-younger-primary;
> #discard-zero-changes;
>                   after-sb-1pri discard-secondary;
>                   after-sb-2pri call-pri-lost-after-sb;
>         }
> }
> 
> 
> B) Pacemaker Config
> 
> crm configure show
> node server1
> node server2
> primitive app_ip ocf:heartbeat:IPaddr \
> params ip="192.168.1.152" \
> op monitor interval="30s"
> primitive drbd ocf:linbit:drbd \
> params drbd_resource="r1" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100" \
> op monitor interval="59s" role="Master" timeout="30s" \
> op monitor interval="60s" role="Slave" timeout="30s"
> primitive fs_vservers ocf:heartbeat:Filesystem \
> params device="/dev/drbd1" directory="/vservers" fstype="ext4" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="120"
> primitive vg2 ocf:heartbeat:LVM \
> params volgrpname="vg2" exclusive="true" \

simply remove all those LVM things from your Pacemaker configuration

> op start interval="0" timeout="30" \
> op stop interval="0" timeout="30"
> group lvm app_ip vg2 fs_vservers

ouch .. a group called "lvm", am I the only one who thinks this is
confusing?

> ms ms_drbd drbd \
> meta master-node-max="1" clone-max="2" clone-node-max="1"
> globally-unique="false" notify="true" target-role="Master"
> location drbd_on_node1 ms_drbd \
> rule $id="drbd_on_node1-rule" $role="master" 100: #uname eq server1
> colocation vserver-deps inf: ms_drbd:Master lvm

wrong direction ... you want the group to follow the DRBD master
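
That is, flip the operands so the group is placed relative to the master. In
crm shell colocation syntax the resource listed first follows the one listed
second, so a sketch with the names from this config would be:

        colocation vserver-deps inf: lvm ms_drbd:Master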

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now


> order app_on_drbd inf: ms_drbd:promote lvm:start
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore"
> 
> 
> 
> C)  crm status results (with errors)
> Last updated: Wed Feb 27 19:05:57 2013
> Stack: openais
> Current DC: server1 - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
> 
> Online: [ server2 server1 ]
> 
> 
> Migration summary:
> * Node server2: 
>    drbd:1: migration-threshold=1000000 fail-count=1000000
> * Node server1: 
>    drbd:0: migration-threshold=1000000 fail-count=1000000
> 
> Failed actions:
>     drbd:1_start_0 (node=server2, call=8, rc=-2, status=Timed Out):
> unknown exec error
>     drbd:0_start_0 (node=server1, call=6, rc=-2, status=Timed Out):
> unknown exec error
> 
> D)  Mount
> 
> /dev/mapper/vg1-root on / type ext4 (rw,errors=remount-ro)
> tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
> proc on /proc type proc (rw,noexec,nosuid,nodev)
> sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
> udev on /dev type tmpfs (rw,mode=0755)
> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
> /dev/md0 on /boot type ext4 (rw)
> /dev/mapper/vg1-home on /home type ext4 (rw)
> /dev/mapper/vg1-tmp on /tmp type ext4 (rw)
> /dev/mapper/vg1-usr on /usr type ext4 (rw)
> /dev/mapper/vg1-var on /var type ext4 (rw)
> fusectl on /sys/fs/fuse/connections type fusectl (rw)
> 
> 
> E)  fstab
> 
> # /etc/fstab: static file system information.
> #
> # Use 'blkid' to print the universally unique identifier for a
> # device; this may be used with UUID= as a more robust way to name devices
> # that works even if disks are added and removed. See fstab(5).
> #
> # <file system> <mount point>   <type>  <options>       <dump>  <pass>
> proc            /proc           proc    defaults        0       0
> /dev/mapper/vg1-root /               ext4    errors=remount-ro 0       1
> # /boot was on /dev/md0 during installation
> UUID=25829c6c-164c-4a1e-9e84-6bab180e38f4 /boot           ext4  
>  defaults        0       2
> /dev/mapper/vg1-home /home           ext4    defaults        0       2
> /dev/mapper/vg1-tmp /tmp            ext4    defaults        0       2
> /dev/mapper/vg1-usr /usr            ext4    defaults        0       2
> /dev/mapper/vg1-var /var            ext4    defaults        0       2
> #/dev/mapper/vg2-vserverLV /vservers       ext4    defaults        0       2
> /dev/mapper/vg1-swap none            swap    sw              0       0
> /dev/scd0       /media/cdrom0   udf,iso9660 user,noauto     0       0
> /dev/scd1       /media/cdrom1   udf,iso9660 user,noauto     0       0
> /dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0
> 
> 
> F)  fdisk -l
> 
> Disk /dev/sda: 160.0 GB, 160041885696 bytes
> 255 heads, 63 sectors/track, 19457 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x0007c7a2
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1   *           1          61      487424   fd  Linux raid
> autodetect
> Partition 1 does not end on cylinder boundary.
> /dev/sda2              61        1885    14648320   fd  Linux raid
> autodetect
> /dev/sda3            1885        3101     9765888   fd  Linux raid
> autodetect
> 
> Disk /dev/sdb: 203.9 GB, 203928109056 bytes
> 255 heads, 63 sectors/track, 24792 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x0008843c
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1   *           1          61      487424   fd  Linux raid
> autodetect
> Partition 1 does not end on cylinder boundary.
> /dev/sdb2              61        1885    14648320   fd  Linux raid
> autodetect
> /dev/sdb3            1885        3101     9765888   fd  Linux raid
> autodetect
> 
> Disk /dev/md0: 499 MB, 499109888 bytes
> 2 heads, 4 sectors/track, 121853 cylinders
> Units = cylinders of 8 * 512 = 4096 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00000000
> 
> 
> Disk /dev/md1: 15.0 GB, 14998757376 bytes
> 2 heads, 4 sectors/track, 3661806 cylinders
> Units = cylinders of 8 * 512 = 4096 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x08040000
> 
> 
> Disk /dev/md2: 9999 MB, 9999147008 bytes
> 2 heads, 4 sectors/track, 2441198 cylinders
> Units = cylinders of 8 * 512 = 4096 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x08040000
> 
> 
> Disk /dev/dm-0: 15.0 GB, 14997704704 bytes
> 255 heads, 63 sectors/track, 1823 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00000000
> 
> 
> Disk /dev/dm-1: 3997 MB, 3997171712 bytes
> 255 heads, 63 sectors/track, 485 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00000000
> 
> 
> Disk /dev/dm-2: 1996 MB, 1996488704 bytes
> 255 heads, 63 sectors/track, 242 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00000000
> 
> 
> Disk /dev/dm-3: 1996 MB, 1996488704 bytes
> 255 heads, 63 sectors/track, 242 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00000000
> 
> 
> Disk /dev/dm-4: 3997 MB, 3997171712 bytes
> 255 heads, 63 sectors/track, 485 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00000000
> 
> 
> Disk /dev/dm-5: 1996 MB, 1996488704 bytes
> 255 heads, 63 sectors/track, 242 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00000000
> 
> 
> Disk /dev/dm-6: 499 MB, 499122176 bytes
> 255 heads, 63 sectors/track, 60 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00000000
> 
> 
> Disk /dev/dm-7: 9998 MB, 9998094336 bytes
> 255 heads, 63 sectors/track, 1215 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00000000
> 
> G)  syslog excerpt
> 
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) Command '
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) drbdsetup
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr)  
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) 1
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr)  
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) disk
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr)  
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) /dev/vg2/vserverLV
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr)  
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) /dev/vg2/vserverLV
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr)  
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) internal
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr)  
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) --set-defaults
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr)  
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) --create-device
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) ' terminated with exit code 20
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) drbdadm attach r1: exited with code 20
> Feb 27 06:36:22 server1 drbd[2329]: ERROR: r1: Called drbdadm -c
> /etc/drbd.conf --peer server2 up r1
> Feb 27 06:36:22 server1 drbd[2329]: ERROR: r1: Exit code 1
> Feb 27 06:36:22 server1 drbd[2329]: ERROR: r1: Command output: 
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stdout) 
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) Can not open device '/dev/vg2/vserverLV': No
> such file or directory
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) Command 'drbdsetup 1 disk /dev/vg2/vserverLV 
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) /dev/vg2/vserverLV internal --set-defaults
> --create-device' terminated with exit code 20#012drbdadm attach r1:
> exited with code 20
> Feb 27 06:36:22 server1 drbd[2329]: ERROR: r1: Called drbdadm -c
> /etc/drbd.conf --peer server2 up r1
> Feb 27 06:36:22 server1 drbd[2329]: ERROR: r1: Exit code 1
> Feb 27 06:36:22 server1 drbd[2329]: ERROR: r1: Command output: 
> Feb 27 06:36:22 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stdout) 
> Feb 27 06:36:23 server1 lrmd: [1705]: info: RA output:
> (p_drbd_r1:0:start:stderr) Can not open device '/dev/vg2/vserverLV': No
> such file or directory
> 