[ClusterLabs] Antw: Re: SAN with drbd and pacemaker
Ulrich.Windl at rz.uni-regensburg.de
Fri Sep 18 03:42:00 EDT 2015
>>> Marco Marino <marino.mrc at gmail.com> wrote on 18.09.2015 at 09:28 in
<CAFHVVuKUd6zYLp0tcf=LQbMfd5XHbyBE-MBLtC45kBDK4fZzxA at mail.gmail.com>:
> OK, first of all, thank you for your answer. This is a complicated task and
> I cannot find many guides (if you have any, they are welcome).
> I'm using RAID6 and I have 20 disks of 4TB each.
> In RAID6 the space efficiency is 1-2/n, so a small virtual drive could use
> 4 or 5 disks. With 4 disks I get (4*4) * (1-2/4) = 8 TB of effective
> space; with 5 disks I get (5*4) * (1-2/5) = 12 TB.
> Space efficiency is not a primary goal for me; I'm trying to reduce the
> rebuild time when a disk fails (and to improve performance!).
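The arithmetic above can be checked with a short sketch. The function name raid6_usable_tb is made up for illustration, and the model is just the 1-2/n efficiency applied to equal-sized disks; controller metadata overhead is ignored.

```python
def raid6_usable_tb(num_disks: int, disk_tb: float) -> float:
    """Usable capacity of one RAID6 group: two disks' worth holds parity."""
    if num_disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (num_disks - 2) * disk_tb


# Group sizes discussed in this thread, with 4 TB disks.
for n in (4, 5, 20):
    print(n, "disks ->", raid6_usable_tb(n, 4), "TB usable")
```

This reproduces the 8 TB and 12 TB figures for 4 and 5 disks.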
> "If you run 20x4TB disks as RAID6, then an 8TB volume is only ~500G per
> disk. However, if one disk fails, then all the other 15 volumes this
> disk handles are broken, too. (BTW, most RAID controllers can handle
> multiple stripes per disk, but usually only a handful) In such case the
> complete 4TB of the broken disk has to be recovered, affecting all 16
> volumes."
> Can you explain this to me? 16 volumes?
I really don't know, but my guess is that the array controller rearranges the
data for the remaining good drives (assuming there is enough unallocated space
on each disk).
Most modern systems work like this: You define logical disks as RAID on
physical disks, and the controller "slices" the disks to build a RAID from
these slices. If you have enough disks and unallocated room on each disk, the
controller can readjust the slices on each disk to rebuild a fully redundant
RAID with the remaining disks. That way a "hot spare" is no longer a physical
disk, but the unallocated capacity corresponding to an empty disk.
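A rough way to picture the "spare as unallocated capacity" idea is the sketch below. It is a simplified model, not how any particular controller decides: the function name is hypothetical, sizes are in GB, and it assumes a slice may be moved to any surviving disk (a real controller must also keep the slices of one stripe on distinct disks).

```python
def can_rebuild_in_free_space(free_gb, used_gb, failed):
    """Check whether the survivors' unallocated room can absorb the
    slices that lived on the failed disk.

    free_gb / used_gb: per-disk unallocated and allocated capacity.
    failed: index of the failed disk.
    """
    survivors_free = sum(f for i, f in enumerate(free_gb) if i != failed)
    return survivors_free >= used_gb[failed]


# 20 disks of 4000 GB with 200 GB left unallocated on each: the free
# space adds up to one whole disk, spread over all spindles.
print(can_rebuild_in_free_space([200] * 20, [3800] * 20, failed=0))

# With only 100 GB free per disk the survivors cannot take over.
print(can_rebuild_in_free_space([100] * 20, [3900] * 20, failed=0))
```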
Despite all that, my tests (with MD-RAID) showed that RAID6 has
significantly poorer write throughput than RAID5, but read throughput may be
If you have many disks, but don't need the full capacity, you can configure
"interesting things" like a RAID1 on top of two RAID0 with 10 disks each (or a
RAID5 on top of 10 RAID1 with two disks each)...
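To see what such nested layouts cost in capacity, here is a small sketch. The helper names are invented for illustration; each takes the usable sizes (in TB) of its members and returns the usable size of the resulting array, ignoring metadata overhead.

```python
def raid0(members):
    """Striping: capacities add up (equal-sized members assumed)."""
    return sum(members)


def raid1(members):
    """Mirroring: only one member's worth of data is usable."""
    return min(members)


def raid5(members):
    """Single parity: one member's worth is used for parity."""
    return (len(members) - 1) * min(members)


disk = 4  # TB per disk, as in this thread

# RAID1 on top of two RAID0 sets with 10 disks each.
mirror_of_stripes = raid1([raid0([disk] * 10), raid0([disk] * 10)])

# RAID5 on top of 10 RAID1 pairs.
raid5_of_mirrors = raid5([raid1([disk, disk]) for _ in range(10)])

print(mirror_of_stripes, raid5_of_mirrors)
```

With 20 disks of 4 TB this gives 40 TB and 36 TB, compared with 72 TB for a single 20-disk RAID6.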
> Thank you
> 2015-09-17 15:54 GMT+02:00 Kai Dupke <kdupke at suse.com>:
>> On 09/17/2015 09:44 AM, Marco Marino wrote:
>> > Hi, I have 2 Supermicro servers (LSI 2108) with many disks (80TB) and
>> > I'm trying to build a SAN with drbd and pacemaker. I'm studying, but I
>> > have no experience with large disk arrays under drbd and pacemaker, so I
>> > have questions:
>> > I'm using MegaRAID Storage Manager to create virtual drives. Each
>> > drive is a device on Linux (e.g. /dev/sdb, /dev/sdc, ...), so my first
>> > question is: is it a good idea to create virtual drives of 8 TB (max)?
>> > I'm thinking of the array rebuild time in case of disk failure (about 1 day for
>> It depends on your disks and RAID level. If one disk fails the content
>> of this disk has to be recreated by either copying (all RAID levels with
>> some RAID 1 included) or calculating (all with no RAID1 included), in
>> the latter case all disks get really stressed.
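The difference between "copying" and "calculating" can be illustrated with single XOR parity, as used by RAID5 (RAID6 adds a second, more involved syndrome that is not shown here). This is only a toy sketch on in-memory byte strings; the function name is made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe and their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] fails. With a mirror the block would simply be
# copied; with parity every surviving block of the stripe has to be read.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Because every surviving member is read for each stripe, a parity rebuild loads all remaining disks, while a mirror rebuild only reads the partner disk.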
>> If you run 20x4TB disks as RAID6, then an 8TB volume is only ~500G per
>> disk. However, if one disk fails, then all the other 15 volumes this
>> disk handles are broken, too. (BTW, most RAID controllers can handle
>> multiple stripes per disk, but usually only a handful) In such case the
>> complete 4TB of the broken disk has to be recovered, affecting all 16
>> volumes.
>> On the other hand, if you use 4x5x4TB as 4x 12TB RAID6, a broken disk
>> only affects one of 4 volumes - but at the cost of more disks needed.
>> You can do a similar calculation for RAID16/15.
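The trade-off between one large group and several small ones can be tabulated with a sketch like the following. It is a simplified model under stated assumptions: all groups are RAID6 with equal-sized disks, the function and key names are hypothetical, and "degraded share" simply means the fraction of groups that lose a disk in a single failure.

```python
def layout_summary(total_disks, disk_tb, group_size):
    """Summarise a pool of disks split into equal RAID6 groups."""
    if group_size < 4 or total_disks % group_size:
        raise ValueError("need equal groups of at least 4 disks")
    groups = total_disks // group_size
    return {
        "groups": groups,
        "usable_tb": groups * (group_size - 2) * disk_tb,
        # Share of groups (and the volumes on them) hit by one disk failure.
        "degraded_share": 1 / groups,
        # Disks left in the degraded group, which carry the rebuild load.
        "surviving_disks_in_group": group_size - 1,
    }


# One 20-disk group versus four 5-disk groups, 4 TB disks.
for size in (20, 5):
    print(size, layout_summary(20, 4, size))
```

For 20 disks this gives 72 TB usable with everything degraded by one failure, versus 48 TB usable with only a quarter of the groups affected.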
>> The only reason I see to create small slices is to make them fit on
>> smaller replacement disks, which might be more easily available/affordable
>> at the time of failure (but now we are entering a more low-cost area where
>> SAN and DRBD usually do not play a role).
>> Kai Dupke
>> Senior Product Manager
>> Server Product Line
>> Sell not virtue to purchase wealth, nor liberty to purchase power.
>> Phone: +49-(0)5102-9310828 Mail: kdupke at suse.com
>> Mobile: +49-(0)173-5876766 WWW: www.suse.com
>> SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
>> GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)