[ClusterLabs] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

Lars Ellenberg lars.ellenberg at linbit.com
Mon Oct 16 13:58:51 EDT 2017


On Tue, Sep 26, 2017 at 07:17:15AM +0000, Eric Robinson wrote:
> > I don't know the tool, but isn't the expectation a bit high that the tool will trim
> > the correct blocks through drbd->LVM/mdadm->device? Why not use the tool
> > on the affected devices directly?
> > 
> 
> I did, and the corruption did not occur. It only happened when writing
> through the DRBD layer. Also, I disabled the TRIM function of the
> tools and merely used it as a drive burn-in without triggering any
> trim commands. Same results.

Just to put this out here too,
and not only in the corresponding thread on the DRBD list:

"the tool" is broken.

Below is very slightly edited for typos and context
from my original post on drbd-user yesterday.

2017-10-12 11:14:55 +0200, Robert Altnoeder @ linbit on drbd-user:
> ... this program does not appear to be very trustworthy, because ...
> of incorrect datatypes for the purpose - apparently, it is attempting to
> memory-map quite large files (~ 70 GiB) and check using a byte-indexed
> offset declared as type 'unsigned', which is commonly only 32 bits wide,
> and therefore inadequate for the byte-wise indexing of anything that is
> larger than 4 GiB.

The test program has other issues as well,
like off-by-one (and thus stack corruption) when initializing the
"buffer" in its "writeAtomically",
unlinking known non-existent files, and other things.
Probably harmless.

But this 32-bit index vs >= 4 GiByte file content
is the real bug here; thank you, Robert, for pointing that out.
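
Stand-alone illustration (not trimtester code, just the usual C/C++
integer conversion rules at work): a 32-bit "unsigned" compared against
a 64-bit length gets widened for the comparison, but "i + 512" is still
32-bit arithmetic and wraps at 2 ** 32.

    #include <cstdint>
    #include <cstdio>

    int main(void)
    {
        uint64_t len = 5ULL << 30;   // pretend mmap.len(): a 5 GiB file
        unsigned i   = 4294966784u;  // 2 ** 32 - 512

        // i is widened to 64 bit for this comparison: stays true
        printf("i < len      : %d\n", i < len ? 1 : 0);
        // but i + 512 is still 32-bit unsigned arithmetic: wraps to 0
        printf("i + 512      : %u\n", i + 512u);
        // so the inner loop condition is false before it even starts
        printf("i < i + 512  : %d\n", i < i + 512u ? 1 : 0);
        return 0;
    }

That prints 1, 0, 0.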


Why it does not trigger when DRBD is not in the stack I cannot tell.
Maybe the timing is just skewed differently, and somehow your disk fills
up and everything terminates before the "DetectCorruption" thread tries
to check a >= 4 GiB file for the first time.

Or the 8 writer threads starve out the single mmap reader so violently
that it simply checks so slowly it never got around to the >= 4 GiB files.

Anyway, what happens is this:

https://github.com/algolia/trimtester/blob/48c44d5beb88/trimtester.cpp#L213

    void _checkFile(const std::string &path, const char *file, std::string &filename) {
        filename.resize(0);
        filename.append(path);
        filename.push_back('/');
        filename.append(file);
        MMapedFile mmap(filename.c_str());
        if (mmap.loaded()) {
            bool corrupted = false;
            // Detect all 512-bytes page inside the file filled by 0 -> can be caused by a buggy Trim
            for (unsigned i = 0; !corrupted && i < mmap.len(); i += 512) {

// after some number of iterations,
// i = 4294966784, which is 2 ** 32 - 512;
// mmap.len(), however, is *larger*.
// in "i < mmap.len()", the 32-bit integer i is "upscaled",
// zero-extended to 64 bit before the comparison, so that remains true.

                if (mmap.len() - i > 4) { // only check page > 4-bytes to avoid false positive

// again, zero-extension to 64 bit, the condition is true

                    bool pagecorrupted = true; 

// *assume* that the "page" was corrupted,

                    for (unsigned j = i; j < mmap.len() && j < (i + 512); ++j) {                    

// j = i, which is j = 4294966784; (j < mmap.len()) is again true because
// of the zero-extension of j to 64 bit in that term,
// but in the (j < i + 512) term, neither j nor i is widened,
// i + 512 wraps around to 0, j < 0 is false,
// the loop does not execute even once,
// which means not a single byte is checked

                        if (mmap.content()[j] != 0)
                            pagecorrupted = false;
                    }
                    if (pagecorrupted)
                        corrupted = true;

// and we "won" a "corrupted" flag by simply "assuming" it:
// not a single byte was checked, so nothing could clear it.
// "So sad." ;-)

                }
            }
            if (corrupted) {
                std::cerr << "Corrupted file found: " << filename << std::endl;
                exit(1);
            }

        }
    }


Just change "unsigned" to "uint64_t" there, and be happy.
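
With that change applied, the loops from above would read roughly like
this (sketch only; uint64_t comes from <cstdint>, everything else stays
as it is):

    // 64-bit indices: i + 512 no longer wraps around,
    // so every byte of every 512-byte page really gets checked
    for (uint64_t i = 0; !corrupted && i < mmap.len(); i += 512) {
        if (mmap.len() - i > 4) {
            bool pagecorrupted = true;
            for (uint64_t j = i; j < mmap.len() && j < (i + 512); ++j) {
                if (mmap.content()[j] != 0)
                    pagecorrupted = false;
            }
            if (pagecorrupted)
                corrupted = true;
        }
    }

(size_t would do just as well on a 64-bit build.)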


Don't believe it?
Create any file of 4 GiB or larger,
make sure it does not contain 512 (aligned) consecutive zero bytes,
and "check" it for "corruption" with that trimtester logic.
It will report the file as corrupted every time.


# start with a 2-byte file ("X\n") ...
rm -rf trimtester-is-broken/
mkdir trimtester-is-broken
o=trimtester-is-broken/x1
echo X > $o
l=$o
# ... and double it 31 times: x32 ends up as exactly 4 GiB of "X\n",
# with no zero byte anywhere in it
for i in `seq 2 32`; do
	o=trimtester-is-broken/x$i;
	cat $l $l > $o ;
	rm -f $l;
	l=$o;
done
# let TrimTester loose on that directory: its check flags x32 as corrupted
./TrimTester trimtester-is-broken

Wahwahwa Corrupted file found: trimtester-is-broken/x32 mimimimi


Thanks,
that was a nice exercise in proofreading C++ code.



-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT



