[Pacemaker] DRBD and fencing

Matthew Palmer mpalmer at hezmatt.org
Wed Mar 10 14:54:22 EST 2010


[Up-front disclaimer: I'm not a fan of cluster filesystems, having had large
chunks of my little remaining sanity shredded by GFS.  So what I say is
likely tinged with lingering loathing, although I do *try* to stay factual]

On Wed, Mar 10, 2010 at 09:01:01PM +0800, Martin Aspeli wrote:
> Matthew Palmer wrote:
>> On Wed, Mar 10, 2010 at 02:32:05PM +0800, Martin Aspeli wrote:
>>> Florian Haas wrote:
>>>> On 03/09/2010 06:07 AM, Martin Aspeli wrote:
>>>>> Hi folks,
>>>>>
>>>>> Let's say I have a two-node cluster with DRBD and OCFS2, with a database
>>>>> server that's supposed to be active on one node at a time, using the
>>>>> OCFS2 partition for its data store.
>>>> *cringe* Which database is this?
>>> Postgres.
>>>
>>> Why are you cringing? From my reading, I had gathered this was a pretty
>>> common setup to support failover of Postgres without the luxury of a
>>> SAN. Are you saying it's a bad idea?
>>
>> PgSQL on top of DRBD is OK.  PgSQL on top of OCFS2 is a disaster waiting to
>> gnaw your leg off.
>
> Hah. I'm glad someone told me. ;-)
>
> Why is this?

Well, for a start you've got the problem that you could end up accidentally
running two copies of PostgreSQL on two separate machines against the same
chunk of data.  I don't know for sure, but my suspicion is that PostgreSQL
isn't built to detect that particular case, and you'd quite possibly end up
with nasty database corruption.

Then there are the specific problems with IO on cluster filesystems.  Whilst
they do a reasonable job of doing what they do (shared access to filesystem
data), the nature of what they do is such that you can never expect the same
performance from them as a local filesystem.  Every IO operation effectively
has to be OK'd by the other machine(s) in the cluster, which is guaranteed
to slow things down.  This isn't a problem for regular file accesses --
they're rarely all that time critical -- but the sheer volume of IO
operations issued from a database (even a read-mostly DB) is going to be a
bit of a sticking point.
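
To put rough numbers on that, here is a back-of-envelope sketch in Python.
It assumes the worst case, where every operation pays a full network round
trip to the peer before it can proceed; the function name and both latency
figures are made up for illustration, not measurements of OCFS2 or any real
hardware, and a real cluster filesystem may cache locks and do better.

```python
def max_iops(local_latency_us: float, coordination_latency_us: float = 0.0) -> float:
    """Upper bound on serialized I/O operations per second for one thread.

    Each operation is assumed to cost its local service time plus any
    time spent waiting for the other node to approve it.
    """
    per_op_us = local_latency_us + coordination_latency_us
    return 1_000_000 / per_op_us


# Hypothetical figures, chosen only to illustrate the shape of the problem.
LOCAL_US = 100.0       # assumed local service time per operation
ROUND_TRIP_US = 200.0  # assumed round trip to the peer node

local = max_iops(LOCAL_US)
clustered = max_iops(LOCAL_US, ROUND_TRIP_US)

print(f"local: {local:.0f} ops/s, clustered: {clustered:.0f} ops/s")
```

With those assumed inputs the ceiling drops to a third of the local figure.
The exact ratio is meaningless; the point is that the coordination cost is
paid per operation, so it hurts most on workloads that issue lots of small
IOs, which is exactly what a database does.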

>>> Also note that this database will see relatively few write transactions
>>> compared to read transactions, if that makes a difference.
>>
>> Cluster filesystems suck at high IO request rates, regardless of whether
>> they're reads or writes.
>
> Gotcha - so it's mainly a performance issue?

For a rarely-used database, the performance isn't going to bite you --
although it's a ready-made and hard-to-work-around scaling bottleneck (to
which I am violently allergic), and should be avoided if you want to be
confident of any sort of reasonable performance into the future.  The thing
that would keep me up at night more is the risk of two PgSQL instances
coming up on separate machines and lunching my data.  Data corruption gives
me the willies even more than my data centre burning down.

- Matt



