[Pacemaker] Properly fencing Postgres

Serge Dubrouski sergeyfd at gmail.com
Fri Mar 5 09:33:05 EST 2010


> I don't know if the pgsql RA can support "cold standby"
> instances.
>

In my opinion, a "cold standby" is a server that has access to the data
files, where PostgreSQL is down but can be brought up at any time. The
pgsql RA does exactly that, provided other resources give it access to
the data. What the pgsql RA doesn't do is data synchronization in a
master/slave fashion, but as far as I understand that is not required here.
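A minimal pgsql primitive could look like this (just a sketch; the
pg_ctl path, data directory, and monitor interval are assumptions,
check the RA metadata for your installation):

    primitive pg ocf:heartbeat:pgsql \
        params pgctl="/usr/bin/pg_ctl" pgdata="/var/lib/pgsql/data" \
        op monitor interval="30s" timeout="30s"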

>> Also note:
>>
>>  - We intend to use the IPaddr2 resource agent to cluster an IP
>> address across master and slave.
>>
>>  - I'm not sure we need to use Pacemaker to manage HAProxy on slave;
>> it will simply not be used until the IP address fails over to slave.
>
> The difference is that if it fails, the cluster won't be able to
> help you. Otherwise, you can configure it as a cloned resource.
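To illustrate that, something along these lines should work (crm
shell; the address and the init script name are placeholders):

    primitive failover-ip ocf:heartbeat:IPaddr2 \
        params ip="10.0.0.100" cidr_netmask="24" \
        op monitor interval="10s"
    primitive haproxy lsb:haproxy \
        op monitor interval="30s"
    clone cl-haproxy haproxy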
>
>>  - To deal with sometimes severe peaks in our load, we'll have
>> HAProxy on master send certain requests to the "live" Zope app
>> server processes on slave. HAProxy deals with Zope processes going
>> up and down, so we don't really need to cluster these per se.
>>
>>  - Zope communicates with Postgres. We intend that connection string
>> to use the floating IP address, so that if Postgres fails over to
>> slave, Zope will be unaware.
>>
>>  - Memcached is used by Zope to cache certain Postgres database
>> queries, so it would be similar. We can have this on hot standby (if
>> that's easier?) since it only manages data in local RAM, but the
>> memcached connection string would use the floating IP address too.
>>
>>  - Zope writes certain "blob" files to the filesystem. All Zope
>> clients (across both servers) need a shared blob directory. They do
>> implement locking on this directory, so concurrent access is not a
>> problem.
>>
>> Now for the bits I'm less sure about:
>>
>>  - We were thinking of creating a DRBD partition with OCFS2 for
>> Postgres data + blob data. IIUC, that setup handles multiple nodes
>> writing, so the blob storage should work fine (since Zope will
>> ensure the integrity of the directory).
>
> OK.
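The cluster side of a dual-primary DRBD with OCFS2 on top would look
roughly like this (names and paths are placeholders; the OCFS2
membership layer, e.g. cloned controld/o2cb resources, is left out
for brevity):

    primitive drbd0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="20s" role="Master" \
        op monitor interval="30s" role="Slave"
    ms ms-drbd0 drbd0 \
        meta master-max="2" clone-max="2" notify="true" interleave="true"
    primitive fs0 ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/shared" fstype="ocfs2" \
        op monitor interval="20s"
    clone cl-fs0 fs0 meta interleave="true"
    colocation fs-on-drbd inf: cl-fs0 ms-drbd0:Master
    order drbd-before-fs inf: ms-drbd0:promote cl-fs0:start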
>
>>  - We were thinking of using the pgsql resource agent that comes
>> with Pacemaker to manage Postgres.
>>
>>  - The postgres data would need fencing when failing over, from what
>> I understand. I read the notes that using an on-board device like
>> Dell's DRAC to implement STONITH is not a good idea. We don't have
>> the option at this stage to buy a UPS-based solution (we do have
>> UPS, but it can't be used to cut power to individual servers). We do
>> have two pairs of NICs in each server, one of which would be used
>> "crossover" between master and slave.
>
> The problem with lights-out devices such as DRAC is that if they
> lose power then fencing doesn't work. But if you have them
> connected to UPS which is reliable then DRAC should be OK.
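If the DRACs are on the UPS, one way is to drive them over IPMI, one
STONITH resource per node, each one banned from the node it is
supposed to kill (addresses and credentials below are placeholders):

    primitive stonith-node1 stonith:external/ipmi \
        params hostname="node1" ipaddr="10.0.0.11" \
            userid="root" passwd="secret" interface="lan" \
        op monitor interval="60m"
    primitive stonith-node2 stonith:external/ipmi \
        params hostname="node2" ipaddr="10.0.0.12" \
            userid="root" passwd="secret" interface="lan" \
        op monitor interval="60m"
    location l-stonith-node1 stonith-node1 -inf: node1
    location l-stonith-node2 stonith-node2 -inf: node2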
>
>> Given this, what is the best way to implement fencing in this
>> situation? Could we use DRBD to just refuse master write access to
>> the slave disk? Could we accept a bit more risk and say that STONITH
>> will succeed even if *communication* with the DRAC fails, but will
>> try to use DRAC if it can reach it?
>
> This is not possible. If the fencing action fails, the cluster
> won't make any progress.
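That said, DRBD can add resource-level fencing on top of (not instead
of) STONITH: the fence-peer handler inserts a constraint into the CIB
so that the outdated side cannot be promoted. Only the fencing-related
bits of drbd.conf, as shipped with DRBD 8.3:

    resource r0 {
      disk {
        fencing resource-only;
      }
      handlers {
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
      # device/disk/address definitions omitted
    }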
>
>> This may solve the "fencing
>> indefinitely" problem when postgres is failing over due to a power
>> outage on master, and Pacemaker can't find DRAC to kill master.
>
> On two-node clusters fencing replaces quorum so it is
> indispensable.
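Which is why on two nodes you keep STONITH on and tell the cluster to
ignore quorum, e.g.:

    property stonith-enabled="true" \
        no-quorum-policy="ignore"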
>
>>  - If HAProxy or memcached on master fails (in the software sense),
>> we'd need to fail over the floating IP address so that the front-end
>> firewall and the Zope connection strings would continue to work,
>> even though we have hot standby's on slave. Is this the right thing
>> to do?
>
> Looks like it.
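With the clone from the earlier sketch, colocating the address with it
makes the IP move away when the local clone instance fails (the same
pattern would apply to a memcached clone):

    colocation ip-with-haproxy inf: failover-ip cl-haproxy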
>
>> If so, I'd appreciate some pointers on how to configure this.
>> There are no resource agents that ship with Pacemaker I can find for
>> memcached/HAProxy, though perhaps it'd be better to create them and
>> let Pacemaker manage everything?
>
> It is also possible to use init scripts (lsb). I guess those
> exist; just test them thoroughly. If you let the cluster manage
> them, they can be monitored.
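For example (assuming /etc/init.d/memcached is really LSB compliant;
in particular "status" has to return the right exit codes, otherwise
monitoring will lie to you):

    primitive memcached lsb:memcached \
        op monitor interval="30s"
    clone cl-memcached memcached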
>
>> In that case, how do we manage the
>> connection string issue (making sure Zope talks to the right thing)
>> if not by failing over an IP address?
>
> You lost me here. The IP address is going to fail over. I don't
> see where the "connection string issue" is.
>
> Thanks,
>
> Dejan
>
>> Thanks a lot!
>>
>> Martin
>>



-- 
Serge Dubrouski.