[ClusterLabs] Postgresql HA with Corosync/Pacemaker

Fri Mar 13 02:26:56 UTC 2015

> On 12 Mar 2015, at 5:54 pm, void ship <earthling6 at gmail.com> wrote:
> 
> Andrew et al,
> 
> I'm impressed by the amount of work that has gone into this project, and normally I have nothing but praise. Today I'm at my wits' end. After several weeks of unsuccessfully wrestling with the unnavigable mixture of software versions, cluster shells, and OS-vendor-specific issues, I'm turning to you for help.
> 
> Goal: an HA Postgresql cluster much like the one described at http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster, but with quorum (a third, voting-only entity that arbitrates split-brain issues).
> 
> OS/Vendor: Linux 2.6.32, RHEL 6.5 (The voting-only quorum-maker is actually CentOS 6.6. It was not my choice.)
> Postgresql: 9.1.8  (currently stuck with that for compatibility issues and experience base, but I could upgrade)
> Transport Protocol: udpu (required for network reasons)
> 
> -- Rounds 1 through 5  -- 
>   Corosync-1.4.7
>   Pacemaker 1.1.12  (libs, cli)
>   clusterlib-3.0.12
>   cluster-glue-libs-1.0.5
>   resource-agents-3.9.5
> 
> 
>   Note: Between my 1st & 2nd tries on this one, 5 weeks had passed due to an illness.
> 
>   First, the documentation (see link) is in error: it suggests a corosync configuration with service pacemaker version 0 and then describes launching pacemaker after corosync. Specifying version 0 here causes corosync to attempt to start pacemaker as a plugin. The directives conflict somehow and nothing really works; corosync shows zombied slave processes. I also had another error causing unexpected results in the TOTEM protocol. I think it was fixed by removing mcastaddr, which apparently didn't mix well with the udpu transport protocol.
> 
>   Eventually, corosync ran correctly and commands such as "crm_mon -Afr -1" showed expected results.
> 
>   Also got postgresql to work as master/slave (this is where my experience base is, so no problem here).
> 
>   I used a pcs configuration fairly close to the one described in the documentation. I'm not using 3 different subnets, as that's really unnecessary and quite impossible for me. There are two "physical" IPs and 2 virtual/service IPs. They are all within the same /22 CIDR LAN, but I don't see how that's a problem.
> 
>   After running the pcs script, pacemaker starts both the PRI and HS in recovery mode. I cannot see a reason for this. 
>   I started over with the configuration (clearing it, restarting pacemaker everywhere) and this time left pacemaker stopped on the secondary. The primary is not started. No reason is given for it, but the crm_mon output indicates something odd:
>     + master-pgsql                      : -INFINITY
>     + pgsql-data-status                 : LATEST    
> Logs indicate that no attempt was made to start the database. But again, if I have the HS in the cluster, *it* gets started but in recovery mode.
> 
>   But perhaps I should be using corosync 2.3.3, which is available for the above platforms. 
>   
> 
> -- Round 6 -- 
>   Install corosync-2.3.3. It won't upgrade due to conflicts; pacemaker must be uninstalled first. Afterwards, clusterlib refuses to install due to dependency conflicts. Another user posted this problem on this forum 2 years ago, so I'm surprised this is still an issue. By the way, the repo URL for this is: http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/   The unresolvable errors include:
>            Requires: libcoroipcc.so.4()(64bit)
> Error: Package: pacemaker-libs-1.1.12+git20140723.483f48a-1.1.x86_64 (network_ha-clustering_Stable)
>            Requires: libconfdb.so.4()(64bit)
> Error: Package: pacemaker-cli-1.1.12+git20140723.483f48a-1.1.x86_64 (network_ha-clustering_Stable)
> 
> 
> 
> What's the best route for me to go here? Find the right set of RPMs? Build from source? If so which versions? Throw it all away and try to get CMAN (shiver)? Go back to Corosync 1.4.7 and try again?

On RHEL6, definitely go for pacemaker+CMAN.
Under no circumstances use the plugin and corosync.conf.
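
With CMAN the membership layer is configured in /etc/cluster/cluster.conf instead of corosync.conf, and the udpu requirement maps to the transport attribute of the cman element. As a rough sketch only: the node names (pg1, pg2, and qnode for the voting-only machine), the cluster name, and the output file below are all assumptions for illustration, and the result is written to a scratch file in the current directory rather than to /etc/cluster so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: node names, cluster name and output path are assumptions,
# not values taken from the original post.
out=./cluster.conf.sketch
nodes="pg1 pg2 qnode"

{
  echo '<?xml version="1.0"?>'
  echo '<cluster config_version="1" name="pgcluster">'
  # UDP unicast instead of multicast
  echo '  <cman transport="udpu"/>'
  echo '  <clusternodes>'
  id=1
  for n in $nodes; do
    # Each node redirects fencing requests to Pacemaker via fence_pcmk
    cat <<EOF
    <clusternode name="$n" nodeid="$id">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="$n"/>
        </method>
      </fence>
    </clusternode>
EOF
    id=$((id + 1))
  done
  echo '  </clusternodes>'
  echo '  <fencedevices>'
  echo '    <fencedevice name="pcmk" agent="fence_pcmk"/>'
  echo '  </fencedevices>'
  echo '</cluster>'
} > "$out"
```

With three nodes carrying one vote each, quorum needs two of them, which is what the voting-only machine is there for. The same file has to be present on every node.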

Changing your stack is unlikely to help, but grabbing the latest pgsql agent might not be a bad idea.

   wget https://raw.githubusercontent.com/ClusterLabs/resource-agents/HEAD/heartbeat/pgsql
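
The downloaded file still has to replace the packaged agent, which normally lives under /usr/lib/ocf/resource.d/heartbeat. A minimal sketch of doing that with a backup, assuming a hypothetical helper named install_agent and a backup directory of your choosing (kept outside the agent directory so the old copy does not sit next to the real agents):

```shell
#!/bin/sh
# install_agent SRC DEST_DIR BACKUP_DIR   (hypothetical helper, not part of any package)
# Installs SRC as DEST_DIR/pgsql with mode 0755, first saving any existing
# DEST_DIR/pgsql to BACKUP_DIR/pgsql.orig.
install_agent() {
  src=$1
  dest_dir=$2
  backup_dir=$3
  dest="$dest_dir/pgsql"

  # Refuse a missing or empty download.
  [ -s "$src" ] || { echo "missing or empty: $src" >&2; return 1; }

  # Keep the currently installed agent so it can be restored later.
  if [ -f "$dest" ]; then
    cp -p "$dest" "$backup_dir/pgsql.orig" || return 1
  fi

  install -m 0755 "$src" "$dest"
}

# Typical call, as root, on every node that can run the resource:
#   install_agent ./pgsql /usr/lib/ocf/resource.d/heartbeat /root
```

Note that a later update of the resource-agents package will likely overwrite the file again.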

I would also recommend: crm_resource --force-start -V -r pgsql
This will grab the config from the cluster and run it locally, with debugging enabled, so that you can see _everything_ it is doing on startup.
Maybe this will provide some additional insights.
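
The debug output can be long, so it may help to capture it in a file and search it afterwards. A small sketch, assuming a hypothetical wrapper named run_logged and a log path under /tmp; the resource ID pgsql is the one used in the command above and may differ in your configuration:

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND, saves its stdout and stderr to LOGFILE, appends the exit
# status to the log, and returns that status to the caller.
run_logged() {
  log=$1
  shift
  rc=0
  "$@" > "$log" 2>&1 || rc=$?
  echo "exit status: $rc" >> "$log"
  return "$rc"
}

# Intended use on the node that should become master:
#   run_logged /tmp/pgsql-force-start.log crm_resource --force-start -V -r pgsql
#   grep -n -i -E 'error|fail' /tmp/pgsql-force-start.log
```

Repeating -V raises the verbosity further if the first run does not show enough.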
