[Pacemaker] Pacemaker CoroSync + PGPool-II

Thu Apr 26 21:47:00 EDT 2012

On Wed, Apr 25, 2012 at 2:39 AM, Steven Bambling <smbambling at arin.net> wrote:
> I made two more tweaks on lines 240 and 255, changing is_ocf_true to
> ocf_is_true.
>
> Then I exported the parameter names and values, using this as a guide:
> http://www.clusterlabs.org/wiki/Debugging_Resource_Failures
>
> export OCF_RESKEY_pcp_admin_username="root"
> export OCF_RESKEY_pcp_admin_password=0rionChive

Your resource definition uses:

params pcp_admin_username=postgres \
params pcp_admin_password=password \

which seems a likely reason for the problem: the agent was tested by hand
with different credentials than the ones the cluster passes to it.
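
One way to rule this out is to repeat the manual test with exactly the
values from the primitive. Note that auto_reconnect also differs between
the two (t in the definition, true in the exports). A minimal sketch,
assuming a POSIX shell; params_to_exports is a hypothetical helper for
illustration, not part of Pacemaker or the resource agent:

```shell
#!/bin/sh
# Hypothetical helper: turn "name=value" pairs copied from the crm
# primitive into the OCF_RESKEY_* exports used for a manual agent run.
# Values are printed unquoted, so quote them by hand if they contain
# spaces or shell metacharacters.
params_to_exports() {
    for pair in "$@"; do
        name=${pair%%=*}    # text before the first "="
        value=${pair#*=}    # text after the first "="
        printf 'export OCF_RESKEY_%s=%s\n' "$name" "$value"
    done
}

# Values taken from the resource definition quoted in this thread.
params_to_exports pcp_admin_username=postgres \
                  pcp_admin_password=password \
                  auto_reconnect=t
```

Evaluating that output before running the agent by hand should make the
manual run see the same parameters the cluster would supply.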

> export OCF_RESKEY_pcp_admin_port=9898
> export OCF_RESKEY_pcp_admin_host=localhost
> export OCF_RESKEY_pgpool_bin=/usr/bin/pgpool
> export OCF_RESKEY_pcp_attach_node_bin=/usr/bin/pcp_attach_node
> export OCF_RESKEY_pcp_detach_node_bin=/usr/bin/pcp_detach_node
> export OCF_RESKEY_pcp_node_count_bin=/usr/bin/pcp_node_count
> export OCF_RESKEY_pcp_node_info_bin=/usr/bin/pcp_node_info
> export OCF_RESKEY_stop_mode=f
> export OCF_RESKEY_auto_reconnect=true
> export OCF_RESKEY_fail_on_detached=true
>
> Then I manually ran the RA script with each of its actions and verified that
> the return codes were correct.  I don't see any issues with the resource
> agent script, but I still see the errors in crm_mon and in the logs:
>
> Apr 24 08:23:48 pg1.stage.net lrmd: [28471]: WARN: Managed pgPool:monitor process 28484 exited with return code 2
>
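
For reference, return code 2 is OCF_ERR_ARGS in the OCF resource agent
convention, which is what crm_mon reports as "invalid parameter", and 7 is
OCF_NOT_RUNNING. A small lookup sketch; ocf_rc_name is a hypothetical
helper name and only covers the common codes:

```shell
#!/bin/sh
# Hypothetical helper: map an OCF resource agent exit code to its
# symbolic name. Codes 0 through 7 follow the OCF convention.
ocf_rc_name() {
    case $1 in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown ($1)" ;;
    esac
}

ocf_rc_name 2
```
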
> Results of running the RA script:
>
>
> Start PGPool-II
> [root@pg1 heartbeat]# /usr/lib/ocf/resource.d/heartbeat/pgpool2 start ; echo $?
> pgpool2[29968]: INFO: default Successfully started pgpool-II
> pgpool2[29968]: DEBUG: default start returned 0
> 0
>
> Status of running PGPool-II
> [root@pg1 heartbeat]# /usr/lib/ocf/resource.d/heartbeat/pgpool2 status ; echo $?
> pgpool2[30036]: DEBUG: default status returned 0
> 0
>
> Stop PGPool-II
> [root@pg1 heartbeat]# /usr/lib/ocf/resource.d/heartbeat/pgpool2 stop ; echo $?
> pgpool2[30065]: INFO: Using /usr/bin/pgpool -m f stop to stop pgpool-II
> stop request sent to pgpool. waiting for termination....done.
> pgpool2[30065]: INFO: default Successfully stopped pgpool-II
> pgpool2[30065]: DEBUG: default stop returned 0
> 0
>
> Status of stopped PGPool-II
> [root@pg1 heartbeat]# /usr/lib/ocf/resource.d/heartbeat/pgpool2 status ; echo $?
> pgpool2[30076]: DEBUG: default status returned 7
> 7
>
> Monitor of stopped PGPool-II
> [root@pg1 heartbeat]# /usr/lib/ocf/resource.d/heartbeat/pgpool2 monitor ; echo $?
> pgpool2[30116]: DEBUG: default monitor returned 7
> 7
>
> Start PGPool-II
> [root@pg1 heartbeat]# /usr/lib/ocf/resource.d/heartbeat/pgpool2 start ; echo $?
> pgpool2[30123]: INFO: default Successfully started pgpool-II
> pgpool2[30123]: DEBUG: default start returned 0
> 0
>
> Monitor of running PGPool-II
> [root@pg1 heartbeat]# /usr/lib/ocf/resource.d/heartbeat/pgpool2 monitor ; echo $?
> pgpool2[30190]: DEBUG: default monitor returned 0
> 0
>
> Validation of running PGPool-II
> [root@pg1 heartbeat]# /usr/lib/ocf/resource.d/heartbeat/pgpool2 validate-all ; echo $?
> pgpool2[30220]: DEBUG: default validate-all returned 0
> 0
>
> Stop PGPool-II
> [root@pg1 heartbeat]# /usr/lib/ocf/resource.d/heartbeat/pgpool2 stop ; echo $?
> pgpool2[30228]: INFO: Using /usr/bin/pgpool -m f stop to stop pgpool-II
> stop request sent to pgpool. waiting for termination....done.
> pgpool2[30228]: INFO: default Successfully stopped pgpool-II
> pgpool2[30228]: DEBUG: default stop returned 0
> 0
>
> Validation of stopped PGPool-II
> [root@pg1 heartbeat]# /usr/lib/ocf/resource.d/heartbeat/pgpool2 validate-all ; echo $?
> pgpool2[30238]: DEBUG: default validate-all returned 0
> 0
>
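
The manual runs above could be scripted so the whole sequence is easy to
repeat after changing a parameter. A rough sketch, assuming the
OCF_RESKEY_* variables are already exported in the calling shell;
run_actions is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: run a resource agent through a list of actions
# and print each action's exit code. The agent's own output is
# discarded so that only the summary lines remain.
run_actions() {
    agent=$1
    shift
    for action in "$@"; do
        rc=0
        "$agent" "$action" >/dev/null 2>&1 || rc=$?
        printf '%s returned %s\n' "$action" "$rc"
    done
}

# Agent path as used in this thread; skipped if it is not installed.
agent=/usr/lib/ocf/resource.d/heartbeat/pgpool2
if [ -x "$agent" ]; then
    run_actions "$agent" start monitor stop monitor validate-all
fi
```

Since the cluster reports the failure for pgPool_monitor_0, the monitor
result with the cluster's own parameter values is probably the one to
watch.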
> On Apr 24, 2012, at 10:59 AM, Steven Bambling wrote:
>
> After some searching on setting up "PGPool-HA" to keep pgpool from being a
> single point of failure, it looks like development on the heartbeat project
> has slowed greatly and has shifted to corosync (backed by Red Hat and SUSE),
> which is recommended by pacemaker.
>
> I've found an article
> here http://masteinhauser.github.com/blog/2011/09/24/pacemaker-pgpool2/ that
> explains using pacemaker with pgpool-II.  The post provides a resource
> agent.  I had to make a quick tweak for the PGPool-II path created by the
> installed RPMs obtained
> from http://yum.postgresql.org/9.1/redhat/rhel-$releasever-$basearch.  I
> changed the path below from /var/run/pgpool/ to /var/run/pgpool-II-91:
>
> pgpool2_status() {
>     if [ ! -r "/var/run/pgpool-II-91/pgpool.pid" ]; then
>         return $OCF_NOT_RUNNING
>     fi
>     ps_info=$(ps ax | grep "pgpool" | grep $(cat /var/run/pgpool-II-91/pgpool.pid))
>
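
As an aside on the excerpt above, a pid-file check can avoid the
ps | grep pipeline by signalling the recorded pid directly. This is only a
sketch of an alternative, not the code from the published agent;
pidfile_status is a hypothetical name, and kill -0 can fail for processes
owned by another user unless the check runs as root:

```shell
#!/bin/sh
# OCF exit codes used by the check below.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical helper: report whether the process recorded in a pid
# file is alive.
pidfile_status() {
    pidfile=$1
    [ -r "$pidfile" ] || return $OCF_NOT_RUNNING
    pid=$(cat "$pidfile")
    # Reject empty or non-numeric pid file contents.
    case $pid in
        ''|*[!0-9]*) return $OCF_NOT_RUNNING ;;
    esac
    # kill -0 sends no signal; it only tests that the pid exists.
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Pid file path from this thread.
rc=0
pidfile_status /var/run/pgpool-II-91/pgpool.pid || rc=$?
echo "status returned $rc"
```
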
> I used the following parameters to create the resource:
>
> crm configure primitive pgPool ocf:heartbeat:pgpool2 \
> params pcp_admin_username=postgres \
> params pcp_admin_password=password \
> params pcp_admin_port=9898 \
> params pcp_admin_host=localhost \
> params pgpool_bin=/usr/bin/pgpool \
> params pcp_attach_node_bin=/usr/bin/pcp_attach_node \
> params pcp_detach_node_bin=/usr/bin/pcp_detach_node \
> params pcp_node_count_bin=/usr/bin/pcp_node_count \
> params pcp_node_info_bin=/usr/bin/pcp_node_info \
> params stop_mode=f \
> params auto_reconnect=t \
> params fail_on_detached=true \
> op monitor interval=1min
>
> The resource looks to be created correctly, but when I (re)start the
> corosync service and look at crm_mon I see some failed actions:
>
> ============
> Last updated: Tue Apr 24 08:31:08 2012
> Last change: Tue Apr 24 08:02:31 2012 via cibadmin on pg1.stage.arin.net
> Stack: openais
> Current DC: pg2.stage.arin.net - partition with quorum
> Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ pg1.stage.net pg2.stage.net ]
>
> ClusterIP (ocf::heartbeat:IPaddr2): Started pg2.stage.net
>
> Failed actions:
>     pgPool_monitor_0 (node=pg1.stage.net, call=3, rc=2, status=complete): invalid parameter
>     pgPool_monitor_0 (node=pg2.stage.net, call=3, rc=2, status=complete): invalid parameter
>
> When I look in /var/log/cluster/corosync.log I see this error:
>
> Apr 24 08:23:48 pg1.stage.net lrmd: [28471]: WARN: Managed pgPool:monitor process 28484 exited with return code 2
>
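
If it helps when re-testing, the lrmd warnings can be reduced to just the
operation and its return code. A sketch that assumes the log line format
quoted above; lrmd_exit_codes is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print
# "<resource>:<operation> <return code>" for each lrmd warning about
# a managed process exiting.
lrmd_exit_codes() {
    sed -n 's/.*lrmd: .*Managed \([^ ]*\) process [0-9]* exited with return code \([0-9]*\).*/\1 \2/p'
}

# Log path from this thread; skipped if the file is not readable.
log=/var/log/cluster/corosync.log
if [ -r "$log" ]; then
    lrmd_exit_codes < "$log" | sort | uniq -c
fi
```
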
> Has anyone run into a similar problem, or does anyone have suggestions for
> a cluster solution with pgpool-II?
>
> v/r
>
> STEVE
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org