[ClusterLabs] Postgres 9.3, corosync and pacemaker
Wynand Jansen van Vuuren
esawyja at gmail.com
Wed Mar 11 10:14:02 UTC 2015
Hi all,
I need some advice and help please, I'm pretty new to HA and what I have is
2 nodes, with 2 network interfaces, ETH0 and ETH1
cl1_lb1 IP is 172.28.200.166
cl2_lb1 IP is 172.28.200.168
Running on ETH0
The heartbeart is setup on 172.16.0.1 and 172.16.0.3 and running on ETH1,
my pubic VIP for the 2 nodes is 172.28.200.159 and my VIP for replication
is 172.26.0.5
When I test pgsql with ocf-tester (I have modified the pgsql file with the
paths etc), I get the following
cl1_lb1:~ # cd /usr/lib/ocf/resource.d/heartbeat/
cl1_lb1:/usr/lib/ocf/resource.d/heartbeat # ocf-tester -v -n resourcename
pgsql
Beginning tests for pgsql...
Testing permissions with uid nobody
Testing: meta-data
Testing: meta-data
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="pgsql">
<version>1.0</version>
<longdesc lang="en">
Resource script for PostgreSQL. It manages a PostgreSQL as an HA resource.
</longdesc>
<shortdesc lang="en">Manages a PostgreSQL database instance</shortdesc>
<parameters>
<parameter name="pgctl" unique="0" required="0">
<longdesc lang="en">
Path to pg_ctl command.
</longdesc>
<shortdesc lang="en">pgctl</shortdesc>
<content type="string" default="/opt/app/PostgreSQL/9.3/bin/pg_ctl" />
</parameter>
<parameter name="start_opt" unique="0" required="0">
<longdesc lang="en">
Start options (-o start_opt in pg_ctl). "-i -p 5432" for example.
</longdesc>
<shortdesc lang="en">start_opt</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="ctl_opt" unique="0" required="0">
<longdesc lang="en">
Additional pg_ctl options (-w, -W etc..).
</longdesc>
<shortdesc lang="en">ctl_opt</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="psql" unique="0" required="0">
<longdesc lang="en">
Path to psql command.
</longdesc>
<shortdesc lang="en">psql</shortdesc>
<content type="string" default="/opt/app/PostgreSQL/9.3/bin/psql" />
</parameter>
<parameter name="pgdata" unique="0" required="0">
<longdesc lang="en">
Path to PostgreSQL data directory.
</longdesc>
<shortdesc lang="en">pgdata</shortdesc>
<content type="string" default="/opt/app/pgdata/9.3" />
</parameter>
<parameter name="pgdba" unique="0" required="0">
<longdesc lang="en">
User that owns PostgreSQL.
</longdesc>
<shortdesc lang="en">pgdba</shortdesc>
<content type="string" default="postgres" />
</parameter>
<parameter name="pghost" unique="0" required="0">
<longdesc lang="en">
Hostname/IP address where PostgreSQL is listening
</longdesc>
<shortdesc lang="en">pghost</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="pgport" unique="0" required="0">
<longdesc lang="en">
Port where PostgreSQL is listening
</longdesc>
<shortdesc lang="en">pgport</shortdesc>
<content type="integer" default="5432" />
</parameter>
<parameter name="pglibs" unique="0" required="0">
<longdesc lang="en">
Custom location of the Postgres libraries. If not set, the standard location
will be used.
</longdesc>
<shortdesc lang="en">pglibs</shortdesc>
<content type="string" default="/opt/app/PostgreSQL/9.3/lib" />
</parameter>
<parameter name="monitor_user" unique="0" required="0">
<longdesc lang="en">
PostgreSQL user that pgsql RA will user for monitor operations. If it's not
set
pgdba user will be used.
</longdesc>
<shortdesc lang="en">monitor_user</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="monitor_password" unique="0" required="0">
<longdesc lang="en">
Password for monitor user.
</longdesc>
<shortdesc lang="en">monitor_password</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="monitor_sql" unique="0" required="0">
<longdesc lang="en">
SQL script that will be used for monitor operations.
</longdesc>
<shortdesc lang="en">monitor_sql</shortdesc>
<content type="string" default="select now();" />
</parameter>
<parameter name="config" unique="0" required="0">
<longdesc lang="en">
Path to the PostgreSQL configuration file for the instance.
</longdesc>
<shortdesc lang="en">Configuration file</shortdesc>
<content type="string" default="/opt/app/pgdata/9.3/postgresql.conf" />
</parameter>
<parameter name="pgdb" unique="0" required="0">
<longdesc lang="en">
Database that will be used for monitoring.
</longdesc>
<shortdesc lang="en">pgdb</shortdesc>
<content type="string" default="template1" />
</parameter>
<parameter name="logfile" unique="0" required="0">
<longdesc lang="en">
Path to PostgreSQL server log output file.
</longdesc>
<shortdesc lang="en">logfile</shortdesc>
<content type="string" default="/dev/null" />
</parameter>
<parameter name="socketdir" unique="0" required="0">
<longdesc lang="en">
Unix socket directory for PostgreSQL.
If you use PostgreSQL 9.3 or higher and define unix_socket_directories in
the postgresql.conf, then you must set socketdir to determine which
directory is used for psql command.
</longdesc>
<shortdesc lang="en">socketdir</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="stop_escalate" unique="0" required="0">
<longdesc lang="en">
Number of shutdown retries (using -m fast) before resorting to -m immediate
</longdesc>
<shortdesc lang="en">stop escalation</shortdesc>
<content type="integer" default="30" />
</parameter>
<parameter name="rep_mode" unique="0" required="0">
<longdesc lang="en">
Replication mode may be set to "async" or "sync" or "slave".
They require PostgreSQL 9.1 or later.
Once set, "async" and "sync" require node_list, master_ip, and
restore_command parameters,as well as configuring PostgreSQL
for replication (in postgresql.conf and pg_hba.conf).
"slave" means that RA only makes recovery.conf before starting
to connect to primary which is running somewhere.
It dosen't need master/slave setting.
It requires master_ip restore_command parameters.
</longdesc>
<shortdesc lang="en">rep_mode</shortdesc>
<content type="string" default="none" />
</parameter>
<parameter name="node_list" unique="0" required="0">
<longdesc lang="en">
All node names. Please separate each node name with a space.
This is required for replication.
</longdesc>
<shortdesc lang="en">node list</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="restore_command" unique="0" required="0">
<longdesc lang="en">
restore_command for recovery.conf.
This is required for replication.
</longdesc>
<shortdesc lang="en">restore_command</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="archive_cleanup_command" unique="0" required="0">
<longdesc lang="en">
archive_cleanup_command for recovery.conf.
This is used for replication and is optional.
</longdesc>
<shortdesc lang="en">archive_cleanup_command</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="recovery_end_command" unique="0" required="0">
<longdesc lang="en">
recovery_end_command for recovery.conf.
This is used for replication and is optional.
</longdesc>
<shortdesc lang="en">recovery_end_command</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="master_ip" unique="0" required="0">
<longdesc lang="en">
Master's floating IP address to be connected from hot standby.
This parameter is used for "primary_conninfo" in recovery.conf.
This is required for replication.
</longdesc>
<shortdesc lang="en">master ip</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="repuser" unique="0" required="0">
<longdesc lang="en">
User used to connect to the master server.
This parameter is used for "primary_conninfo" in recovery.conf.
This is required for replication.
</longdesc>
<shortdesc lang="en">repuser</shortdesc>
<content type="string" default="postgres" />
</parameter>
<parameter name="primary_conninfo_opt" unique="0" required="0">
<longdesc lang="en">
primary_conninfo options of recovery.conf except host, port, user and
application_name.
This is optional for replication.
</longdesc>
<shortdesc lang="en">primary_conninfo_opt</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="restart_on_promote" unique="0" required="0">
<longdesc lang="en">
If this is true, RA deletes recovery.conf and restarts PostgreSQL
on promote to keep Timeline ID. It probably makes fail-over slower.
It's recommended to set on-fail of promote up as fence.
This is optional for replication.
</longdesc>
<shortdesc lang="en">restart_on_promote</shortdesc>
<content type="boolean" default="false" />
</parameter>
<parameter name="tmpdir" unique="0" required="0">
<longdesc lang="en">
Path to temporary directory.
This is optional for replication.
</longdesc>
<shortdesc lang="en">tmpdir</shortdesc>
<content type="string" default="/var/lib/pgsql/tmp" />
</parameter>
<parameter name="xlog_check_count" unique="0" required="0">
<longdesc lang="en">
Number of checks of xlog on monitor before promote.
This is optional for replication.
</longdesc>
<shortdesc lang="en">xlog check count</shortdesc>
<content type="integer" default="" />
</parameter>
<parameter name="crm_attr_timeout" unique="0" required="0">
<longdesc lang="en">
The timeout of crm_attribute forever update command.
Default value is 5 seconds.
This is optional for replication.
</longdesc>
<shortdesc lang="en">The timeout of crm_attribute forever update
command.</shortdesc>
<content type="integer" default="5" />
</parameter>
<parameter name="stop_escalate_in_slave" unique="0" required="0">
<longdesc lang="en">
Number of shutdown retries (using -m fast) before resorting to -m immediate
in slave state.
This is optional for replication.
</longdesc>
<shortdesc lang="en">stop escalation_in_slave</shortdesc>
<content type="integer" default="30" />
</parameter>
<parameter name="check_wal_receiver" unique="0" required="0">
<longdesc lang="en">
If this is true, RA checks wal_receiver process on monitor
and notifies its status using "(resource name)-receiver-status" attribute.
It's useful for checking whether PostgreSQL (hot standby) connects to
primary.
The attribute shows status as "normal" or "ERROR".
</longdesc>
<shortdesc lang="en">check_wal_receiver</shortdesc>
<content type="boolean" default="false" />
</parameter>
</parameters>
<actions>
<action name="start" timeout="120" />
<action name="stop" timeout="120" />
<action name="status" timeout="60" />
<action name="monitor" depth="0" timeout="30" interval="30"/>
<action name="monitor" depth="0" timeout="30" interval="29" role="Master" />
<action name="promote" timeout="120" />
<action name="demote" timeout="120" />
<action name="notify" timeout="90" />
<action name="meta-data" timeout="5" />
<action name="validate-all" timeout="5" />
<action name="methods" timeout="5" />
</actions>
</resource-agent>
Testing: validate-all
Checking current state
Testing: stop
INFO: waiting for server to shut down..... done server stopped
INFO: PostgreSQL is down
Testing: monitor
INFO: PostgreSQL is down
Testing: monitor
ocf-exit-reason:Setup problem: couldn't find command:
/opt/app/PostgreSQL/9.3/bin/pg_ctl
Testing: start
INFO: server starting
INFO: PostgreSQL start command sent.
INFO: PostgreSQL is down
DEBUG: PostgreSQL still hasn't started yet. Waiting...
INFO: PostgreSQL is started.
Testing: monitor
Testing: monitor
INFO: Don't check /opt/app/pgdata/9.3 during probe
Testing: notify
Checking for demote action
ocf-exit-reason:Not in a replication mode.
Checking for promote action
ocf-exit-reason:Not in a replication mode.
Testing: demotion of started resource
ocf-exit-reason:Not in a replication mode.
* rc=6: Demoting a start resource should not fail
Testing: promote
ocf-exit-reason:Not in a replication mode.
* rc=6: Promote failed
Testing: demote
ocf-exit-reason:Not in a replication mode.
* rc=6: Demote failed
Aborting tests
cl1_lb1:/usr/lib/ocf/resource.d/heartbeat # su - postgres
postgres at cl1_lb1:~> pg_ctl -D /opt/app/pgdata/9.3 status
pg_ctl: server is running (PID: 21575)
/opt/app/PostgreSQL/9.3/bin/postgres "-D" "/opt/app/pgdata/9.3" "-c"
"config_file=/opt/app/pgdata/9.3/postgresql.conf"
Postgres seem to be running ok on both nodes and the streaming is taking
place
postgres at cl1_lb1:~> exit
logout
cl1_lb1:/usr/lib/ocf/resource.d/heartbeat # crm_mon -1 -Af
Last updated: Wed Mar 11 12:05:17 2015
Last change: Wed Mar 11 12:03:10 2015 by root via crm_attribute on cl1_lb1
Stack: classic openais (with plugin)
Current DC: cl1_lb1 - partition with quorum
Version: 1.1.9-2db99f1
2 Nodes configured, 2 expected votes
4 Resources configured.
Online: [ cl1_lb1 cl2_lb1 ]
Master/Slave Set: msPostgresql [pgsql]
pgsql:0 (ocf::heartbeat:pgsql): Slave cl1_lb1 FAILED
Stopped: [ pgsql:1 ]
Node Attributes:
* Node cl1_lb1:
+ master-pgsql : -INFINITY
+ pgsql-data-status : LATEST
+ pgsql-status : STOP
* Node cl2_lb1:
+ master-pgsql : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : STOP
Migration summary:
* Node cl1_lb1:
pgsql:0: migration-threshold=1 fail-count=1 last-failure='Wed Mar 11
12:04:42 2015'
* Node cl2_lb1:
pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Wed Mar
11 12:01:40 2015'
Failed actions:
pgsql_monitor_3000 (node=cl1_lb1, call=132, rc=1, status=complete):
unknown error
pgsql_start_0 (node=cl2_lb1, call=90, rc=1, status=complete): unknown
error
cl1_lb1:/usr/lib/ocf/resource.d/heartbeat # crm configure show
node cl1_lb1 \
attributes pgsql-data-status="LATEST"
node cl2_lb1 \
attributes pgsql-data-status="STREAMING|SYNC"
primitive pgsql ocf:heartbeat:pgsql \
params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl"
psql="/opt/app/PostgreSQL/9.3/bin/psql"
config="/opt/app/pgdata/9.3/postgresql.conf" pgdba="postgres"
pgdata="/opt/app/pgdata/9.3/" start_opt="-p 5432" rep_mode="sync"
node_list="cl1_lb1 cl2_lb1" restore_command="cp /pgtablespace/archive/%f
%p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5
keepalives_count=5" master_ip="172.16.0.5" restart_on_promote="true"
logfile="/var/log/OCF.log" \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="4s" timeout="60s" on-fail="restart" \
op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
op promote interval="0s" timeout="60s" on-fail="restart" \
op demote interval="0s" timeout="60s" on-fail="stop" \
op stop interval="0s" timeout="60s" on-fail="block" \
op notify interval="0s" timeout="60s"
primitive vip-master ocf:heartbeat:IPaddr2 \
params ip="172.28.200.159" nic="eth0" cidr_netmask="24" \
op start interval="0s" timeout="60s" on-fail="stop" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0s" timeout="60s" on-fail="block"
primitive vip-rep ocf:heartbeat:IPaddr2 \
params ip="172.16.0.5" nic="eth1" cidr_netmask="24" \
meta migration-threshold="0" \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0s" timeout="60s" on-fail="block"
group master-group vip-master vip-rep
ms msPostgresql pgsql \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true"
colocation rsc_colocation-1 inf: master-group msPostgresql:Master
order rsc_order-1 0: msPostgresql:promote master-group:start
symmetrical=false
order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false
property $id="cib-bootstrap-options" \
dc-version="1.1.9-2db99f1" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false" \
crmd-transition-delay="0s" \
last-lrm-refresh="1426067955"
rsc_defaults $id="rsc-options" \
resource-stickiness="INFINITY" \
migration-threshold="1"
cl1_lb1:/usr/lib/ocf/resource.d/heartbeat #
It seems that pgsql is not playing nicely with my Postgres install, any
hints tips what to look for please??
Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20150311/f4e258df/attachment-0003.html>
More information about the Users
mailing list