[ClusterLabs] Postgres 9.3, corosync and pacemaker

Wynand Jansen van Vuuren esawyja at gmail.com
Wed Mar 11 10:33:52 UTC 2015


Hi Michael,
There is no info in the pg_logs, nothing in /var/log/cluster/corosync.log,
do you have to specify the logging somewhere, sorry I'm out of ideas
Thanks

On Wed, Mar 11, 2015 at 12:20 PM, Michael Schwartzkopff <ms at sys4.de> wrote:

> Am Mittwoch, 11. März 2015, 12:14:02 schrieb Wynand Jansen van Vuuren:
> > Hi all,
> > I need some advice and help please, I'm pretty new to HA and what I have
> is
> > 2 nodes, with 2 network interfaces, ETH0 and ETH1
> > cl1_lb1 IP is 172.28.200.166
> > cl2_lb1 IP is 172.28.200.168
> > Running on ETH0
> >
> > The heartbeart is setup on 172.16.0.1 and 172.16.0.3 and running on ETH1,
> > my pubic VIP for the 2 nodes is 172.28.200.159 and my VIP for replication
> > is 172.26.0.5
> >
> > When I test pgsql with ocf-tester (I have modified the pgsql file with
> the
> > paths etc), I get the following
> >
> > cl1_lb1:~ # cd /usr/lib/ocf/resource.d/heartbeat/
> > cl1_lb1:/usr/lib/ocf/resource.d/heartbeat # ocf-tester -v -n resourcename
> > pgsql
> > Beginning tests for pgsql...
> > Testing permissions with uid nobody
> > Testing: meta-data
> > Testing: meta-data
> > <?xml version="1.0"?>
> > <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
> > <resource-agent name="pgsql">
> > <version>1.0</version>
> >
> > <longdesc lang="en">
> > Resource script for PostgreSQL. It manages a PostgreSQL as an HA
> resource.
> > </longdesc>
> > <shortdesc lang="en">Manages a PostgreSQL database instance</shortdesc>
> >
> > <parameters>
> > <parameter name="pgctl" unique="0" required="0">
> > <longdesc lang="en">
> > Path to pg_ctl command.
> > </longdesc>
> > <shortdesc lang="en">pgctl</shortdesc>
> > <content type="string" default="/opt/app/PostgreSQL/9.3/bin/pg_ctl" />
> > </parameter>
> >
> > <parameter name="start_opt" unique="0" required="0">
> > <longdesc lang="en">
> > Start options (-o start_opt in pg_ctl). "-i -p 5432" for example.
> > </longdesc>
> > <shortdesc lang="en">start_opt</shortdesc>
> > <content type="string" default="" />
> >
> > </parameter>
> > <parameter name="ctl_opt" unique="0" required="0">
> > <longdesc lang="en">
> > Additional pg_ctl options (-w, -W etc..).
> > </longdesc>
> > <shortdesc lang="en">ctl_opt</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="psql" unique="0" required="0">
> > <longdesc lang="en">
> > Path to psql command.
> > </longdesc>
> > <shortdesc lang="en">psql</shortdesc>
> > <content type="string" default="/opt/app/PostgreSQL/9.3/bin/psql" />
> > </parameter>
> >
> > <parameter name="pgdata" unique="0" required="0">
> > <longdesc lang="en">
> > Path to PostgreSQL data directory.
> > </longdesc>
> > <shortdesc lang="en">pgdata</shortdesc>
> > <content type="string" default="/opt/app/pgdata/9.3" />
> > </parameter>
> >
> > <parameter name="pgdba" unique="0" required="0">
> > <longdesc lang="en">
> > User that owns PostgreSQL.
> > </longdesc>
> > <shortdesc lang="en">pgdba</shortdesc>
> > <content type="string" default="postgres" />
> > </parameter>
> >
> > <parameter name="pghost" unique="0" required="0">
> > <longdesc lang="en">
> > Hostname/IP address where PostgreSQL is listening
> > </longdesc>
> > <shortdesc lang="en">pghost</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="pgport" unique="0" required="0">
> > <longdesc lang="en">
> > Port where PostgreSQL is listening
> > </longdesc>
> > <shortdesc lang="en">pgport</shortdesc>
> > <content type="integer" default="5432" />
> > </parameter>
> >
> > <parameter name="pglibs" unique="0" required="0">
> > <longdesc lang="en">
> > Custom location of the Postgres libraries. If not set, the standard
> location
> > will be used.
> > </longdesc>
> > <shortdesc lang="en">pglibs</shortdesc>
> > <content type="string" default="/opt/app/PostgreSQL/9.3/lib" />
> > </parameter>
> >
> > <parameter name="monitor_user" unique="0" required="0">
> > <longdesc lang="en">
> > PostgreSQL user that pgsql RA will user for monitor operations. If it's
> not
> > set
> > pgdba user will be used.
> > </longdesc>
> > <shortdesc lang="en">monitor_user</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="monitor_password" unique="0" required="0">
> > <longdesc lang="en">
> > Password for monitor user.
> > </longdesc>
> > <shortdesc lang="en">monitor_password</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="monitor_sql" unique="0" required="0">
> > <longdesc lang="en">
> > SQL script that will be used for monitor operations.
> > </longdesc>
> > <shortdesc lang="en">monitor_sql</shortdesc>
> > <content type="string" default="select now();" />
> > </parameter>
> >
> > <parameter name="config" unique="0" required="0">
> > <longdesc lang="en">
> > Path to the PostgreSQL configuration file for the instance.
> > </longdesc>
> > <shortdesc lang="en">Configuration file</shortdesc>
> > <content type="string" default="/opt/app/pgdata/9.3/postgresql.conf" />
> > </parameter>
> >
> > <parameter name="pgdb" unique="0" required="0">
> > <longdesc lang="en">
> > Database that will be used for monitoring.
> > </longdesc>
> > <shortdesc lang="en">pgdb</shortdesc>
> > <content type="string" default="template1" />
> > </parameter>
> >
> > <parameter name="logfile" unique="0" required="0">
> > <longdesc lang="en">
> > Path to PostgreSQL server log output file.
> > </longdesc>
> > <shortdesc lang="en">logfile</shortdesc>
> > <content type="string" default="/dev/null" />
> > </parameter>
> >
> > <parameter name="socketdir" unique="0" required="0">
> > <longdesc lang="en">
> > Unix socket directory for PostgreSQL.
> >
> > If you use PostgreSQL 9.3 or higher and define unix_socket_directories in
> > the postgresql.conf, then you must set socketdir to determine which
> > directory is used for psql command.
> > </longdesc>
> > <shortdesc lang="en">socketdir</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="stop_escalate" unique="0" required="0">
> > <longdesc lang="en">
> > Number of shutdown retries (using -m fast) before resorting to -m
> immediate
> > </longdesc>
> > <shortdesc lang="en">stop escalation</shortdesc>
> > <content type="integer" default="30" />
> > </parameter>
> >
> > <parameter name="rep_mode" unique="0" required="0">
> > <longdesc lang="en">
> > Replication mode may be set to "async" or "sync" or "slave".
> > They require PostgreSQL 9.1 or later.
> > Once set, "async" and "sync" require node_list, master_ip, and
> > restore_command parameters,as well as configuring PostgreSQL
> > for replication (in postgresql.conf and pg_hba.conf).
> >
> > "slave" means that RA only makes recovery.conf before starting
> > to connect to primary which is running somewhere.
> > It dosen't need master/slave setting.
> > It requires master_ip restore_command parameters.
> > </longdesc>
> > <shortdesc lang="en">rep_mode</shortdesc>
> > <content type="string" default="none" />
> > </parameter>
> >
> > <parameter name="node_list" unique="0" required="0">
> > <longdesc lang="en">
> > All node names. Please separate each node name with a space.
> > This is required for replication.
> > </longdesc>
> > <shortdesc lang="en">node list</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="restore_command" unique="0" required="0">
> > <longdesc lang="en">
> > restore_command for recovery.conf.
> > This is required for replication.
> > </longdesc>
> > <shortdesc lang="en">restore_command</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="archive_cleanup_command" unique="0" required="0">
> > <longdesc lang="en">
> > archive_cleanup_command for recovery.conf.
> > This is used for replication and is optional.
> > </longdesc>
> > <shortdesc lang="en">archive_cleanup_command</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="recovery_end_command" unique="0" required="0">
> > <longdesc lang="en">
> > recovery_end_command for recovery.conf.
> > This is used for replication and is optional.
> > </longdesc>
> > <shortdesc lang="en">recovery_end_command</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="master_ip" unique="0" required="0">
> > <longdesc lang="en">
> > Master's floating IP address to be connected from hot standby.
> > This parameter is used for "primary_conninfo" in recovery.conf.
> > This is required for replication.
> > </longdesc>
> > <shortdesc lang="en">master ip</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="repuser" unique="0" required="0">
> > <longdesc lang="en">
> > User used to connect to the master server.
> > This parameter is used for "primary_conninfo" in recovery.conf.
> > This is required for replication.
> > </longdesc>
> > <shortdesc lang="en">repuser</shortdesc>
> > <content type="string" default="postgres" />
> > </parameter>
> >
> > <parameter name="primary_conninfo_opt" unique="0" required="0">
> > <longdesc lang="en">
> > primary_conninfo options of recovery.conf except host, port, user and
> > application_name.
> > This is optional for replication.
> > </longdesc>
> > <shortdesc lang="en">primary_conninfo_opt</shortdesc>
> > <content type="string" default="" />
> > </parameter>
> >
> > <parameter name="restart_on_promote" unique="0" required="0">
> > <longdesc lang="en">
> > If this is true, RA deletes recovery.conf and restarts PostgreSQL
> > on promote to keep Timeline ID. It probably makes fail-over slower.
> > It's recommended to set on-fail of promote up as fence.
> > This is optional for replication.
> > </longdesc>
> > <shortdesc lang="en">restart_on_promote</shortdesc>
> > <content type="boolean" default="false" />
> > </parameter>
> >
> > <parameter name="tmpdir" unique="0" required="0">
> > <longdesc lang="en">
> > Path to temporary directory.
> > This is optional for replication.
> > </longdesc>
> > <shortdesc lang="en">tmpdir</shortdesc>
> > <content type="string" default="/var/lib/pgsql/tmp" />
> > </parameter>
> >
> > <parameter name="xlog_check_count" unique="0" required="0">
> > <longdesc lang="en">
> > Number of checks of xlog on monitor before promote.
> > This is optional for replication.
> > </longdesc>
> > <shortdesc lang="en">xlog check count</shortdesc>
> > <content type="integer" default="" />
> > </parameter>
> >
> > <parameter name="crm_attr_timeout" unique="0" required="0">
> > <longdesc lang="en">
> > The timeout of crm_attribute forever update command.
> > Default value is 5 seconds.
> > This is optional for replication.
> > </longdesc>
> > <shortdesc lang="en">The timeout of crm_attribute forever update
> > command.</shortdesc>
> > <content type="integer" default="5" />
> > </parameter>
> >
> > <parameter name="stop_escalate_in_slave" unique="0" required="0">
> > <longdesc lang="en">
> > Number of shutdown retries (using -m fast) before resorting to -m
> immediate
> > in slave state.
> > This is optional for replication.
> > </longdesc>
> > <shortdesc lang="en">stop escalation_in_slave</shortdesc>
> > <content type="integer" default="30" />
> > </parameter>
> >
> > <parameter name="check_wal_receiver" unique="0" required="0">
> > <longdesc lang="en">
> > If this is true, RA checks wal_receiver process on monitor
> > and notifies its status using "(resource name)-receiver-status"
> attribute.
> > It's useful for checking whether PostgreSQL (hot standby) connects to
> > primary.
> > The attribute shows status as "normal" or "ERROR".
> > </longdesc>
> > <shortdesc lang="en">check_wal_receiver</shortdesc>
> > <content type="boolean" default="false" />
> > </parameter>
> > </parameters>
> >
> > <actions>
> > <action name="start" timeout="120" />
> > <action name="stop" timeout="120" />
> > <action name="status" timeout="60" />
> > <action name="monitor" depth="0" timeout="30" interval="30"/>
> > <action name="monitor" depth="0" timeout="30" interval="29"
> role="Master" />
> > <action name="promote" timeout="120" />
> > <action name="demote" timeout="120" />
> > <action name="notify"   timeout="90" />
> > <action name="meta-data" timeout="5" />
> > <action name="validate-all" timeout="5" />
> > <action name="methods" timeout="5" />
> > </actions>
> > </resource-agent>
> > Testing: validate-all
> > Checking current state
> > Testing: stop
> > INFO: waiting for server to shut down..... done server stopped
> > INFO: PostgreSQL is down
> > Testing: monitor
> > INFO: PostgreSQL is down
> > Testing: monitor
> > ocf-exit-reason:Setup problem: couldn't find command:
> > /opt/app/PostgreSQL/9.3/bin/pg_ctl
> > Testing: start
> > INFO: server starting
> > INFO: PostgreSQL start command sent.
> > INFO: PostgreSQL is down
> > DEBUG: PostgreSQL still hasn't started yet. Waiting...
> > INFO: PostgreSQL is started.
> > Testing: monitor
> > Testing: monitor
> > INFO: Don't check /opt/app/pgdata/9.3 during probe
> > Testing: notify
> > Checking for demote action
> > ocf-exit-reason:Not in a replication mode.
> > Checking for promote action
> > ocf-exit-reason:Not in a replication mode.
> > Testing: demotion of started resource
> > ocf-exit-reason:Not in a replication mode.
> > * rc=6: Demoting a start resource should not fail
> > Testing: promote
> > ocf-exit-reason:Not in a replication mode.
> > * rc=6: Promote failed
> > Testing: demote
> > ocf-exit-reason:Not in a replication mode.
> > * rc=6: Demote failed
> > Aborting tests
> > cl1_lb1:/usr/lib/ocf/resource.d/heartbeat # su - postgres
> > postgres at cl1_lb1:~> pg_ctl -D /opt/app/pgdata/9.3 status
> > pg_ctl: server is running (PID: 21575)
> > /opt/app/PostgreSQL/9.3/bin/postgres "-D" "/opt/app/pgdata/9.3" "-c"
> > "config_file=/opt/app/pgdata/9.3/postgresql.conf"
> >
> > Postgres seem to be running ok on both nodes and the streaming is taking
> > place
> >
> > postgres at cl1_lb1:~> exit
> > logout
> > cl1_lb1:/usr/lib/ocf/resource.d/heartbeat # crm_mon -1 -Af
> > Last updated: Wed Mar 11 12:05:17 2015
> > Last change: Wed Mar 11 12:03:10 2015 by root via crm_attribute on
> cl1_lb1
> > Stack: classic openais (with plugin)
> > Current DC: cl1_lb1 - partition with quorum
> > Version: 1.1.9-2db99f1
> > 2 Nodes configured, 2 expected votes
> > 4 Resources configured.
> >
> >
> > Online: [ cl1_lb1 cl2_lb1 ]
> >
> >  Master/Slave Set: msPostgresql [pgsql]
> >      pgsql:0    (ocf::heartbeat:pgsql):    Slave cl1_lb1 FAILED
> >      Stopped: [ pgsql:1 ]
> >
> > Node Attributes:
> > * Node cl1_lb1:
> >     + master-pgsql                        : -INFINITY
> >     + pgsql-data-status                   : LATEST
> >     + pgsql-status                        : STOP
> > * Node cl2_lb1:
> >     + master-pgsql                        : 100
> >     + pgsql-data-status                   : STREAMING|SYNC
> >     + pgsql-status                        : STOP
> >
> > Migration summary:
> > * Node cl1_lb1:
> >    pgsql:0: migration-threshold=1 fail-count=1 last-failure='Wed Mar 11
> > 12:04:42 2015'
> > * Node cl2_lb1:
> >    pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Wed
> Mar
> > 11 12:01:40 2015'
> >
> > Failed actions:
> >     pgsql_monitor_3000 (node=cl1_lb1, call=132, rc=1, status=complete):
> > unknown error
> >     pgsql_start_0 (node=cl2_lb1, call=90, rc=1, status=complete): unknown
> > error
> > cl1_lb1:/usr/lib/ocf/resource.d/heartbeat # crm configure show
> > node cl1_lb1 \
> >     attributes pgsql-data-status="LATEST"
> > node cl2_lb1 \
> >     attributes pgsql-data-status="STREAMING|SYNC"
> > primitive pgsql ocf:heartbeat:pgsql \
> >     params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl"
> > psql="/opt/app/PostgreSQL/9.3/bin/psql"
> > config="/opt/app/pgdata/9.3/postgresql.conf" pgdba="postgres"
> > pgdata="/opt/app/pgdata/9.3/" start_opt="-p 5432" rep_mode="sync"
> > node_list="cl1_lb1 cl2_lb1" restore_command="cp /pgtablespace/archive/%f
> > %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5
> > keepalives_count=5" master_ip="172.16.0.5" restart_on_promote="true"
> > logfile="/var/log/OCF.log" \
> >     op start interval="0s" timeout="60s" on-fail="restart" \
> >     op monitor interval="4s" timeout="60s" on-fail="restart" \
> >     op monitor interval="3s" role="Master" timeout="60s"
> on-fail="restart" \
> > op promote interval="0s" timeout="60s" on-fail="restart" \
> >     op demote interval="0s" timeout="60s" on-fail="stop" \
> >     op stop interval="0s" timeout="60s" on-fail="block" \
> >     op notify interval="0s" timeout="60s"
> > primitive vip-master ocf:heartbeat:IPaddr2 \
> >     params ip="172.28.200.159" nic="eth0" cidr_netmask="24" \
> >     op start interval="0s" timeout="60s" on-fail="stop" \
> >     op monitor interval="10s" timeout="60s" on-fail="restart" \
> >     op stop interval="0s" timeout="60s" on-fail="block"
> > primitive vip-rep ocf:heartbeat:IPaddr2 \
> >     params ip="172.16.0.5" nic="eth1" cidr_netmask="24" \
> >     meta migration-threshold="0" \
> >     op start interval="0s" timeout="60s" on-fail="restart" \
> >     op monitor interval="10s" timeout="60s" on-fail="restart" \
> >     op stop interval="0s" timeout="60s" on-fail="block"
> > group master-group vip-master vip-rep
> > ms msPostgresql pgsql \
> >     meta master-max="1" master-node-max="1" clone-max="2"
> > clone-node-max="1" notify="true"
> > colocation rsc_colocation-1 inf: master-group msPostgresql:Master
> > order rsc_order-1 0: msPostgresql:promote master-group:start
> > symmetrical=false
> > order rsc_order-2 0: msPostgresql:demote master-group:stop
> symmetrical=false
> > property $id="cib-bootstrap-options" \
> >     dc-version="1.1.9-2db99f1" \
> >     cluster-infrastructure="classic openais (with plugin)" \
> >     expected-quorum-votes="2" \
> >     no-quorum-policy="ignore" \
> >     stonith-enabled="false" \
> >     crmd-transition-delay="0s" \
> >     last-lrm-refresh="1426067955"
> > rsc_defaults $id="rsc-options" \
> >     resource-stickiness="INFINITY" \
> >     migration-threshold="1"
> > cl1_lb1:/usr/lib/ocf/resource.d/heartbeat #
> >
> > It seems that pgsql is not playing nicely with my Postgres install, any
> > hints tips what to look for please??
> > Thanks
>
>
> Look into the logs. The pgsql agent writes errors to the logs.
>
> Search in your logs on cl2_lb1 around  12:01:40 2015.
>
>
> Mit freundlichen Grüßen,
>
> Michael Schwartzkopff
>
> --
> [*] sys4 AG
>
> http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
> Franziskanerstraße 15, 81669 München
>
> Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
> Vorstand: Patrick Ben Koetter, Marc Schiffbauer
> Aufsichtsratsvorsitzender: Florian Kirstein
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20150311/a921181c/attachment-0002.html>


More information about the Users mailing list