[ClusterLabs] Re: ocf:heartbeat:pgsql not starting
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri Aug 12 07:19:51 UTC 2016
Two tips:
1) Did you stop the configured postgres in the cluster and put it into maintenance mode while trying ocf-tester?
2) When testing my RAs I temporarily replace "#!/bin/sh" with "#!/bin/sh -x". It produces a lot of output, but sometimes you'll find the problem.
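As a rough illustration of both tips, here is a minimal sketch. It assumes the agent starts with a plain "#!/bin/sh" line and that you prefer tracing a copy over editing the installed file. The helper name make_traced_copy, the /tmp path and the test resource name are hypothetical; the crm and ocf-tester calls appear only as comments and should be checked against your installed versions.

```shell
#!/bin/sh
# Sketch only: make_traced_copy is a hypothetical helper, not part of
# resource-agents. It copies an agent and enables shell tracing in the copy,
# so the installed agent stays untouched.
make_traced_copy() {
    src=$1
    dst=$2
    # Rewrite line 1 only, and only when it is exactly "#!/bin/sh".
    sed '1s|^#!/bin/sh[[:space:]]*$|#!/bin/sh -x|' "$src" > "$dst" || return 1
    chmod +x "$dst"
}

# Possible usage on a test node (not executed here; the resource name comes
# from the quoted configuration, the /tmp path is an assumption):
#   crm resource unmanage ms-ars-pgsql     # keep Pacemaker from reacting
#   make_traced_copy /usr/lib/ocf/resource.d/heartbeat/pgsql /tmp/pgsql.trace
#   ocf-tester -v -n test-pgsql /tmp/pgsql.trace
#   crm resource manage ms-ars-pgsql       # hand control back afterwards
```

Note that ocf-tester takes agent parameters as "-o name=value" options; without them the agent runs with its built-in defaults.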
Regards,
Ulrich
>>> Darren Kinley <dkinley at mdacorporation.com> wrote on 11.08.2016 at 23:44 in
message <0C9F39FD10C20E49BDFE9C5B09C5E7D83F955EDE at exbermd01.ds.mda.ca>:
> Hi,
>
> I have PostgreSQL 9.3 replicated and I'm trying to put it under Pacemaker
> control using ocf:heartbeat:pgsql provided by SLES12SP1.
>
> This is the crmsh script that I used to configure Pacemaker.
>
> configure cib new pgsql_cfg --force
> configure primitive res-ars-pgsql ocf:heartbeat:pgsql \
> pgctl="/usr/lib/postgresql93/bin/pg_ctl" \
> psql="/usr/lib/postgresql93/bin/psql" \
> pgdata="/var/lib/pgsql/data/" \
> rep_mode="sync" \
> node_list="ars1 ars2" \
> restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \
> primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
> master_ip="192.168.244.223" \
> restart_on_promote='true' \
> pghost="191.168.244.223" \
> repuser="postgres" \
> check_wal_receiver='true' \
> monitor_user='postgres' \
> monitor_password='xxx' \
> op start timeout="120s" interval="0s" on-fail="restart" \
> op monitor timeout="120s" interval="4s" on-fail="restart" \
> op monitor timeout="120s" interval="3s" on-fail="restart" role="Master" \
> op promote timeout="120s" interval="0s" on-fail="restart" \
> op demote timeout="120s" interval="0s" on-fail="stop" \
> op stop timeout="120s" interval="0s" on-fail="block" \
> op notify timeout="90s" interval="0s"
> configure ms ms-ars-pgsql res-ars-pgsql \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> configure colocation col-ars-pgsql-with-drbd inf: ms-ars-pgsql:Master ms-ars-drbd:Master
> configure cib commit pgsql_cfg
>
> I have a ~postgres/.pgpass
>
>
> My nodes remain stopped. Only once in the 12 hours I've been working on this
> did both nodes try to bring up PG (both in recovery mode) before they were
> both shut down again.
>
> When running ocf-tester I think that I'm supposed to name the master/slave resource.
>
> ars2:/usr/lib/ocf/resource.d/heartbeat # ocf-tester -v -n ms-ars-pgsql `pwd`/pgsql
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/pgsql...
> Testing permissions with uid nobody
> Testing: meta-data
> Testing: meta-data
> ...
> <XML removed/>
> ...
> Testing: validate-all
> Checking current state
> Testing: stop
> INFO: waiting for server to shut down.... done server stopped
> INFO: PostgreSQL is down
> Testing: monitor
> INFO: PostgreSQL is down
> Testing: monitor
> ocf-exit-reason:Setup problem: couldn't find command: /usr/bin/pg_ctl
> Testing: start
> INFO: server starting
> INFO: PostgreSQL start command sent.
> INFO: PostgreSQL is started.
> Testing: monitor
> Testing: monitor
> INFO: Don't check /var/lib/pgsql/data during probe
> Testing: notify
> Checking for demote action
> ocf-exit-reason:Not in a replication mode.
> Checking for promote action
> ocf-exit-reason:Not in a replication mode.
> Testing: demotion of started resource
> ocf-exit-reason:Not in a replication mode.
> * rc=6: Demoting a start resource should not fail
> Testing: promote
> ocf-exit-reason:Not in a replication mode.
> * rc=6: Promote failed
> Testing: demote
> ocf-exit-reason:Not in a replication mode.
> * rc=6: Demote failed
> Aborting tests
>
>
> 'Not in a replication mode' disagrees with the res-ars-pgsql above.
> I'm not sure that the pacemaker.log for CIB changes is needed.
>
> Aug 11 09:19:53 [2757] ars2 pengine: info: clone_print:
> Master/Slave Set: ms-ars-pgsql [res-ars-pgsql]
> Aug 11 09:19:53 [2757] ars2 pengine: info: short_print:
> Stopped: [ ars1 ars2 ]
> Aug 11 09:19:53 [2757] ars2 pengine: info:
> get_failcount_full: res-ars-pgsql:0 has failed INFINITY times on ars1
> Aug 11 09:19:53 [2757] ars2 pengine: warning:
> common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000
> failures (max=1000000)
> Aug 11 09:19:53 [2757] ars2 pengine: info:
> get_failcount_full: ms-ars-pgsql has failed INFINITY times on ars1
> Aug 11 09:19:53 [2757] ars2 pengine: warning:
> common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000
> failures (max=1000000)
> Aug 11 09:19:53 [2757] ars2 pengine: info:
> get_failcount_full: res-ars-pgsql:0 has failed INFINITY times on ars2
> Aug 11 09:19:53 [2757] ars2 pengine: warning:
> common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000
> failures (max=1000000)
> Aug 11 09:19:53 [2757] ars2 pengine: info:
> get_failcount_full: ms-ars-pgsql has failed INFINITY times on ars2
> Aug 11 09:19:53 [2757] ars2 pengine: warning:
> common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000
> failures (max=1000000)
> Aug 11 09:19:53 [2757] ars2 pengine: info: rsc_merge_weights:
> ms-ars-drbd: Rolling back scores from ms-ars-pgsql
> Aug 11 09:19:53 [2757] ars2 pengine: info: master_color:
> Promoting res-ars-drbd:1 (Master ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: master_color:
> ms-ars-drbd: Promoted 1 instances of a possible 1 to master
> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
> res-ars-pgsql:0: Rolling back scores from ms-ars-drbd
> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
> Resource res-ars-pgsql:0 cannot run anywhere
> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
> res-ars-pgsql:1: Rolling back scores from ms-ars-drbd
> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
> Resource res-ars-pgsql:1 cannot run anywhere
> Aug 11 09:19:53 [2757] ars2 pengine: info: master_color:
> ms-ars-pgsql: Promoted 0 instances of a possible 1 to master
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions:
> Leave res-mgmt-vip (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions:
> Leave res-mgmt-app (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions:
> Leave res-ars-vip (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions:
> Leave res-ars-drbd:0 (Slave ars1)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions:
> Leave res-ars-drbd:1 (Master ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions:
> Leave res-ars-lvm (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions:
> Leave res-ars-fs_dropbox (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions:
> Leave res-ars-fs_svndata (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions:
> Leave res-ars-pgsql:0 (Stopped)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions:
> Leave res-ars-pgsql:1 (Stopped)
> Aug 11 09:19:53 [2758] ars2 crmd: info:
> do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Aug 11 09:19:53 [2758] ars2 crmd: notice: do_te_invoke:
> Processing graph 222 (ref=pe_calc-dc-1470932393-1349) derived from
> /var/lib/pacemaker/pengine/pe-input-625.bz2
>
> and /var/log/messages
>
> 2016-08-11T09:19:53.146603-07:00 ars-2 crmd[2758]: notice: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
> origin=crm_timer_popped ]
> 2016-08-11T09:19:53.152322-07:00 ars-2 pengine[2757]: notice: On loss
> of CCM Quorum: Ignore
> 2016-08-11T09:19:53.153078-07:00 ars-2 pengine[2757]: warning: Forcing
> ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
> 2016-08-11T09:19:53.153266-07:00 ars-2 pengine[2757]: warning: Forcing
> ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
> 2016-08-11T09:19:53.153395-07:00 ars-2 pengine[2757]: warning: Forcing
> ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
> 2016-08-11T09:19:53.153547-07:00 ars-2 pengine[2757]: warning: Forcing
> ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
> 2016-08-11T09:19:53.155568-07:00 ars-2 crmd[2758]: notice: Processing
> graph 222 (ref=pe_calc-dc-1470932393-1349) derived from
> /var/lib/pacemaker/pengine/pe-input-625.bz2
> 2016-08-11T09:19:53.155768-07:00 ars-2 pengine[2757]: notice:
> Calculated Transition 222: /var/lib/pacemaker/pengine/pe-input-625.bz2
> 2016-08-11T09:19:53.155927-07:00 ars-2 crmd[2758]: notice: Transition
> 222 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-625.bz2): Complete
> 2016-08-11T09:19:53.156085-07:00 ars-2 crmd[2758]: notice: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
>
>
> Can anyone provide thoughts on how to debug this?
> Should I give up on the SLES-provided RA and use PAF instead?
>
> Thanks,
> Darren