<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<font face="Calibri" size="2"><span style="font-size:11pt;">
<div>Hi,</div>
<div> </div>
<div>I have PostgreSQL 9.3 set up with replication, and I'm trying to put it under Pacemaker control</div>
<div>using the ocf:heartbeat:pgsql resource agent shipped with SLES 12 SP1.</div>
<div> </div>
<div>This is the crmsh script that I used to configure Pacemaker.</div>
<div> </div>
<div> configure cib new pgsql_cfg --force</div>
<div> configure primitive res-ars-pgsql ocf:heartbeat:pgsql \</div>
<div> pgctl="/usr/lib/postgresql93/bin/pg_ctl" \</div>
<div> psql="/usr/lib/postgresql93/bin/psql" \</div>
<div> pgdata="/var/lib/pgsql/data/" \</div>
<div> rep_mode="sync" \</div>
<div> node_list="ars1 ars2" \</div>
<div> restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \</div>
<div> primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \</div>
<div> master_ip="192.168.244.223" \</div>
<div> restart_on_promote='true' \</div>
<div> pghost="191.168.244.223" \</div>
<div> repuser="postgres" \</div>
<div> check_wal_receiver='true' \</div>
<div> monitor_user='postgres' \</div>
<div> monitor_password='xxx' \</div>
<div> op start timeout="120s" interval="0s" on-fail="restart" \</div>
<div> op monitor timeout="120s" interval="4s" on-fail="restart" \</div>
<div> op monitor timeout="120s" interval="3s" on-fail="restart" role="Master" \</div>
<div> op promote timeout="120s" interval="0s" on-fail="restart" \</div>
<div> op demote timeout="120s" interval="0s" on-fail="stop" \</div>
<div> op stop timeout="120s" interval="0s" on-fail="block" \</div>
<div> op notify timeout="90s" interval="0s"</div>
<div> configure ms ms-ars-pgsql res-ars-pgsql \</div>
<div> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true</div>
<div> configure colocation col-ars-pgsql-with-drbd inf: ms-ars-pgsql:Master ms-ars-drbd:Master</div>
<div> configure cib commit pgsql_cfg</div>
<div> </div>
<div>I have a ~postgres/.pgpass file in place.</div>
<div> </div>
<div> </div>
<div>The resource remains stopped on both nodes. Only once in the 12 hours I've been working on this</div>
<div>did both nodes try to bring up PostgreSQL (both in recovery mode), and then both were shut down again.</div>
<div> </div>
<div>When running ocf-tester, I believe I'm supposed to pass the name of the master/slave resource with -n.</div>
<div> </div>
<div> ars2:/usr/lib/ocf/resource.d/heartbeat # ocf-tester -v -n ms-ars-pgsql `pwd`/pgsql</div>
<div> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/pgsql...</div>
<div> Testing permissions with uid nobody</div>
<div> Testing: meta-data</div>
<div> Testing: meta-data</div>
<div> ...</div>
<div> &lt;XML removed/&gt;</div>
<div> ...</div>
<div> Testing: validate-all</div>
<div> Checking current state</div>
<div> Testing: stop</div>
<div> INFO: waiting for server to shut down.... done server stopped</div>
<div> INFO: PostgreSQL is down</div>
<div> Testing: monitor</div>
<div> INFO: PostgreSQL is down</div>
<div> Testing: monitor</div>
<div> ocf-exit-reason:Setup problem: couldn't find command: /usr/bin/pg_ctl</div>
<div> Testing: start</div>
<div> INFO: server starting</div>
<div> INFO: PostgreSQL start command sent.</div>
<div> INFO: PostgreSQL is started.</div>
<div> Testing: monitor</div>
<div> Testing: monitor</div>
<div> INFO: Don't check /var/lib/pgsql/data during probe</div>
<div> Testing: notify</div>
<div> Checking for demote action</div>
<div> ocf-exit-reason:Not in a replication mode.</div>
<div> Checking for promote action</div>
<div> ocf-exit-reason:Not in a replication mode.</div>
<div> Testing: demotion of started resource</div>
<div> ocf-exit-reason:Not in a replication mode.</div>
<div> * rc=6: Demoting a start resource should not fail</div>
<div> Testing: promote</div>
<div> ocf-exit-reason:Not in a replication mode.</div>
<div> * rc=6: Promote failed</div>
<div> Testing: demote</div>
<div> ocf-exit-reason:Not in a replication mode.</div>
<div> * rc=6: Demote failed</div>
<div> Aborting tests</div>
<div> </div>
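<div>The ocf-tester output above mentions /usr/bin/pg_ctl rather than the pgctl path set in the primitive, which suggests the agent may have been run with its default parameters. A minimal sketch of passing the primitive's parameters to ocf-tester with -o name=value follows. It only prints the command it would run. The subset of parameters chosen is an assumption, and whether ocf-tester accepts a value containing a space (node_list) has not been verified here.</div>
<pre>
```shell
#!/bin/sh
# Build an ocf-tester command line that passes the primitive's parameters
# with -o name=value. Dry run: the command is printed, not executed.
RA=/usr/lib/ocf/resource.d/heartbeat/pgsql

build_tester_cmd() {
    # Values copied from the res-ars-pgsql primitive above; which subset
    # the agent actually needs for this test is an assumption.
    set -- ocf-tester -v -n ms-ars-pgsql \
        -o pgctl=/usr/lib/postgresql93/bin/pg_ctl \
        -o psql=/usr/lib/postgresql93/bin/psql \
        -o pgdata=/var/lib/pgsql/data/ \
        -o rep_mode=sync \
        -o "node_list=ars1 ars2" \
        -o master_ip=192.168.244.223 \
        "$RA"
    # Quote any argument containing a space so the line can be pasted.
    for arg in "$@"; do
        case $arg in
            *" "*) printf '"%s" ' "$arg" ;;
            *)     printf '%s ' "$arg" ;;
        esac
    done
    printf '\n'
}

build_tester_cmd
```
</pre>
<div>If the promote and demote tests still report 'Not in a replication mode' with rep_mode passed explicitly, that would point at the agent itself rather than at how the test was invoked.</div>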
<div> </div>
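<div>'Not in a replication mode' contradicts the res-ars-pgsql definition above, which sets rep_mode="sync".</div>
<div>I'm not sure whether the pacemaker.log entries for the CIB changes are needed; this is the policy engine output from pacemaker.log.</div>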
<div> </div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: clone_print: Master/Slave Set: ms-ars-pgsql [res-ars-pgsql]</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: short_print: Stopped: [ ars1 ars2 ]</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: res-ars-pgsql:0 has failed INFINITY times on ars1</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: ms-ars-pgsql has failed INFINITY times on ars1</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: res-ars-pgsql:0 has failed INFINITY times on ars2</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: ms-ars-pgsql has failed INFINITY times on ars2</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: rsc_merge_weights: ms-ars-drbd: Rolling back scores from ms-ars-pgsql</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: master_color: Promoting res-ars-drbd:1 (Master ars2)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: master_color: ms-ars-drbd: Promoted 1 instances of a possible 1 to master</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: res-ars-pgsql:0: Rolling back scores from ms-ars-drbd</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: Resource res-ars-pgsql:0 cannot run anywhere</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: res-ars-pgsql:1: Rolling back scores from ms-ars-drbd</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: Resource res-ars-pgsql:1 cannot run anywhere</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: master_color: ms-ars-pgsql: Promoted 0 instances of a possible 1 to master</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-mgmt-vip (Started ars2)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-mgmt-app (Started ars2)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-vip (Started ars2)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-drbd:0 (Slave ars1)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-drbd:1 (Master ars2)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-lvm (Started ars2)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-fs_dropbox (Started ars2)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-fs_svndata (Started ars2)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-pgsql:0 (Stopped)</div>
<div> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-pgsql:1 (Stopped)</div>
<div> Aug 11 09:19:53 [2758] ars2 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]</div>
<div> Aug 11 09:19:53 [2758] ars2 crmd: notice: do_te_invoke: Processing graph 222 (ref=pe_calc-dc-1470932393-1349) derived from /var/lib/pacemaker/pengine/pe-input-625.bz2</div>
<div> </div>
<div>and the corresponding entries from /var/log/messages:</div>
<div> </div>
<div> 2016-08-11T09:19:53.146603-07:00 ars-2 crmd[2758]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]</div>
<div> 2016-08-11T09:19:53.152322-07:00 ars-2 pengine[2757]: notice: On loss of CCM Quorum: Ignore</div>
<div> 2016-08-11T09:19:53.153078-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)</div>
<div> 2016-08-11T09:19:53.153266-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)</div>
<div> 2016-08-11T09:19:53.153395-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)</div>
<div> 2016-08-11T09:19:53.153547-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)</div>
<div> 2016-08-11T09:19:53.155568-07:00 ars-2 crmd[2758]: notice: Processing graph 222 (ref=pe_calc-dc-1470932393-1349) derived from /var/lib/pacemaker/pengine/pe-input-625.bz2</div>
<div> 2016-08-11T09:19:53.155768-07:00 ars-2 pengine[2757]: notice: Calculated Transition 222: /var/lib/pacemaker/pengine/pe-input-625.bz2</div>
<div> 2016-08-11T09:19:53.155927-07:00 ars-2 crmd[2758]: notice: Transition 222 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-625.bz2): Complete</div>
<div> 2016-08-11T09:19:53.156085-07:00 ars-2 crmd[2758]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]</div>
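<div> </div>
<div>Both logs show the fail count for res-ars-pgsql at INFINITY on ars1 and ars2, which is why the policy engine reports that the resource cannot run anywhere. A hedged sketch of inspecting and resetting the fail counts with crm_mon and crmsh follows. It defaults to a dry run that only prints the commands; DRY_RUN and the run helper are names made up for this example.</div>
<pre>
```shell
#!/bin/sh
# Show and clear fail counts for the pgsql resource on both nodes.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

show_and_clear() {
    # One-shot cluster status with fail counts (-f) and node attributes (-A).
    run crm_mon -1 -f -A
    for node in ars1 ars2; do
        run crm resource failcount res-ars-pgsql show "$node"
    done
    # Clear the failed-operation history so the cluster retries the start.
    run crm resource cleanup ms-ars-pgsql
}

show_and_clear
```
</pre>
<div>Running it with DRY_RUN=0 on a cluster node executes the commands. Clearing the fail count only gives the cluster another attempt; the underlying start failure still has to be found in the agent's log output.</div>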
<div> </div>
<div> </div>
<div>Can anyone provide thoughts on how to debug this?</div>
<div>Should I give up on the SLES-provided RA and use PAF instead?</div>
<div> </div>
<div>Thanks,</div>
<div>Darren</div>
</span></font>
</body>
</html>