[Pacemaker] pgsql RA - slave is in HS:ASYNC status and won't promote

Fri Jan 10 03:48:04 EST 2014

Hi,

I am trying to run a PostgreSQL cluster with streaming replication using
the pgsql RA and Pacemaker.
I succeeded once: the master was PRI, the slave was HS:sync, and failover
worked as it should (the slave became master).
But after some tests something went wrong, and I don't know what, why, or
how to get it working again. Now when I start crm, the master is PRI, but
the slave goes into the HS:async state, and when the master fails, the slave
goes into the HS:alone state instead of being promoted.

Can somebody please give me a hint about what I should do or what I should look for?

Thanks a lot for any help
Tomas

my configuration:

node jboss-test \
        attributes pgsql-data-status="LATEST"
node jboss-test2 \
        attributes pgsql-data-status="STREAMING|ASYNC"
primitive pgsql ocf:heartbeat:pgsql \
        params pgctl="/opt/postgres/9.3/bin/pg_ctl" \
        psql="/opt/postgres/9.3/bin/psql" pgdata="/opt/postgres/9.3/data/" \
        rep_mode="sync" node_list="jboss-test jboss-test2" \
        restore_command="cp /opt/postgres/9.3/data/pg_archive/%f %p" \
        primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
        master_ip="172.16.111.120" stop_escalate="0" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block" \
        op monitor interval="11s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" role="Master" timeout="60s" on-fail="restart" \
        op promote interval="0s" timeout="60s" on-fail="restart" \
        op demote interval="0s" timeout="60s" on-fail="block" \
        op notify interval="0s" timeout="60s"
primitive pingCheck ocf:pacemaker:ping \
        params name="default_ping_set" host_list="172.16.0.1" multiplier="100" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="2s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="ignore"
primitive vip-master ocf:heartbeat:IPaddr2 \
        params ip="172.16.111.110" nic="eth0" cidr_netmask="24" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block"
primitive vip-rep ocf:heartbeat:IPaddr2 \
        params ip="172.16.111.120" nic="eth0" cidr_netmask="24" \
        meta migration-threshold="0" \
        op start interval="0s" timeout="60s" on-fail="stop" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block"
primitive vip-slave ocf:heartbeat:IPaddr2 \
        params ip="172.16.111.111" nic="eth0" cidr_netmask="24" \
        meta resource-stickiness="1" \
        op start interval="0s" timeout="60s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block"
group master-group vip-master vip-rep \
        meta ordered="false"
ms msPostgresql pgsql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone clnPingCheck pingCheck
location rsc_location-1 vip-slave \
        rule $id="rsc_location-1-rule" 200: pgsql-status eq HS:sync \
        rule $id="rsc_location-1-rule-0" 190: pgsql-status eq HS:async \
        rule $id="rsc_location-1-rule-1" 100: pgsql-status eq PRI \
        rule $id="rsc_location-1-rule-2" -inf: not_defined pgsql-status \
        rule $id="rsc_location-1-rule-3" -inf: pgsql-status ne HS:sync and pgsql-status ne PRI and pgsql-status ne HS:async
location rsc_location-2 msPostgresql \
        rule $id="rsc_location-3-rule" -inf: not_defined default_ping_set or default_ping_set lt 100
colocation rsc_colocation-1 inf: msPostgresql clnPingCheck
colocation rsc_colocation-2 inf: master-group msPostgresql:Master
order rsc_order-1 0: clnPingCheck msPostgresql
order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false
order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false
property $id="cib-bootstrap-options" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        crmd-transition-delay="0s" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1389301940"
rsc_defaults $id="rsc-options" \
        resource-stickiness="INFINITY" \
        migration-threshold="1"

crm_mon -Afr:
============
Last updated: Fri Jan 10 09:46:29 2014
Last change: Fri Jan 10 09:46:29 2014 by root via crm_attribute on jboss-test
Stack: openais
Current DC: jboss-test - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
7 Resources configured.
============

Online: [ jboss-test jboss-test2 ]

Full list of resources:

 Clone Set: clnPingCheck [pingCheck]
     Started: [ jboss-test jboss-test2 ]
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ jboss-test ]
     Slaves: [ jboss-test2 ]
vip-slave       (ocf::heartbeat:IPaddr2):       Started jboss-test2
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2):       Started jboss-test
     vip-rep    (ocf::heartbeat:IPaddr2):       Started jboss-test

Node Attributes:
* Node jboss-test:
    + default_ping_set                  : 100
    + master-pgsql:0                    : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000039004DF0
    + pgsql-status                      : PRI
* Node jboss-test2:
    + default_ping_set                  : 100
    + master-pgsql:1                    : -INFINITY
    + pgsql-data-status                 : STREAMING|ASYNC
    + pgsql-status                      : HS:async
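
The attribute dump above already shows which node Pacemaker will refuse to promote: a master-* score of -INFINITY marks a node as not eligible for promotion. As a reading aid only, here is a minimal Python sketch that pulls the "Node Attributes" section out of saved `crm_mon -Afr` output and lists such nodes. It is not part of Pacemaker or the pgsql RA; the function names `parse_node_attributes` and `unpromotable_nodes` are made up for illustration, and the parser assumes the exact "* Node name:" / "+ attribute : value" layout shown in this post.

```python
import re

# Node Attributes section copied from the crm_mon -Afr output in this post.
SAMPLE = """\
Node Attributes:
* Node jboss-test:
    + default_ping_set                  : 100
    + master-pgsql:0                    : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000039004DF0
    + pgsql-status                      : PRI
* Node jboss-test2:
    + default_ping_set                  : 100
    + master-pgsql:1                    : -INFINITY
    + pgsql-data-status                 : STREAMING|ASYNC
    + pgsql-status                      : HS:async
"""


def parse_node_attributes(text):
    """Return {node_name: {attribute: value}} from crm_mon -Afr text.

    Hypothetical helper: only lines after "Node Attributes:" are read.
    """
    nodes = {}
    current = None
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Node Attributes:":
            in_section = True
            continue
        if not in_section:
            continue
        node = re.match(r"\* Node (\S+):$", stripped)
        if node:
            current = nodes.setdefault(node.group(1), {})
            continue
        attr = re.match(r"\+ (\S+)\s*: (.*)$", stripped)
        if attr and current is not None:
            current[attr.group(1)] = attr.group(2).strip()
    return nodes


def unpromotable_nodes(nodes):
    """Return names of nodes that have a master-* score of -INFINITY."""
    return [
        name
        for name, attrs in nodes.items()
        if any(
            key.startswith("master-") and value == "-INFINITY"
            for key, value in attrs.items()
        )
    ]


if __name__ == "__main__":
    attrs = parse_node_attributes(SAMPLE)
    for name in unpromotable_nodes(attrs):
        # Show the replication state next to each blocked node.
        print(name, attrs[name].get("pgsql-status"),
              attrs[name].get("pgsql-data-status"))
```

Run against the output in this post, the sketch flags the node whose pgsql-data-status is STREAMING|ASYNC, which matches the symptom described: with rep_mode="sync" the standby is reported as async and is not offered for promotion.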