[Pacemaker] pgsql RA - slave is in HS:ASYNC status and won't promote

Wed Jan 22 00:49:21 EST 2014

Hi, Tomáš.

 >  my slave node is still async, even the select you mentioned shows async ..

Is "synchronous_standby_names" set correctly?
synchronous_standby_names is a PostgreSQL parameter.
It is a list of standby names that can support synchronous replication(*).

(* please see below for details)
  http://www.postgresql.org/docs/9.3/static/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-MASTER

The pgsql RA controls this parameter through the following steps.

1) The pgsql RA creates the file "/var/lib/pgsql/tmp/rep_mode.conf".
    synchronous_standby_names is written to this file.

2) The pgsql RA appends the following line to postgresql.conf.

     include '/var/lib/pgsql/tmp/rep_mode.conf'

    Through this include, the synchronous_standby_names generated by the RA takes effect in PostgreSQL (postgresql.conf).

3) The pgsql RA executes the following command.

     pg_ctl -D /path/to/pg_data reload

    PostgreSQL then reloads postgresql.conf.
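
To make steps 1) and 2) concrete, here is a rough sketch of the equivalent manual operations. It is only an illustration: write_rep_mode is a hypothetical helper name, the single-line file format is an assumption, and the real RA code may differ. Step 3), the reload, needs a running server and is left out.

```shell
#!/bin/sh
# Sketch only: mimics steps 1) and 2) as described above.
# write_rep_mode is a hypothetical helper, not a function of the pgsql RA.
write_rep_mode() {
    rep_conf=$1      # e.g. /var/lib/pgsql/tmp/rep_mode.conf
    pg_conf=$2       # e.g. /path/to/pg_data/postgresql.conf
    standbys=$3      # e.g. the name of the slave node

    # Step 1: write synchronous_standby_names into the separate file
    # (assumed format: one "name = 'value'" line).
    printf "synchronous_standby_names = '%s'\n" "$standbys" > "$rep_conf"

    # Step 2: append the include line to postgresql.conf, but only
    # if that exact line is not already present.
    include_line="include '$rep_conf'"
    grep -qxF "$include_line" "$pg_conf" 2>/dev/null ||
        printf '%s\n' "$include_line" >> "$pg_conf"
}
```

Calling the helper twice with the same arguments leaves a single include line, so repeated starts would not keep growing postgresql.conf.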

So, please check the value of "synchronous_standby_names" by executing the following two commands on the Master.
  # cat /var/lib/pgsql/tmp/rep_mode.conf
  --> is the value correct?

  # psql -c "show synchronous_standby_names"
  --> is the value correct?
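
If it helps, the two checks can be combined into one comparison. This is a minimal sketch under assumptions: both helper names are hypothetical, and the file is assumed to hold a line of the form synchronous_standby_names = '...'.

```shell
#!/bin/sh
# Hypothetical helper: print the synchronous_standby_names value
# found in a conf file (last matching line wins).
standby_names_in_file() {
    sed -n "s/^[[:space:]]*synchronous_standby_names[[:space:]]*=[[:space:]]*'\([^']*\)'.*$/\1/p" "$1" | tail -n 1
}

# Hypothetical helper: report whether the file value and the
# value reported by the running server agree.
compare_standby_names() {
    file_value=$1
    live_value=$2
    if [ -z "$live_value" ]; then
        echo "empty: no synchronous standby is configured"
    elif [ "$file_value" = "$live_value" ]; then
        echo "match: $live_value"
    else
        echo "mismatch: file='$file_value' live='$live_value'"
    fi
}

# Usage on the Master (needs a running server):
#   compare_standby_names \
#       "$(standby_names_in_file /var/lib/pgsql/tmp/rep_mode.conf)" \
#       "$(psql -At -c 'show synchronous_standby_names')"
```

An empty live value is worth singling out, because PostgreSQL does not use synchronous replication at all when synchronous_standby_names is empty.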

If both are correct, the Master (PostgreSQL) will try to replicate synchronously to the slave.

Regards,
Kazuhiko HIGASHI

(2014/01/18 4:45), Tomáš Vajrauch wrote:
> Hi!
> thanks for help .. anyway - my slave node is still async, even the select you mentioned shows async .. at least i found out that if i set rep_mode to "async", the slave node gets promoted when master fails ...
> so right now it is working, but i would still like to know how to make streaming replication synchronous .. i did everything as in the mentioned wiki page, but it is still async
> any idea?
> Thanks
> Tomas
>
>
> 2014/1/14 東一彦 <higashi.kazuhiko at lab.ntt.co.jp>
>
>     Hi,
>
>
>      > but after some tests something went wrong and i don't know what and why and how to get it back working ... now when i start crm, master is PRI, but slave gets into HS:ASYNC state .. and when master fails, slave gets into HS:alone state
>     It is PostgreSQL that decides whether a node is "sync" or "async".
>     The pgsql RA displays the result of the following SQL.
>
>        select application_name,upper(state),upper(sync_state) from pg_stat_replication;
>
>     So, first, please check PostgreSQL's log.
>
>
>
>     The data may have become inconsistent.
>     You can resolve the inconsistency with the following procedure.
>
>     http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#after_fail-over
>
>
>     Regards,
>     Kazuhiko HIGASHI
>
>
>     (2014/01/10 17:48), Tomáš Vajrauch wrote:
>
>         Hi,
>
>         i am trying to run postgresql cluster with streaming replication using pgsql RA and pacemaker ..
>         i succeeded once, master was as PRI, slave HS:sync, failover worked as it should (slave became master) ..
>         but after some tests something went wrong and i don't know what and why and how to get it back working ... now when i start crm, master is PRI, but slave gets into HS:ASYNC state .. and when master fails, slave gets into HS:alone state
>
>         can somebody please give me a hint what i should do or what i should look for?
>
>         Thanks a lot for any help
>         Tomas
>
>         my configuration:
>
>         node jboss-test \
>                   attributes pgsql-data-status="LATEST"
>         node jboss-test2 \
>                   attributes pgsql-data-status="STREAMING|ASYNC"
>         primitive pgsql ocf:heartbeat:pgsql \
>                   params \
>                   pgctl="/opt/postgres/9.3/bin/pg_ctl" \
>                   psql="/opt/postgres/9.3/bin/psql" \
>                   pgdata="/opt/postgres/9.3/data/" \
>                   rep_mode="sync" \
>                   node_list="jboss-test jboss-test2" \
>                   restore_command="cp /opt/postgres/9.3/data/pg_archive/%f %p" \
>                   primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
>                   master_ip="172.16.111.120" \
>                   stop_escalate="0" \
>                   op start interval="0s" timeout="60s" on-fail="restart" \
>                   op stop interval="0s" timeout="60s" on-fail="block" \
>                   op monitor interval="11s" timeout="60s" on-fail="restart" \
>                   op monitor interval="10s" role="Master" timeout="60s" on-fail="restart" \
>                   op promote interval="0s" timeout="60s" on-fail="restart" \
>                   op demote interval="0s" timeout="60s" on-fail="block" \
>                   op notify interval="0s" timeout="60s"
>         primitive pingCheck ocf:pacemaker:ping \
>                   params name="default_ping_set" host_list="172.16.0.1" multiplier="100" \
>                   op start interval="0s" timeout="60s" on-fail="restart" \
>                   op monitor interval="2s" timeout="60s" on-fail="restart" \
>                   op stop interval="0s" timeout="60s" on-fail="ignore"
>         primitive vip-master ocf:heartbeat:IPaddr2 \
>                   params ip="172.16.111.110" nic="eth0" cidr_netmask="24" \
>                   op start interval="0s" timeout="60s" on-fail="restart" \
>                   op monitor interval="10s" timeout="60s" on-fail="restart" \
>                   op stop interval="0s" timeout="60s" on-fail="block"
>         primitive vip-rep ocf:heartbeat:IPaddr2 \
>                   params ip="172.16.111.120" nic="eth0" cidr_netmask="24" \
>                   meta migration-threshold="0" \
>                   op start interval="0s" timeout="60s" on-fail="stop" \
>                   op monitor interval="10s" timeout="60s" on-fail="restart" \
>                   op stop interval="0s" timeout="60s" on-fail="block"
>         primitive vip-slave ocf:heartbeat:IPaddr2 \
>                   params ip="172.16.111.111" nic="eth0" cidr_netmask="24" \
>                   meta resource-stickiness="1" \
>                   op start interval="0s" timeout="60s" on-fail="restart" \
>                   op monitor interval="10s" timeout="60s" on-fail="restart" \
>                   op stop interval="0s" timeout="60s" on-fail="block"
>         group master-group vip-master vip-rep \
>                   meta ordered="false"
>         ms msPostgresql pgsql \
>                   meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>         clone clnPingCheck pingCheck
>         location rsc_location-1 vip-slave \
>                   rule $id="rsc_location-1-rule" 200: pgsql-status eq HS:sync \
>                   rule $id="rsc_location-1-rule-0" 190: pgsql-status eq HS:async \
>                   rule $id="rsc_location-1-rule-1" 100: pgsql-status eq PRI \
>                   rule $id="rsc_location-1-rule-2" -inf: not_defined pgsql-status \
>                   rule $id="rsc_location-1-rule-3" -inf: pgsql-status ne HS:sync and pgsql-status ne PRI and pgsql-status ne HS:async
>         location rsc_location-2 msPostgresql \
>                   rule $id="rsc_location-3-rule" -inf: not_defined default_ping_set or default_ping_set lt 100
>         colocation rsc_colocation-1 inf: msPostgresql clnPingCheck
>         colocation rsc_colocation-2 inf: master-group msPostgresql:Master
>         order rsc_order-1 0: clnPingCheck msPostgresql
>         order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false
>         order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false
>         property $id="cib-bootstrap-options" \
>                   no-quorum-policy="ignore" \
>                   stonith-enabled="false" \
>                   crmd-transition-delay="0s" \
>                   dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>                   cluster-infrastructure="openais" \
>                   expected-quorum-votes="2" \
>                   last-lrm-refresh="1389301940"
>         rsc_defaults $id="rsc-options" \
>                   resource-stickiness="INFINITY" \
>                   migration-threshold="1"
>
>         crm_mon -Afr:
>         ============
>         Last updated: Fri Jan 10 09:46:29 2014
>         Last change: Fri Jan 10 09:46:29 2014 by root via crm_attribute on jboss-test
>         Stack: openais
>         Current DC: jboss-test - partition with quorum
>         Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>         2 Nodes configured, 2 expected votes
>         7 Resources configured.
>         ============
>
>         Online: [ jboss-test jboss-test2 ]
>
>         Full list of resources:
>
>            Clone Set: clnPingCheck [pingCheck]
>                Started: [ jboss-test jboss-test2 ]
>            Master/Slave Set: msPostgresql [pgsql]
>                Masters: [ jboss-test ]
>                Slaves: [ jboss-test2 ]
>         vip-slave       (ocf::heartbeat:IPaddr2):       Started jboss-test2
>            Resource Group: master-group
>                vip-master (ocf::heartbeat:IPaddr2):       Started jboss-test
>                vip-rep    (ocf::heartbeat:IPaddr2):       Started jboss-test
>
>         Node Attributes:
>         * Node jboss-test:
>               + default_ping_set                  : 100
>               + master-pgsql:0                    : 1000
>               + pgsql-data-status                 : LATEST
>               + pgsql-master-baseline             : 0000000039004DF0
>               + pgsql-status                      : PRI
>         * Node jboss-test2:
>               + default_ping_set                  : 100
>               + master-pgsql:1                    : -INFINITY
>               + pgsql-data-status                 : STREAMING|ASYNC
>               + pgsql-status                      : HS:async
>
>
>
>         _________________________________________________
>         Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>         http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
>         Project Home: http://www.clusterlabs.org
>         Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>         Bugs: http://bugs.clusterlabs.org
>