[ClusterLabs] Postgres never promoted

Sun Feb 22 19:43:02 UTC 2015

> On 21 Feb 2015, at 6:59 am, Alexandre <alxgomz at gmail.com> wrote:
> 
> Hi list,
> 
> I am facing a very strange issue.
> I have set up a PostgreSQL cluster (with streaming replication).
> Replication works fine when started manually, but the RA never seems to promote any host on which the resource is started.
> 
> I am running Pacemaker 1.12 on CentOS 6.6 (and I added crmsh from an openSUSE repo, as I am used to it).

It might be worth grabbing the very latest pgsql agent from https://github.com/ClusterLabs/resource-agents
Have you also compared your setup with the one described at http://clusterlabs.org/wiki/PostgresHowto ?

> 
> My config is below:
> node pp-obm-sgbd.upond.fr
> node pp-obm-sgbd2.upond.fr \
>     attributes pri_pgsql-data-status=DISCONNECT
> primitive pri_obm-locator lsb:obm-locator \
>     params \
>     op start interval=0s timeout=60s \
>     op stop interval=0s timeout=60s \
>     op monitor interval=10s timeout=20s
> primitive pri_pgsql pgsql \
>     params pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data/" node_list="pp-obm-sgbd.upond.fr pp-obm-sgbd2.upond.fr" repuser=replication rep_mode=sync restart_on_promote=true restore_command="cp /var/lib/pgsql/replication/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=193.50.151.200 \
>     op start interval=0 on-fail=restart timeout=120s \
>     op monitor interval=20s on-fail=restart timeout=60s \
>     op monitor interval=15s on-fail=restart role=Master timeout=60s \
>     op promote interval=0 on-fail=restart timeout=120s \
>     op demote interval=0 on-fail=stop timeout=120s \
>     op notify interval=0s timeout=60s \
>     op stop interval=0 on-fail=block timeout=120s
> primitive pri_vip IPaddr2 \
>     params ip=193.50.151.200 nic=eth1 cidr_netmask=32 \
>     op start interval=0s timeout=60s \
>     op monitor interval=10s timeout=60s \
>     op stop interval=0s timeout=60s
> ms ms_pgsql pri_pgsql \
>     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> colocation clc_vip-ms_pgsql inf: pri_vip:Started ms_pgsql:Master
> order ord_dm_pgsql-vip 0: ms_pgsql:demote pri_vip:stop
> order ord_pm_pgsql-vip 0: ms_pgsql:promote pri_vip:start symmetrical=false
> property cib-bootstrap-options: \
>     dc-version=1.1.11-97629de \
>     cluster-infrastructure=cman \
>     last-lrm-refresh=1424459378 \
>     no-quorum-policy=ignore \
>     stonith-enabled=false \
>     maintenance-mode=false
> rsc_defaults rsc_defaults-options: \
>     resource-stickiness=1000 \
>     migration-threshold=5
> 
> crm_mon shows both hosts as slaves, and neither is ever promoted:
> 
> Master/Slave Set: ms_pgsql [pri_pgsql]
>      Slaves: [ pp-obm-sgbd.upond.fr pp-obm-sgbd2.upond.fr ]
> Node Attributes:
> * Node pp-obm-sgbd.upond.fr:
>     + master-pri_pgsql                  : 1000
>     + pri_pgsql-status                  : HS:alone  
>     + pri_pgsql-xlog-loc                : 000000002D000078
> * Node pp-obm-sgbd2.upond.fr:
>     + master-pri_pgsql                  : -INFINITY 
>     + pri_pgsql-data-status             : DISCONNECT
>     + pri_pgsql-status                  : HS:alone  
>     + pri_pgsql-xlog-loc                : 000000002D000000
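
When comparing against the howto, it can help to line up the promotion-related attributes per node. Below is a rough sketch in Python that parses the "Node Attributes" output quoted above and prints the master score and data-status for each node. The helper names (parse_node_attributes, summarize) are made up for illustration, and the parsing assumes the attribute output keeps exactly the layout shown here. Note that pp-obm-sgbd, the only node with a positive master score, has no pri_pgsql-data-status attribute at all; whether that is what blocks the promotion is a guess on my part, not something this output proves.

```python
import re

# Node attribute section copied from the crm_mon output quoted above.
CRM_MON_ATTRS = """\
* Node pp-obm-sgbd.upond.fr:
    + master-pri_pgsql                  : 1000
    + pri_pgsql-status                  : HS:alone
    + pri_pgsql-xlog-loc                : 000000002D000078
* Node pp-obm-sgbd2.upond.fr:
    + master-pri_pgsql                  : -INFINITY
    + pri_pgsql-data-status             : DISCONNECT
    + pri_pgsql-status                  : HS:alone
    + pri_pgsql-xlog-loc                : 000000002D000000
"""


def parse_node_attributes(text):
    """Return {node_name: {attribute: value}} from crm_mon-style text.

    Hypothetical helper: it only understands the '* Node x:' and
    '+ name : value' line shapes seen in this thread.
    """
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        node = re.match(r"\* Node (\S+):$", line)
        if node:
            current = nodes.setdefault(node.group(1), {})
            continue
        attr = re.match(r"\+ (\S+)\s*:\s*(\S*)$", line)
        if attr and current is not None:
            current[attr.group(1)] = attr.group(2)
    return nodes


def summarize(nodes, rsc="pri_pgsql"):
    """Return (node, master score, data-status) tuples, None when unset."""
    return [
        (name, attrs.get("master-" + rsc), attrs.get(rsc + "-data-status"))
        for name, attrs in sorted(nodes.items())
    ]


if __name__ == "__main__":
    for name, score, data_status in summarize(parse_node_attributes(CRM_MON_ATTRS)):
        print(f"{name}: master score={score}, data-status={data_status}")
```

Pasting fresh attribute output into CRM_MON_ATTRS after each cleanup makes it easy to see whether the data-status attribute ever appears on the node you expect to be promoted.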
> 
> On the host where I expect the promotion, I see the following when doing cleanups:
> Feb 20 20:15:07 pp-obm-sgbd pgsql(pri_pgsql)[30994]: INFO: Master does not exist.
> Feb 20 20:15:07 pp-obm-sgbd pgsql(pri_pgsql)[30994]: INFO: My data status=.
> 
> And on the other node I see the following logs, which look interesting:
> Feb 20 20:16:10 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse: [Action   18]: Pending pseudo op ms_pgsql_promoted_0              on N/A (priority: 1000000, waiting:  11)
> Feb 20 20:16:10 pp-obm-sgbd2 crmd[19626]:   notice: print_synapse: [Action   17]: Pending pseudo op ms_pgsql_promote_0               on N/A (priority: 0, waiting:  21)
> 
> The N/A part seems to tell me the cluster doesn't know where to promote the resource, but I can't understand why.
> 
> Below are my constraint rules:
> 
> pcs constraint show 
> Location Constraints:
> Ordering Constraints:
>   demote ms_pgsql then stop pri_vip (score:0)
>   promote ms_pgsql then start pri_vip (score:0) (non-symmetrical)
> Colocation Constraints:
>   pri_vip with ms_pgsql (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
> 
> I am now out of ideas so any help is very much appreciated.
> 
> Regards.
> 