[ClusterLabs] Postgres streaming VIP-REP not coming up on slave

Mon Mar 16 07:48:19 UTC 2015

Hi,
When I start out, cl1_lb1 (Cluster 1 load balancer 1) is the master, as shown below:
cl1_lb1:~ # crm_mon -1 -Af
Last updated: Mon Mar 16 09:44:44 2015
Last change: Mon Mar 16 08:06:26 2015 by root via crm_attribute on cl1_lb1
Stack: classic openais (with plugin)
Current DC: cl2_lb1 - partition with quorum
Version: 1.1.9-2db99f1
2 Nodes configured, 2 expected votes
6 Resources configured.

Online: [ cl1_lb1 cl2_lb1 ]

 Resource Group: master-group
     vip-master    (ocf::heartbeat:IPaddr2):    Started cl1_lb1
     vip-rep    (ocf::heartbeat:IPaddr2):    Started cl1_lb1
     CBC_instance    (ocf::heartbeat:cbc):    Started cl1_lb1
     failover_MailTo    (ocf::heartbeat:MailTo):    Started cl1_lb1
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ cl1_lb1 ]
     Slaves: [ cl2_lb1 ]

Node Attributes:
* Node cl1_lb1:
    + master-pgsql                        : 1000
    + pgsql-data-status                   : LATEST
    + pgsql-master-baseline               : 00000008B90061F0
    + pgsql-status                        : PRI
* Node cl2_lb1:
    + master-pgsql                        : 100
    + pgsql-data-status                   : STREAMING|SYNC
    + pgsql-status                        : HS:sync

Migration summary:
* Node cl2_lb1:
* Node cl1_lb1:
cl1_lb1:~ #

If I then power off cl1_lb1 (the master), Postgres moves to cl2_lb1
(Cluster 2 load balancer 1), but VIP-MASTER and VIP-REP are not pingable
from the new master (cl2_lb1). It stays like this:
cl2_lb1:~ # crm_mon -1 -Af
Last updated: Mon Mar 16 07:32:07 2015
Last change: Mon Mar 16 07:28:53 2015 by root via crm_attribute on cl1_lb1
Stack: classic openais (with plugin)
Current DC: cl2_lb1 - partition WITHOUT quorum
Version: 1.1.9-2db99f1
2 Nodes configured, 2 expected votes
6 Resources configured.

Online: [ cl2_lb1 ]
OFFLINE: [ cl1_lb1 ]

 Master/Slave Set: msPostgresql [pgsql]
     Slaves: [ cl2_lb1 ]
     Stopped: [ pgsql:1 ]

Node Attributes:
* Node cl2_lb1:
    + master-pgsql                        : -INFINITY
    + pgsql-data-status                   : DISCONNECT
    + pgsql-status                        : HS:alone

Migration summary:
* Node cl2_lb1:
cl2_lb1:~ #

Notice that there are no VIPs. It looks like the VIPs depend on some other
resource starting first?
Thanks for the reply!
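
For readers following the thread, here is a minimal sketch (not part of the original exchange) of how the two crm_mon outputs above can be compared. It assumes that only a node with a positive master-pgsql score is a promotion candidate, and that master-group (and so the VIPs) can only start where msPostgresql is Master, as the colocation constraint in the configuration quoted below states. The function names parse_node_attributes and promotable_nodes are hypothetical helpers, not part of Pacemaker or crmsh.

```python
import re

# Node attributes copied from the second crm_mon output above.
CRM_MON_SAMPLE = """\
Node Attributes:
* Node cl2_lb1:
    + master-pgsql                        : -INFINITY
    + pgsql-data-status                   : DISCONNECT
    + pgsql-status                        : HS:alone

Migration summary:
* Node cl2_lb1:
"""


def parse_node_attributes(text):
    """Return {node: {attribute: value}} from the 'Node Attributes:' section."""
    attrs = {}
    node = None
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Node Attributes:":
            in_section = True
            continue
        if not in_section:
            continue
        if not stripped:
            break  # a blank line ends the section
        m = re.match(r"\* Node (\S+):$", stripped)
        if m:
            node = m.group(1)
            attrs[node] = {}
            continue
        m = re.match(r"\+ (\S+)\s*: (.*)$", stripped)
        if m and node is not None:
            attrs[node][m.group(1)] = m.group(2).strip()
    return attrs


def promotable_nodes(attrs, score_name="master-pgsql"):
    """List nodes whose master score is positive (assumed promotion rule)."""
    result = []
    for node, values in attrs.items():
        score = values.get(score_name, "-INFINITY")
        if score in ("INFINITY", "+INFINITY"):
            result.append(node)
            continue
        try:
            if int(score) > 0:
                result.append(node)
        except ValueError:
            pass  # "-INFINITY" or anything non-numeric: not promotable
    return result


if __name__ == "__main__":
    attrs = parse_node_attributes(CRM_MON_SAMPLE)
    candidates = promotable_nodes(attrs)
    if not candidates:
        print("No node can be promoted, so master-group (the VIPs) cannot start.")
    else:
        print("Promotion candidates:", ", ".join(candidates))
```

Run against the attributes from the first output (master-pgsql 1000 and 100), the same helper would list both nodes. Run against the second, it lists none, which matches the missing Masters line and the missing VIPs.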

On Mon, Mar 16, 2015 at 9:42 AM, NAKAHIRA Kazutomo <
nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:

> Hi,
>
> > fine, cl2_lb1 takes over and acts as a slave, but the VIPs does not come
>
> cl2_lb1 acts as a slave? Is it not a master?
> The VIPs come up together with the master of the msPostgresql resource.
>
> If the promote action failed on cl2_lb1, then
> please send the ha-log and PostgreSQL's log.
>
> Best regards,
> Kazutomo NAKAHIRA
>
>
> On 2015/03/16 16:09, Wynand Jansen van Vuuren wrote:
>
>> Hi all,
>>
>> I have 2 nodes, with 2 interfaces each, ETH0 is used for an application,
>> CBC, that's writing to the Postgres DB on the VIP-MASTER 172.28.200.159,
>> ETH1 is used for the Corosync configuration and VIP-REP, everything works,
>> but if the master currently on cl1_lb1 has a catastrophic failure, like
>> power down, the VIPs does not start on the slave, the Postgres parts works
>> fine, cl2_lb1 takes over and acts as a slave, but the VIPs does not come
>> up. If I test it manually, IE kill the application 3 times on the master,
>> the switchover is smooth, same if I kill Postgres on master, but when
>> there is a power failure on the Master, the VIPs stay down. If I then
>> delete the
>> attributes pgsql-data-status="LATEST" and attributes
>> pgsql-data-status="STREAMING|SYNC" on the slave after power off on the
>> master and restart everything, then the VIPs come up on the slave, any
>> ideas please?
>> I'm using this setup
>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
>>
>> With this configuration below
>> node cl1_lb1 \
>>          attributes pgsql-data-status="LATEST"
>> node cl2_lb1 \
>>          attributes pgsql-data-status="STREAMING|SYNC"
>> primitive CBC_instance ocf:heartbeat:cbc \
>>          op monitor interval="60s" timeout="60s" on-fail="restart" \
>>          op start interval="0s" timeout="60s" on-fail="restart" \
>>          meta target-role="Started" migration-threshold="3" failure-timeout="60s"
>> primitive failover_MailTo ocf:heartbeat:MailTo \
>>          params email="wynandj at rorotika.com" subject="Cluster Status change - " \
>>          op monitor interval="10" timeout="10" dept="0"
>> primitive pgsql ocf:heartbeat:pgsql \
>>          params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl" \
>>          psql="/opt/app/PostgreSQL/9.3/bin/psql" \
>>          config="/opt/app/pgdata/9.3/postgresql.conf" pgdba="postgres" \
>>          pgdata="/opt/app/pgdata/9.3/" start_opt="-p 5432" rep_mode="sync" \
>>          node_list="cl1_lb1 cl2_lb1" \
>>          restore_command="cp /pgtablespace/archive/%f %p" \
>>          primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
>>          master_ip="172.16.0.5" restart_on_promote="false" \
>>          logfile="/var/log/OCF.log" \
>>          op start interval="0s" timeout="60s" on-fail="restart" \
>>          op monitor interval="4s" timeout="60s" on-fail="restart" \
>>          op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
>>          op promote interval="0s" timeout="60s" on-fail="restart" \
>>          op demote interval="0s" timeout="60s" on-fail="stop" \
>>          op stop interval="0s" timeout="60s" on-fail="block" \
>>          op notify interval="0s" timeout="60s"
>> primitive vip-master ocf:heartbeat:IPaddr2 \
>>          params ip="172.28.200.159" nic="eth0" iflabel="CBC_VIP" cidr_netmask="24" \
>>          op start interval="0s" timeout="60s" on-fail="restart" \
>>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>>          op stop interval="0s" timeout="60s" on-fail="block" \
>>          meta target-role="Started"
>> primitive vip-rep ocf:heartbeat:IPaddr2 \
>>          params ip="172.16.0.5" nic="eth1" iflabel="REP_VIP" cidr_netmask="24" \
>>          meta migration-threshold="0" target-role="Started" \
>>          op start interval="0s" timeout="60s" on-fail="stop" \
>>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>>          op stop interval="0s" timeout="60s" on-fail="restart"
>> group master-group vip-master vip-rep CBC_instance failover_MailTo
>> ms msPostgresql pgsql \
>>          meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>> order rsc_order-1 0: msPostgresql:promote master-group:start symmetrical=false
>> order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false
>> property $id="cib-bootstrap-options" \
>>          dc-version="1.1.9-2db99f1" \
>>          cluster-infrastructure="classic openais (with plugin)" \
>>          expected-quorum-votes="2" \
>>          no-quorum-policy="ignore" \
>>          stonith-enabled="false" \
>>          cluster-recheck-interval="1min" \
>>          crmd-transition-delay="0s" \
>>          last-lrm-refresh="1426485983"
>> rsc_defaults $id="rsc-options" \
>>          resource-stickiness="INFINITY" \
>>          migration-threshold="1"
>> #vim:set syntax=pcmk
>>
>> Any ideas please, I'm lost......
>>
>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>>
>
>