[ClusterLabs] Postgres streaming VIP-REP not coming up on slave

NAKAHIRA Kazutomo nakahira_kazutomo_b1 at lab.ntt.co.jp
Mon Mar 16 08:15:38 UTC 2015


Hi,

 > Notice there are no VIPs; it looks like the VIPs depend on some other
 > resource to start first?

The following constraint means that "master-group" cannot start
without a master instance of the msPostgresql resource:

colocation rsc_colocation-1 inf: master-group msPostgresql:Master

After you power off cl1_lb1, msPostgresql on cl2_lb1 is not promoted,
so no master exists in your cluster.

That means "master-group" cannot run anywhere.
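You can confirm this from the surviving node: the promotion score and the pgsql-status attribute show why no promotion happens. A quick check, assuming standard Pacemaker/crmsh tooling (commands are illustrative; output format varies by version):

```shell
# Show resource state and node attributes. In the output quoted below,
# master-pgsql is -INFINITY and pgsql-status is HS:alone, so the
# surviving slave will never be promoted and master-group stays stopped.
crm_mon -1 -Af

# Show the allocation/promotion scores the policy engine computed
# for the live cluster (-L = live CIB, -s = show scores).
crm_simulate -L -s | grep -i promo
```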

Best regards,
Kazutomo NAKAHIRA

On 2015/03/16 16:48, Wynand Jansen van Vuuren wrote:
> Hi
> When I start out, cl1_lb1 (Cluster 1 load balancer 1) is the master, as below:
> cl1_lb1:~ # crm_mon -1 -Af
> Last updated: Mon Mar 16 09:44:44 2015
> Last change: Mon Mar 16 08:06:26 2015 by root via crm_attribute on cl1_lb1
> Stack: classic openais (with plugin)
> Current DC: cl2_lb1 - partition with quorum
> Version: 1.1.9-2db99f1
> 2 Nodes configured, 2 expected votes
> 6 Resources configured.
>
>
> Online: [ cl1_lb1 cl2_lb1 ]
>
>   Resource Group: master-group
>       vip-master    (ocf::heartbeat:IPaddr2):    Started cl1_lb1
>       vip-rep    (ocf::heartbeat:IPaddr2):    Started cl1_lb1
>       CBC_instance    (ocf::heartbeat:cbc):    Started cl1_lb1
>       failover_MailTo    (ocf::heartbeat:MailTo):    Started cl1_lb1
>   Master/Slave Set: msPostgresql [pgsql]
>       Masters: [ cl1_lb1 ]
>       Slaves: [ cl2_lb1 ]
>
> Node Attributes:
> * Node cl1_lb1:
>      + master-pgsql                        : 1000
>      + pgsql-data-status                   : LATEST
>      + pgsql-master-baseline               : 00000008B90061F0
>      + pgsql-status                        : PRI
> * Node cl2_lb1:
>      + master-pgsql                        : 100
>      + pgsql-data-status                   : STREAMING|SYNC
>      + pgsql-status                        : HS:sync
>
> Migration summary:
> * Node cl2_lb1:
> * Node cl1_lb1:
> cl1_lb1:~ #
>
> If I then power off cl1_lb1 (the master), Postgres moves to cl2_lb1
> (Cluster 2 load balancer 1), but VIP-MASTER and VIP-REP are not pingable
> from the NEW master (cl2_lb1); it stays like this below:
> cl2_lb1:~ # crm_mon -1 -Af
> Last updated: Mon Mar 16 07:32:07 2015
> Last change: Mon Mar 16 07:28:53 2015 by root via crm_attribute on cl1_lb1
> Stack: classic openais (with plugin)
> Current DC: cl2_lb1 - partition WITHOUT quorum
> Version: 1.1.9-2db99f1
> 2 Nodes configured, 2 expected votes
> 6 Resources configured.
>
>
> Online: [ cl2_lb1 ]
> OFFLINE: [ cl1_lb1 ]
>
>   Master/Slave Set: msPostgresql [pgsql]
>       Slaves: [ cl2_lb1 ]
>       Stopped: [ pgsql:1 ]
>
> Node Attributes:
> * Node cl2_lb1:
>      + master-pgsql                        : -INFINITY
>      + pgsql-data-status                   : DISCONNECT
>      + pgsql-status                        : HS:alone
>
> Migration summary:
> * Node cl2_lb1:
> cl2_lb1:~ #
>
> Notice there are no VIPs; it looks like the VIPs depend on some other
> resource to start first?
> Thanks for the reply!
>
>
> On Mon, Mar 16, 2015 at 9:42 AM, NAKAHIRA Kazutomo <
> nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>
>> Hi,
>>
>>> fine, cl2_lb1 takes over and acts as a slave, but the VIPs does not come
>>
>> cl2_lb1 acts as a slave? It is not a master?
>> The VIPs come up with the master msPostgresql resource.
>>
>> If the promote action failed on cl2_lb1, then
>> please send the ha-log and the PostgreSQL log.
>>
>> Best regards,
>> Kazutomo NAKAHIRA
>>
>>
>> On 2015/03/16 16:09, Wynand Jansen van Vuuren wrote:
>>
>>> Hi all,
>>>
>>> I have 2 nodes, each with 2 interfaces. ETH0 is used for an application,
>>> CBC, that writes to the Postgres DB on the VIP-MASTER 172.28.200.159;
>>> ETH1 is used for the Corosync configuration and VIP-REP. Everything works,
>>> but if the master, currently cl1_lb1, has a catastrophic failure such as a
>>> power-down, the VIPs do not start on the slave. The Postgres part works
>>> fine; cl2_lb1 takes over and acts as a slave, but the VIPs do not come
>>> up. If I test it manually, i.e. kill the application 3 times on the master,
>>> the switchover is smooth, and the same if I kill Postgres on the master.
>>> But when there is a power failure on the master, the VIPs stay down. If I
>>> then delete the attributes pgsql-data-status="LATEST" and
>>> pgsql-data-status="STREAMING|SYNC" on the slave after powering off the
>>> master and restart everything, the VIPs come up on the slave. Any
>>> ideas please?
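The manual cleanup described here can be scripted; a sketch, assuming crmsh/Pacemaker command-line tools and the node names from this thread (verify the attribute names with `crm_mon -A` first):

```shell
# Delete the permanent pgsql-data-status attributes recorded for both
# nodes, so the pgsql resource agent no longer blocks promotion on the
# surviving node.
crm_attribute -l forever -N cl1_lb1 -n pgsql-data-status -D
crm_attribute -l forever -N cl2_lb1 -n pgsql-data-status -D

# Clear failcounts/operation history for the master/slave set and let
# the cluster re-evaluate placement.
crm resource cleanup msPostgresql
```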
>>> I'm using this setup
>>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
>>>
>>> With this configuration below
>>> node cl1_lb1 \
>>>           attributes pgsql-data-status="LATEST"
>>> node cl2_lb1 \
>>>           attributes pgsql-data-status="STREAMING|SYNC"
>>> primitive CBC_instance ocf:heartbeat:cbc \
>>>           op monitor interval="60s" timeout="60s" on-fail="restart" \
>>>           op start interval="0s" timeout="60s" on-fail="restart" \
>>>           meta target-role="Started" migration-threshold="3"
>>> failure-timeout="60s"
>>> primitive failover_MailTo ocf:heartbeat:MailTo \
>>>           params email="wynandj at rorotika.com" subject="Cluster Status
>>> change
>>> - " \
>>>           op monitor interval="10" timeout="10" depth="0"
>>> primitive pgsql ocf:heartbeat:pgsql \
>>>           params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl"
>>> psql="/opt/app/PostgreSQL/9.3/bin/psql"
>>> config="/opt/app/pgdata/9.3/postgresql.conf" pgdba="postgres"
>>> pgdata="/opt/app/pgdata/9.3/" start_opt="-p 5432" rep_mode="sync"
>>> node_list="cl1_lb1 cl2_lb1" restore_command="cp /pgtablespace/archive/%f
>>> %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5
>>> keepalives_count=5" master_ip="172.16.0.5" restart_on_promote="false"
>>> logfile="/var/log/OCF.log" \
>>>           op start interval="0s" timeout="60s" on-fail="restart" \
>>>           op monitor interval="4s" timeout="60s" on-fail="restart" \
>>>           op monitor interval="3s" role="Master" timeout="60s"
>>> on-fail="restart" \
>>>           op promote interval="0s" timeout="60s" on-fail="restart" \
>>>           op demote interval="0s" timeout="60s" on-fail="stop" \
>>>           op stop interval="0s" timeout="60s" on-fail="block" \
>>>           op notify interval="0s" timeout="60s"
>>> primitive vip-master ocf:heartbeat:IPaddr2 \
>>>           params ip="172.28.200.159" nic="eth0" iflabel="CBC_VIP"
>>> cidr_netmask="24" \
>>>           op start interval="0s" timeout="60s" on-fail="restart" \
>>>           op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>           op stop interval="0s" timeout="60s" on-fail="block" \
>>>           meta target-role="Started"
>>> primitive vip-rep ocf:heartbeat:IPaddr2 \
>>>           params ip="172.16.0.5" nic="eth1" iflabel="REP_VIP"
>>> cidr_netmask="24" \
>>>           meta migration-threshold="0" target-role="Started" \
>>>           op start interval="0s" timeout="60s" on-fail="stop" \
>>>           op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>           op stop interval="0s" timeout="60s" on-fail="restart"
>>> group master-group vip-master vip-rep CBC_instance failover_MailTo
>>> ms msPostgresql pgsql \
>>>           meta master-max="1" master-node-max="1" clone-max="2"
>>> clone-node-max="1" notify="true"
>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>> order rsc_order-1 0: msPostgresql:promote master-group:start
>>> symmetrical=false
>>> order rsc_order-2 0: msPostgresql:demote master-group:stop
>>> symmetrical=false
>>> property $id="cib-bootstrap-options" \
>>>           dc-version="1.1.9-2db99f1" \
>>>           cluster-infrastructure="classic openais (with plugin)" \
>>>           expected-quorum-votes="2" \
>>>           no-quorum-policy="ignore" \
>>>           stonith-enabled="false" \
>>>           cluster-recheck-interval="1min" \
>>>           crmd-transition-delay="0s" \
>>>           last-lrm-refresh="1426485983"
>>> rsc_defaults $id="rsc-options" \
>>>           resource-stickiness="INFINITY" \
>>>           migration-threshold="1"
>>> #vim:set syntax=pcmk
>>>
>>> Any ideas please, I'm lost......
>>>
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>>
>>
>>
>>
>
>
>
>


-- 
NTT Open Source Software Center
Kazutomo NAKAHIRA
TEL: 03-5860-5135 FAX: 03-5463-6490
Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp




