[ClusterLabs] Postgres streaming VIP-REP not coming up on slave
Wynand Jansen van Vuuren
esawyja at gmail.com
Mon Mar 16 09:01:27 UTC 2015
Thanks for the advice. I have a demo on this now, so I don't want to test
it today; I will do so tomorrow and forward the logs, many thanks!!
On Mon, Mar 16, 2015 at 10:54 AM, NAKAHIRA Kazutomo <
nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
> Hi,
>
> > do you suggest that I take it out? or should I look at the problem where
> > cl2_lb1 is not being promoted?
>
> You should look at the problem where cl2_lb1 is not being promoted.
> And I will look into it if you send me the ha-log and PostgreSQL's log.
>
> Best regards,
> Kazutomo NAKAHIRA
>
>
> On 2015/03/16 17:18, Wynand Jansen van Vuuren wrote:
>
>> Hi Nakahira,
>> Thanks so much for the info, this setting was as the wiki page suggested,
>> do you suggest that I take it out? or should I look at the problem where
>> cl2_lb1 is not being promoted?
>> Regards
>>
>> On Mon, Mar 16, 2015 at 10:15 AM, NAKAHIRA Kazutomo <
>> nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>
>>> Hi,
>>>
>>>> Notice there are no VIPs; looks like the VIPs depend on some other
>>>> resource to start 1st?
>>>
>>>
>>> The following constraint means that "master-group" cannot start
>>> without a master of the msPostgresql resource.
>>>
>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>>
>>> After you power off cl1_lb1, msPostgresql on cl2_lb1 is not promoted
>>> and no master exists in your cluster.
>>>
>>> It means that "master-group" cannot run anywhere.
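>>>
>>> You can see the scores that block it, for example (a rough sketch;
>>> the exact output format depends on your Pacemaker version):
>>>
>>> # Show allocation and promotion scores from the live CIB; expect
>>> # master-pgsql to be -INFINITY on cl2_lb1, which blocks the Master
>>> # role and therefore master-group.
>>> crm_simulate -sL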
>>>
>>> Best regards,
>>> Kazutomo NAKAHIRA
>>>
>>>
>>> On 2015/03/16 16:48, Wynand Jansen van Vuuren wrote:
>>>
>>>> Hi,
>>>> When I start out, cl1_lb1 (Cluster 1 load balancer 1) is the master,
>>>> as below:
>>>> cl1_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Mon Mar 16 09:44:44 2015
>>>> Last change: Mon Mar 16 08:06:26 2015 by root via crm_attribute on
>>>> cl1_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition with quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl1_lb1 cl2_lb1 ]
>>>>
>>>> Resource Group: master-group
>>>> vip-master (ocf::heartbeat:IPaddr2): Started cl1_lb1
>>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl1_lb1
>>>> CBC_instance (ocf::heartbeat:cbc): Started cl1_lb1
>>>> failover_MailTo (ocf::heartbeat:MailTo): Started cl1_lb1
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Masters: [ cl1_lb1 ]
>>>> Slaves: [ cl2_lb1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl1_lb1:
>>>> + master-pgsql : 1000
>>>> + pgsql-data-status : LATEST
>>>> + pgsql-master-baseline : 00000008B90061F0
>>>> + pgsql-status : PRI
>>>> * Node cl2_lb1:
>>>> + master-pgsql : 100
>>>> + pgsql-data-status : STREAMING|SYNC
>>>> + pgsql-status : HS:sync
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> * Node cl1_lb1:
>>>> cl1_lb1:~ #
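>>>>
>>>> (These pgsql-* values can also be queried directly; a sketch, assuming
>>>> the pgsql RA stores them as permanent node attributes:)
>>>>
>>>> # Query the replication status attribute for one node.
>>>> crm_attribute -l forever -N cl1_lb1 -n pgsql-data-status -G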
>>>>
>>>> If I then do a power off on cl1_lb1 (master), Postgres moves to cl2_lb1
>>>> (Cluster 2 load balancer 1), but VIP-MASTER and VIP-REP are not pingable
>>>> from the NEW master (cl2_lb1); it stays like this below:
>>>> cl2_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Mon Mar 16 07:32:07 2015
>>>> Last change: Mon Mar 16 07:28:53 2015 by root via crm_attribute on
>>>> cl1_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition WITHOUT quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl2_lb1 ]
>>>> OFFLINE: [ cl1_lb1 ]
>>>>
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Slaves: [ cl2_lb1 ]
>>>> Stopped: [ pgsql:1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl2_lb1:
>>>> + master-pgsql : -INFINITY
>>>> + pgsql-data-status : DISCONNECT
>>>> + pgsql-status : HS:alone
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> cl2_lb1:~ #
>>>>
>>>> Notice there are no VIPs; looks like the VIPs depend on some other
>>>> resource to start 1st?
>>>> Thanks for the reply!
>>>>
>>>>
>>>> On Mon, Mar 16, 2015 at 9:42 AM, NAKAHIRA Kazutomo <
>>>> nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>> fine, cl2_lb1 takes over and acts as a slave, but the VIPs do not
>>>>>> come up.
>>>>>
>>>>> cl2_lb1 acts as a slave? It is not a master?
>>>>> The VIPs come up with the master msPostgresql resource.
>>>>>
>>>>> If the promote action failed on cl2_lb1, then
>>>>> please send the ha-log and PostgreSQL's log.
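>>>>>
>>>>> For example, something like this around the time of the failover
>>>>> (the messages path is a guess; the RA log path is from your config):
>>>>>
>>>>> # Pacemaker/corosync messages about the pgsql resource.
>>>>> grep -E "pgsql|promote|demote" /var/log/messages
>>>>> # The resource agent's own log, per logfile="/var/log/OCF.log".
>>>>> tail -n 200 /var/log/OCF.log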
>>>>>
>>>>> Best regards,
>>>>> Kazutomo NAKAHIRA
>>>>>
>>>>>
>>>>> On 2015/03/16 16:09, Wynand Jansen van Vuuren wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have 2 nodes, with 2 interfaces each. ETH0 is used for an
>>>>>> application, CBC, that writes to the Postgres DB on the VIP-MASTER
>>>>>> 172.28.200.159; ETH1 is used for the Corosync configuration and
>>>>>> VIP-REP. Everything works, but if the master, currently on cl1_lb1,
>>>>>> has a catastrophic failure, like power down, the VIPs do not start
>>>>>> on the slave. The Postgres part works fine: cl2_lb1 takes over and
>>>>>> acts as a slave, but the VIPs do not come up. If I test it manually,
>>>>>> i.e. kill the application 3 times on the master, the switchover is
>>>>>> smooth; same if I kill Postgres on the master. But when there is a
>>>>>> power failure on the master, the VIPs stay down. If I then delete
>>>>>> the attributes pgsql-data-status="LATEST" and
>>>>>> pgsql-data-status="STREAMING|SYNC" on the slave after power off on
>>>>>> the master and restart everything, the VIPs come up on the slave.
>>>>>> Any ideas please?
>>>>>> I'm using this setup:
>>>>>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
>>>>>>
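>>>>>> The attribute deletion I do is roughly this sketch with crm_attribute
>>>>>> (flags from memory, please double-check before relying on them):
>>>>>>
>>>>>> # Drop the stale data-status attributes left by the dead master.
>>>>>> crm_attribute -l forever -N cl1_lb1 -n pgsql-data-status -D
>>>>>> crm_attribute -l forever -N cl2_lb1 -n pgsql-data-status -D
>>>>>> # Then clean up the resource so the slave can be promoted again.
>>>>>> crm resource cleanup msPostgresql
>>>>>>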
>>>>>> With this configuration below
>>>>>> node cl1_lb1 \
>>>>>> attributes pgsql-data-status="LATEST"
>>>>>> node cl2_lb1 \
>>>>>> attributes pgsql-data-status="STREAMING|SYNC"
>>>>>> primitive CBC_instance ocf:heartbeat:cbc \
>>>>>> op monitor interval="60s" timeout="60s" on-fail="restart" \
>>>>>> op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>> meta target-role="Started" migration-threshold="3"
>>>>>> failure-timeout="60s"
>>>>>> primitive failover_MailTo ocf:heartbeat:MailTo \
>>>>>> params email="wynandj at rorotika.com" subject="Cluster Status change - " \
>>>>>> op monitor interval="10" timeout="10" depth="0"
>>>>>> primitive pgsql ocf:heartbeat:pgsql \
>>>>>> params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl"
>>>>>> psql="/opt/app/PostgreSQL/9.3/bin/psql"
>>>>>> config="/opt/app/pgdata/9.3/postgresql.conf" pgdba="postgres"
>>>>>> pgdata="/opt/app/pgdata/9.3/" start_opt="-p 5432" rep_mode="sync"
>>>>>> node_list="cl1_lb1 cl2_lb1"
>>>>>> restore_command="cp /pgtablespace/archive/%f %p"
>>>>>> primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5
>>>>>> keepalives_count=5" master_ip="172.16.0.5" restart_on_promote="false"
>>>>>> logfile="/var/log/OCF.log" \
>>>>>> op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>> op monitor interval="4s" timeout="60s" on-fail="restart" \
>>>>>> op monitor interval="3s" role="Master" timeout="60s"
>>>>>> on-fail="restart" \
>>>>>> op promote interval="0s" timeout="60s" on-fail="restart" \
>>>>>> op demote interval="0s" timeout="60s" on-fail="stop" \
>>>>>> op stop interval="0s" timeout="60s" on-fail="block" \
>>>>>> op notify interval="0s" timeout="60s"
>>>>>> primitive vip-master ocf:heartbeat:IPaddr2 \
>>>>>> params ip="172.28.200.159" nic="eth0" iflabel="CBC_VIP"
>>>>>> cidr_netmask="24" \
>>>>>> op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>> op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>> op stop interval="0s" timeout="60s" on-fail="block" \
>>>>>> meta target-role="Started"
>>>>>> primitive vip-rep ocf:heartbeat:IPaddr2 \
>>>>>> params ip="172.16.0.5" nic="eth1" iflabel="REP_VIP"
>>>>>> cidr_netmask="24" \
>>>>>> meta migration-threshold="0" target-role="Started" \
>>>>>> op start interval="0s" timeout="60s" on-fail="stop" \
>>>>>> op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>> op stop interval="0s" timeout="60s" on-fail="restart"
>>>>>> group master-group vip-master vip-rep CBC_instance failover_MailTo
>>>>>> ms msPostgresql pgsql \
>>>>>> meta master-max="1" master-node-max="1" clone-max="2"
>>>>>> clone-node-max="1" notify="true"
>>>>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>>>>> order rsc_order-1 0: msPostgresql:promote master-group:start
>>>>>> symmetrical=false
>>>>>> order rsc_order-2 0: msPostgresql:demote master-group:stop
>>>>>> symmetrical=false
>>>>>> property $id="cib-bootstrap-options" \
>>>>>> dc-version="1.1.9-2db99f1" \
>>>>>> cluster-infrastructure="classic openais (with plugin)" \
>>>>>> expected-quorum-votes="2" \
>>>>>> no-quorum-policy="ignore" \
>>>>>> stonith-enabled="false" \
>>>>>> cluster-recheck-interval="1min" \
>>>>>> crmd-transition-delay="0s" \
>>>>>> last-lrm-refresh="1426485983"
>>>>>> rsc_defaults $id="rsc-options" \
>>>>>> resource-stickiness="INFINITY" \
>>>>>> migration-threshold="1"
>>>>>> #vim:set syntax=pcmk
>>>>>>
>>>>>> Any ideas please, I'm lost......
>>>>>>
>>> --
>>> NTT Open Source Software Center
>>> Kazutomo NAKAHIRA
>>> TEL: 03-5860-5135 FAX: 03-5463-6490
>>> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
>>>
> --
> NTT Open Source Software Center
> Kazutomo NAKAHIRA
> TEL: 03-5860-5135 FAX: 03-5463-6490
> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>