[ClusterLabs] Postgres streaming VIP-REP not coming up on slave

NAKAHIRA Kazutomo nakahira_kazutomo_b1 at lab.ntt.co.jp
Mon Mar 16 04:54:19 EDT 2015


Hi,

 > Do you suggest that I take it out, or should I look at the problem where
 > cl2_lb1 is not being promoted?

You should look at the problem where cl2_lb1 is not being promoted.
I will look into it if you send me the ha-log and PostgreSQL's log.
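
For reference, one way to gather the cluster side is something like the
following (a sketch; the time window and output path are only examples,
adjust them to your failover test), plus the PostgreSQL server log from
cl2_lb1 for the same window:

    crm_report -f "2015-03-16 07:00" /tmp/failover-report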

Best regards,
Kazutomo NAKAHIRA

On 2015/03/16 17:18, Wynand Jansen van Vuuren wrote:
> Hi Nakahira,
> Thanks so much for the info. This setting was as the wiki page suggested.
> Do you suggest that I take it out, or should I look at the problem where
> cl2_lb1 is not being promoted?
> Regards
>
> On Mon, Mar 16, 2015 at 10:15 AM, NAKAHIRA Kazutomo <
> nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>
>> Hi,
>>
>>> Notice there are no VIPs; it looks like the VIPs depend on some other
>>> resource to start first?
>>
>> The following constraint means that "master-group" can not start
>> without a master instance of the msPostgresql resource.
>>
>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>
>> After you power off cl1_lb1, msPostgresql on cl2_lb1 is not promoted,
>> so no master exists in your cluster.
>>
>> That means "master-group" can not run anywhere.
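>>
>> If you want to see why the promotion is held back, the promotion scores
>> and the pgsql attributes can be checked from the surviving node, for
>> example (a sketch, using standard Pacemaker tools):
>>
>>   crm_mon -1 -Af       # node attributes, as in your output below
>>   crm_simulate -sL     # allocation/promotion scores from the live CIB
>>   crm_attribute -l forever -N cl2_lb1 -n pgsql-data-status -G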
>>
>> Best regards,
>> Kazutomo NAKAHIRA
>>
>>
>> On 2015/03/16 16:48, Wynand Jansen van Vuuren wrote:
>>
>>> Hi
>>> When I start out, cl1_lb1 (Cluster 1 load balancer 1) is the master, as
>>> below:
>>> cl1_lb1:~ # crm_mon -1 -Af
>>> Last updated: Mon Mar 16 09:44:44 2015
>>> Last change: Mon Mar 16 08:06:26 2015 by root via crm_attribute on cl1_lb1
>>> Stack: classic openais (with plugin)
>>> Current DC: cl2_lb1 - partition with quorum
>>> Version: 1.1.9-2db99f1
>>> 2 Nodes configured, 2 expected votes
>>> 6 Resources configured.
>>>
>>>
>>> Online: [ cl1_lb1 cl2_lb1 ]
>>>
>>>    Resource Group: master-group
>>>        vip-master    (ocf::heartbeat:IPaddr2):    Started cl1_lb1
>>>        vip-rep    (ocf::heartbeat:IPaddr2):    Started cl1_lb1
>>>        CBC_instance    (ocf::heartbeat:cbc):    Started cl1_lb1
>>>        failover_MailTo    (ocf::heartbeat:MailTo):    Started cl1_lb1
>>>    Master/Slave Set: msPostgresql [pgsql]
>>>        Masters: [ cl1_lb1 ]
>>>        Slaves: [ cl2_lb1 ]
>>>
>>> Node Attributes:
>>> * Node cl1_lb1:
>>>       + master-pgsql                        : 1000
>>>       + pgsql-data-status                   : LATEST
>>>       + pgsql-master-baseline               : 00000008B90061F0
>>>       + pgsql-status                        : PRI
>>> * Node cl2_lb1:
>>>       + master-pgsql                        : 100
>>>       + pgsql-data-status                   : STREAMING|SYNC
>>>       + pgsql-status                        : HS:sync
>>>
>>> Migration summary:
>>> * Node cl2_lb1:
>>> * Node cl1_lb1:
>>> cl1_lb1:~ #
>>>
>>> If I then power off cl1_lb1 (the master), Postgres moves to cl2_lb1
>>> (Cluster 2 load balancer 1), but the VIP-MASTER and VIP-REP are not
>>> pingable from the NEW master (cl2_lb1); it stays like this below:
>>> cl2_lb1:~ # crm_mon -1 -Af
>>> Last updated: Mon Mar 16 07:32:07 2015
>>> Last change: Mon Mar 16 07:28:53 2015 by root via crm_attribute on cl1_lb1
>>> Stack: classic openais (with plugin)
>>> Current DC: cl2_lb1 - partition WITHOUT quorum
>>> Version: 1.1.9-2db99f1
>>> 2 Nodes configured, 2 expected votes
>>> 6 Resources configured.
>>>
>>>
>>> Online: [ cl2_lb1 ]
>>> OFFLINE: [ cl1_lb1 ]
>>>
>>>    Master/Slave Set: msPostgresql [pgsql]
>>>        Slaves: [ cl2_lb1 ]
>>>        Stopped: [ pgsql:1 ]
>>>
>>> Node Attributes:
>>> * Node cl2_lb1:
>>>       + master-pgsql                        : -INFINITY
>>>       + pgsql-data-status                   : DISCONNECT
>>>       + pgsql-status                        : HS:alone
>>>
>>> Migration summary:
>>> * Node cl2_lb1:
>>> cl2_lb1:~ #
>>>
>>> Notice there are no VIPs; it looks like the VIPs depend on some other
>>> resource to start first?
>>> Thanks for the reply!
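>>>
>>> For completeness, the constraints I think are relevant can be listed with
>>> (constraint ids as in my configuration further down):
>>>
>>>   crm configure show rsc_colocation-1 rsc_order-1 rsc_order-2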
>>>
>>>
>>> On Mon, Mar 16, 2015 at 9:42 AM, NAKAHIRA Kazutomo <
>>> nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>
>>>> Hi,
>>>>
>>>>> fine, cl2_lb1 takes over and acts as a slave, but the VIPs do not come up
>>>>
>>>> cl2_lb1 acts as a slave? It is not a master?
>>>> The VIPs come up with the master of the msPostgresql resource.
>>>>
>>>> If the promote action failed on cl2_lb1, then
>>>> please send the ha-log and PostgreSQL's log.
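>>>>
>>>> If it helps to narrow it down first, the pgsql RA log you configured
>>>> (logfile="/var/log/OCF.log") and the ha-log around the failover should
>>>> show why promote did not happen, e.g. (just a sketch):
>>>>
>>>>   grep -i -E 'promote|error|warn' /var/log/OCF.log | tail -n 100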
>>>>
>>>> Best regards,
>>>> Kazutomo NAKAHIRA
>>>>
>>>>
>>>> On 2015/03/16 16:09, Wynand Jansen van Vuuren wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have 2 nodes with 2 interfaces each. ETH0 is used for an application,
>>>>> CBC, that writes to the Postgres DB on the VIP-MASTER 172.28.200.159;
>>>>> ETH1 is used for the Corosync configuration and VIP-REP. Everything
>>>>> works, but if the master, currently on cl1_lb1, has a catastrophic
>>>>> failure like a power down, the VIPs do not start on the slave. The
>>>>> Postgres part works fine, cl2_lb1 takes over and acts as a slave, but
>>>>> the VIPs do not come up. If I test it manually, i.e. kill the
>>>>> application 3 times on the master, the switchover is smooth, and the
>>>>> same if I kill Postgres on the master, but when there is a power
>>>>> failure on the master, the VIPs stay down. If I then delete the
>>>>> attributes pgsql-data-status="LATEST" and pgsql-data-status="STREAMING|SYNC"
>>>>> on the slave after the power off of the master and restart everything,
>>>>> then the VIPs come up on the slave. Any ideas please?
>>>>> I'm using this setup:
>>>>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
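>>>>>
>>>>> For reference, the manual clean-up that gets the VIPs back today looks
>>>>> roughly like this (a sketch; I only run it after I am sure the old
>>>>> master is really down):
>>>>>
>>>>>   crm_attribute -l forever -N cl1_lb1 -n pgsql-data-status -D
>>>>>   crm_attribute -l forever -N cl2_lb1 -n pgsql-data-status -D
>>>>>   crm resource cleanup msPostgresql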
>>>>>
>>>>> With this configuration below:
>>>>> node cl1_lb1 \
>>>>>         attributes pgsql-data-status="LATEST"
>>>>> node cl2_lb1 \
>>>>>         attributes pgsql-data-status="STREAMING|SYNC"
>>>>> primitive CBC_instance ocf:heartbeat:cbc \
>>>>>         op monitor interval="60s" timeout="60s" on-fail="restart" \
>>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>         meta target-role="Started" migration-threshold="3" failure-timeout="60s"
>>>>> primitive failover_MailTo ocf:heartbeat:MailTo \
>>>>>         params email="wynandj at rorotika.com" subject="Cluster Status change - " \
>>>>>         op monitor interval="10" timeout="10" depth="0"
>>>>> primitive pgsql ocf:heartbeat:pgsql \
>>>>>         params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl" \
>>>>>                psql="/opt/app/PostgreSQL/9.3/bin/psql" \
>>>>>                config="/opt/app/pgdata/9.3/postgresql.conf" \
>>>>>                pgdba="postgres" pgdata="/opt/app/pgdata/9.3/" \
>>>>>                start_opt="-p 5432" rep_mode="sync" node_list="cl1_lb1 cl2_lb1" \
>>>>>                restore_command="cp /pgtablespace/archive/%f %p" \
>>>>>                primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
>>>>>                master_ip="172.16.0.5" restart_on_promote="false" \
>>>>>                logfile="/var/log/OCF.log" \
>>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>         op monitor interval="4s" timeout="60s" on-fail="restart" \
>>>>>         op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
>>>>>         op promote interval="0s" timeout="60s" on-fail="restart" \
>>>>>         op demote interval="0s" timeout="60s" on-fail="stop" \
>>>>>         op stop interval="0s" timeout="60s" on-fail="block" \
>>>>>         op notify interval="0s" timeout="60s"
>>>>> primitive vip-master ocf:heartbeat:IPaddr2 \
>>>>>         params ip="172.28.200.159" nic="eth0" iflabel="CBC_VIP" cidr_netmask="24" \
>>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>         op stop interval="0s" timeout="60s" on-fail="block" \
>>>>>         meta target-role="Started"
>>>>> primitive vip-rep ocf:heartbeat:IPaddr2 \
>>>>>         params ip="172.16.0.5" nic="eth1" iflabel="REP_VIP" cidr_netmask="24" \
>>>>>         meta migration-threshold="0" target-role="Started" \
>>>>>         op start interval="0s" timeout="60s" on-fail="stop" \
>>>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>         op stop interval="0s" timeout="60s" on-fail="restart"
>>>>> group master-group vip-master vip-rep CBC_instance failover_MailTo
>>>>> ms msPostgresql pgsql \
>>>>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>>>> order rsc_order-1 0: msPostgresql:promote master-group:start symmetrical=false
>>>>> order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false
>>>>> property $id="cib-bootstrap-options" \
>>>>>         dc-version="1.1.9-2db99f1" \
>>>>>         cluster-infrastructure="classic openais (with plugin)" \
>>>>>         expected-quorum-votes="2" \
>>>>>         no-quorum-policy="ignore" \
>>>>>         stonith-enabled="false" \
>>>>>         cluster-recheck-interval="1min" \
>>>>>         crmd-transition-delay="0s" \
>>>>>         last-lrm-refresh="1426485983"
>>>>> rsc_defaults $id="rsc-options" \
>>>>>         resource-stickiness="INFINITY" \
>>>>>         migration-threshold="1"
>>>>> #vim:set syntax=pcmk
>>>>>
>>>>> Any ideas please, I'm lost......
>>>>>
>>>>
>>>
>>>
>>
>> --
>> NTT Open Source Software Center
>> Kazutomo NAKAHIRA
>> TEL: 03-5860-5135 FAX: 03-5463-6490
>> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
>>
>
>


-- 
NTT Open Source Software Center
Kazutomo NAKAHIRA
TEL: 03-5860-5135 FAX: 03-5463-6490
Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp




