[ClusterLabs] 4 node, 2 cluster setup with separated applications
Wynand Jansen van Vuuren
esawyja at gmail.com
Thu Mar 19 10:18:34 UTC 2015
Hi
Yes, OK, a single-cluster, 4-node configuration. When I set
symmetric-cluster=false, I get this output:
cl1_lb1:/opt/temp # crm_mon -1 -Af
Last updated: Thu Mar 19 12:16:20 2015
Last change: Thu Mar 19 12:15:22 2015 by hacluster via crmd on cl1_lb1
Stack: classic openais (with plugin)
Current DC: cl1_lb2 - partition with quorum
Version: 1.1.9-2db99f1
4 Nodes configured, 4 expected votes
6 Resources configured.
Online: [ cl1_lb1 cl1_lb2 cl2_lb1 cl2_lb2 ]
Node Attributes:
* Node cl1_lb1:
+ pgsql-data-status : LATEST
* Node cl1_lb2:
* Node cl2_lb1:
+ pgsql-data-status : LATEST
* Node cl2_lb2:
Migration summary:
* Node cl1_lb2:
* Node cl2_lb1:
* Node cl2_lb2:
* Node cl1_lb1:
cl1_lb1:/opt/temp #
No expanded pgsql-data-status anymore? I'm confused!
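Setting symmetric-cluster=false makes the cluster opt-in: nothing is allowed to run anywhere until a location constraint says so, which is why the transient pgsql attributes above are gone; msPostgresql is simply not being started. A minimal sketch of the constraints that would re-enable it on the intended nodes (crm shell syntax; constraint names and scores are made up for illustration):

```
# crm configure sketch; constraint names and scores are illustrative.
# With symmetric-cluster=false, every resource needs an enabling location rule.
property symmetric-cluster="false"
location loc-pgsql-cl1 msPostgresql 200: cl1_lb1
location loc-pgsql-cl2 msPostgresql 100: cl2_lb1
```

Note that in an opt-in cluster master-group and every other resource (e.g. an Apache clone for App2) also need enabling location rules of their own; colocation alone does not grant permission to run.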
On Thu, Mar 19, 2015 at 11:47 AM, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> On Thu, Mar 19, 2015 at 12:31 PM, Wynand Jansen van Vuuren
> <esawyja at gmail.com> wrote:
> > Hi all,
> > I have a different question please. Let's say I have the following:
> > 4 nodes, 2 clusters, 2 nodes per cluster. In the west of the country I have
> > Cluster 1 with cl1_lb1 and cl1_lb2 as the nodes; in the east of the country
> > I have Cluster 2 with cl2_lb1 and cl2_lb2 as the nodes.
> >
>
> According to the output you provided, you have a single cluster consisting
> of 4 nodes, not two clusters of 2 nodes each.
>
> > I have 3 different applications: Postgres, App1 and App2. App1 uses a VIP
> > to write to Postgres; App2 uses Apache2.
> >
> > Can I do the following?
> > cl1_lb1 runs Postgres streaming, with the App1 VIP, in a Master/Slave
> > configuration to cl2_lb1.
> >
> > cl1_lb1, cl1_lb2, cl2_lb1 and cl2_lb2 all run App2 and the VIP round-robins
> > for the Apache page.
> >
> > So my question is actually this: in this configuration, what would the
> > expected_votes setting in the corosync.conf file be, 2 or 4? And can you
> > separate the resources per node? I thought that node_list (rep_mode="sync"
> > node_list="cl1_lb1 cl2_lb1" in the pgsql primitive) would isolate pgsql
> > to run on cl1_lb1 and cl2_lb1 only, but that does not seem to be the case;
> > as soon as I add the other nodes to the corosync configuration, I get this
> > below
> >
> > cl1_lb1:/opt/temp # crm_mon -1 -Af
> > Last updated: Thu Mar 19 11:29:16 2015
> > Last change: Thu Mar 19 11:10:17 2015 by hacluster via crmd on cl1_lb1
> > Stack: classic openais (with plugin)
> > Current DC: cl1_lb1 - partition with quorum
> > Version: 1.1.9-2db99f1
> > 4 Nodes configured, 4 expected votes
> > 6 Resources configured.
> >
> >
> > Online: [ cl1_lb1 cl1_lb2 cl2_lb1 cl2_lb2 ]
> >
> >
> > Node Attributes:
> > * Node cl1_lb1:
> > + master-pgsql : -INFINITY
> > + pgsql-data-status : LATEST
> > + pgsql-status : STOP
> > * Node cl1_lb2:
> > + pgsql-status : UNKNOWN
> > * Node cl2_lb1:
> > + master-pgsql : -INFINITY
> > + pgsql-data-status : LATEST
> > + pgsql-status : STOP
> > * Node cl2_lb2:
> > + pgsql-status : UNKNOWN
> >
> > Migration summary:
> > * Node cl2_lb1:
> >    pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar 19 11:10:18 2015'
> > * Node cl1_lb1:
> >    pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar 19 11:10:18 2015'
> > * Node cl2_lb2:
> > * Node cl1_lb2:
> >
> > Failed actions:
> >    pgsql_start_0 (node=cl2_lb1, call=561, rc=1, status=complete): unknown error
> >    pgsql_start_0 (node=cl1_lb1, call=292, rc=1, status=complete): unknown error
> >    pgsql_start_0 (node=cl2_lb2, call=115, rc=5, status=complete): not installed
> >    pgsql_start_0 (node=cl1_lb2, call=73, rc=5, status=complete): not installed
> > cl1_lb1:/opt/temp #
> >
> > Any suggestions on how I can achieve this please ?
> >
>
> But it does exactly what you want - Postgres won't be started on nodes
> cl1_lb2 and cl2_lb2. If you want to get rid of the probing errors, you need
> to either install Postgres on all nodes (so the agent's probes do not fail)
> or set symmetric-cluster=false.
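The other direction also works: leave the cluster symmetric and explicitly ban msPostgresql from the two nodes that should only run App2. A sketch in crm shell syntax (constraint names are made up):

```
# crm configure sketch; -inf location bans keep pgsql off the App2-only nodes.
# Note: on Pacemaker 1.1.9 this does not suppress the one-off probes, so the
# "not installed" probe errors may still appear unless Postgres is installed.
location ban-pgsql-cl1_lb2 msPostgresql -inf: cl1_lb2
location ban-pgsql-cl2_lb2 msPostgresql -inf: cl2_lb2
```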
>
> > Regards
> >
> >
> >
> > On Wed, Mar 18, 2015 at 7:32 AM, Wynand Jansen van Vuuren
> > <esawyja at gmail.com> wrote:
> >>
> >> Hi
> >> Yes, the problem was solved: the init scripts started Postgres when the
> >> failed server came up again. I disabled the automatic start with
> >> chkconfig and that solved the problem. I will take 172.16.0.5 out of the
> >> conf file.
> >> THANKS SO MUCH for all the help. I will do a blog post on how this is
> >> done on SLES 11 SP3 and Postgres 9.3 and will post the URL for the group,
> >> in case it helps someone out there. Thanks again for all the help!
> >> Regards
> >>
> >> On Wed, Mar 18, 2015 at 3:58 AM, NAKAHIRA Kazutomo
> >> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
> >>>
> >>> Hi,
> >>>
> >>> As Brestan pointed out, an old master not being able to come up as a
> >>> slave is expected behavior.
> >>>
> >>> BTW, this behavior is different from the original problem.
> >>> From the logs, it seems the promote action succeeded on cl2_lb1 after
> >>> cl1_lb1 was powered off.
> >>> Was the original problem resolved?
> >>>
> >>> And cl2_lb1's postgresql.conf has the following problem:
> >>>
> >>> 2015-03-17 07:34:28 SAST DETAIL: The failed archive command was: cp
> >>> pg_xlog/0000001D00000008000000C2
> >>> 172.16.0.5:/pgtablespace/archive/0000001D00000008000000C2
> >>>
> >>> "172.16.0.5" must be eliminated from the archive_command directive in
> >>> the postgresql.conf.
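Assuming /pgtablespace/archive is a path each node can reach locally (if it only exists on another host, an scp- or rsync-based command would be needed instead), the corrected directive could look like:

```
# postgresql.conf sketch: archive locally, without the stray IP prefix
archive_command = 'cp %p /pgtablespace/archive/%f'
```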
> >>>
> >>> Best regards,
> >>> Kazutomo NAKAHIRA
> >>>
> >>> On 2015/03/18 5:00, Rainer Brestan wrote:
> >>>>
> >>>> Yes, that's the expected behaviour.
> >>>> Takatoshi Matsuo describes in his papers why a former master can't
> >>>> come up as a slave without possible data corruption.
> >>>> And you do not get an indication from Postgres that the data on disk
> >>>> is corrupted.
> >>>> Therefore, he created the lock file mechanism to prevent a former
> >>>> master from starting up.
> >>>> Making the base backup from the master discards any possibly wrong
> >>>> data on the slave, and the removed lock file indicates this to the
> >>>> resource agent.
> >>>> To shorten the discussion about "how this can be automated within the
> >>>> resource agent": there is no clean way of handling this with very
> >>>> large databases, for which it can take hours.
> >>>> What you should do is make the base backup in a temporary directory
> >>>> and then rename it to the name the Postgres instance requires after
> >>>> the base backup finishes successfully (yes, this requires twice the
> >>>> hard disk space). Otherwise you might lose everything if your master
> >>>> breaks during the base backup.
> >>>> Rainer
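The rename-on-success pattern described above can be sketched as plain shell; here a dummy copy stands in for pg_basebackup, and all paths are illustrative:

```shell
#!/bin/sh
# Rename-on-success sketch: the live data directory is only replaced after the
# new copy is complete, so a failure mid-backup cannot destroy the old data.
set -e
base=$(mktemp -d)                                       # stand-in for /var/lib/pgsql
mkdir -p "$base/data" && echo old > "$base/data/f"      # existing (possibly stale) datadir
mkdir "$base/data.new" && echo new > "$base/data.new/f" # pg_basebackup would fill this
mv "$base/data" "$base/data.old"    # keep the old copy until the swap succeeds
mv "$base/data.new" "$base/data"    # one rename puts the new datadir live
cat "$base/data/f"                  # prints "new"
```

In the real procedure the PGSQL.lock file is removed only after the rename, and the old directory can be deleted once the slave is streaming again.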
> >>>> *Sent:* Tuesday, 17 March 2015 at 12:16
> >>>> *From:* "Wynand Jansen van Vuuren" <esawyja at gmail.com>
> >>>> *To:* "Cluster Labs - All topics related to open-source clustering
> >>>> welcomed" <users at clusterlabs.org>
> >>>> *Subject:* Re: [ClusterLabs] Postgres streaming VIP-REP not coming up
> >>>> on slave
> >>>>
> >>>> Hi
> >>>> OK, I found this particular problem: when the failed node comes up
> >>>> again, the init system starts Postgres. I have disabled this, and now
> >>>> the VIPs and Postgres remain on the new MASTER, but the failed node
> >>>> does not come up as a slave, i.e. there is no sync between the new
> >>>> master and the slave. Is this the expected behavior? The only way I
> >>>> can get it back into slave mode is to follow the procedure in the
> >>>> wiki:
> >>>>
> >>>> # su - postgres
> >>>> $ rm -rf /var/lib/pgsql/data/
> >>>> $ pg_basebackup -h 192.168.2.3 -U postgres -D /var/lib/pgsql/data -X stream -P
> >>>> $ rm /var/lib/pgsql/tmp/PGSQL.lock
> >>>> $ exit
> >>>> # pcs resource cleanup msPostgresql
> >>>>
> >>>> Looking forward to your reply please
> >>>> Regards
> >>>> On Tue, Mar 17, 2015 at 7:55 AM, Wynand Jansen van Vuuren
> >>>> <esawyja at gmail.com> wrote:
> >>>>
> >>>> Hi Nakahira,
> >>>> I finally got around testing this, below is the initial state
> >>>>
> >>>> cl1_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Tue Mar 17 07:31:58 2015
> >>>> Last change: Tue Mar 17 07:31:12 2015 by root via crm_attribute on cl1_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl1_lb1 - partition with quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl1_lb1 cl2_lb1 ]
> >>>>
> >>>> Resource Group: master-group
> >>>> vip-master (ocf::heartbeat:IPaddr2): Started cl1_lb1
> >>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl1_lb1
> >>>> CBC_instance (ocf::heartbeat:cbc): Started cl1_lb1
> >>>> failover_MailTo (ocf::heartbeat:MailTo): Started cl1_lb1
> >>>> Master/Slave Set: msPostgresql [pgsql]
> >>>> Masters: [ cl1_lb1 ]
> >>>> Slaves: [ cl2_lb1 ]
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl1_lb1:
> >>>> + master-pgsql : 1000
> >>>> + pgsql-data-status : LATEST
> >>>> + pgsql-master-baseline : 00000008BE000000
> >>>> + pgsql-status : PRI
> >>>> * Node cl2_lb1:
> >>>> + master-pgsql : 100
> >>>> + pgsql-data-status : STREAMING|SYNC
> >>>> + pgsql-status : HS:sync
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>> * Node cl1_lb1:
> >>>> cl1_lb1:~ #
> >>>> ###### - I then did an init 0 on the master node, cl1_lb1
> >>>>
> >>>> cl1_lb1:~ # init 0
> >>>> cl1_lb1:~ #
> >>>> Connection closed by foreign host.
> >>>>
> >>>> Disconnected from remote host(cl1_lb1) at 07:36:18.
> >>>>
> >>>> Type `help' to learn how to use Xshell prompt.
> >>>> [c:\~]$
> >>>> ###### - This was OK, as the slave took over and became master
> >>>>
> >>>> cl2_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Tue Mar 17 07:35:04 2015
> >>>> Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on cl2_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl2_lb1 - partition WITHOUT quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl2_lb1 ]
> >>>> OFFLINE: [ cl1_lb1 ]
> >>>>
> >>>> Resource Group: master-group
> >>>> vip-master (ocf::heartbeat:IPaddr2): Started cl2_lb1
> >>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl2_lb1
> >>>> CBC_instance (ocf::heartbeat:cbc): Started cl2_lb1
> >>>> failover_MailTo (ocf::heartbeat:MailTo): Started cl2_lb1
> >>>> Master/Slave Set: msPostgresql [pgsql]
> >>>> Masters: [ cl2_lb1 ]
> >>>> Stopped: [ pgsql:1 ]
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl2_lb1:
> >>>> + master-pgsql : 1000
> >>>> + pgsql-data-status : LATEST
> >>>> + pgsql-master-baseline : 00000008C2000090
> >>>> + pgsql-status : PRI
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>> cl2_lb1:~ #
> >>>> And the logs from Postgres and Corosync are attached
> >>>> ###### - I then restarted the original Master cl1_lb1 and started
> >>>> Corosync manually.
> >>>> Once the original Master cl1_lb1 was up and Corosync running, the
> >>>> status below happened; notice no VIPs and Postgres.
> >>>> ###### - Still working below
> >>>>
> >>>> cl2_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Tue Mar 17 07:36:55 2015
> >>>> Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on cl2_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl2_lb1 - partition WITHOUT quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl2_lb1 ]
> >>>> OFFLINE: [ cl1_lb1 ]
> >>>>
> >>>> Resource Group: master-group
> >>>> vip-master (ocf::heartbeat:IPaddr2): Started cl2_lb1
> >>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl2_lb1
> >>>> CBC_instance (ocf::heartbeat:cbc): Started cl2_lb1
> >>>> failover_MailTo (ocf::heartbeat:MailTo): Started cl2_lb1
> >>>> Master/Slave Set: msPostgresql [pgsql]
> >>>> Masters: [ cl2_lb1 ]
> >>>> Stopped: [ pgsql:1 ]
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl2_lb1:
> >>>> + master-pgsql : 1000
> >>>> + pgsql-data-status : LATEST
> >>>> + pgsql-master-baseline : 00000008C2000090
> >>>> + pgsql-status : PRI
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>>
> >>>> ###### - After the original master is up and Corosync is running on cl1_lb1
> >>>>
> >>>> cl2_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Tue Mar 17 07:37:47 2015
> >>>> Last change: Tue Mar 17 07:37:21 2015 by root via crm_attribute on cl1_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl2_lb1 - partition with quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl1_lb1 cl2_lb1 ]
> >>>>
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl1_lb1:
> >>>> + master-pgsql : -INFINITY
> >>>> + pgsql-data-status : LATEST
> >>>> + pgsql-status : STOP
> >>>> * Node cl2_lb1:
> >>>> + master-pgsql : -INFINITY
> >>>> + pgsql-data-status : DISCONNECT
> >>>> + pgsql-status : STOP
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>> pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue Mar 17 07:37:26 2015'
> >>>> * Node cl1_lb1:
> >>>> pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue Mar 17 07:37:26 2015'
> >>>>
> >>>> Failed actions:
> >>>> pgsql_monitor_4000 (node=cl2_lb1, call=735, rc=7, status=complete): not running
> >>>> pgsql_monitor_4000 (node=cl1_lb1, call=42, rc=7, status=complete): not running
> >>>> cl2_lb1:~ #
> >>>> ##### - No VIPs up
> >>>>
> >>>> cl2_lb1:~ # ping 172.28.200.159
> >>>> PING 172.28.200.159 (172.28.200.159) 56(84) bytes of data.
> >>>> From 172.28.200.168: icmp_seq=1 Destination Host Unreachable
> >>>> From 172.28.200.168 icmp_seq=1 Destination Host Unreachable
> >>>> From 172.28.200.168 icmp_seq=2 Destination Host Unreachable
> >>>> From 172.28.200.168 icmp_seq=3 Destination Host Unreachable
> >>>> ^C
> >>>> --- 172.28.200.159 ping statistics ---
> >>>> 5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4024ms, pipe 3
> >>>> cl2_lb1:~ # ping 172.16.0.5
> >>>> PING 172.16.0.5 (172.16.0.5) 56(84) bytes of data.
> >>>> From 172.16.0.3: icmp_seq=1 Destination Host Unreachable
> >>>>
> >>>> From 172.16.0.3 icmp_seq=1 Destination Host Unreachable
> >>>> From 172.16.0.3 icmp_seq=2 Destination Host Unreachable
> >>>> From 172.16.0.3 icmp_seq=3 Destination Host Unreachable
> >>>> From 172.16.0.3 icmp_seq=5 Destination Host Unreachable
> >>>> From 172.16.0.3 icmp_seq=6 Destination Host Unreachable
> >>>> From 172.16.0.3 icmp_seq=7 Destination Host Unreachable
> >>>> ^C
> >>>> --- 172.16.0.5 ping statistics ---
> >>>> 8 packets transmitted, 0 received, +7 errors, 100% packet loss, time 7015ms, pipe 3
> >>>> cl2_lb1:~ #
> >>>>
> >>>> Any ideas please, or is it a case of recovering the original master
> >>>> manually before starting Corosync, etc.?
> >>>> All logs are attached
> >>>> Regards
> >>>> On Mon, Mar 16, 2015 at 11:01 AM, Wynand Jansen van Vuuren
> >>>> <esawyja at gmail.com> wrote:
> >>>>
> >>>> Thanks for the advice. I have a demo on this now, so I don't want to
> >>>> test it now; I will do so tomorrow and forward the logs. Many thanks!!
> >>>> On Mon, Mar 16, 2015 at 10:54 AM, NAKAHIRA Kazutomo
> >>>> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> > do you suggest that I take it out? or should I look at the problem
> >>>> > where cl2_lb1 is not being promoted?
> >>>>
> >>>> You should look at the problem where cl2_lb1 is not being promoted,
> >>>> and I will look at it if you send me the ha-log and PostgreSQL's log.
> >>>>
> >>>> Best regards,
> >>>> Kazutomo NAKAHIRA
> >>>>
> >>>>
> >>>> On 2015/03/16 17:18, Wynand Jansen van Vuuren wrote:
> >>>>
> >>>> Hi Nakahira,
> >>>> Thanks so much for the info. This setting was as the wiki page
> >>>> suggested; do you suggest that I take it out? Or should I look at the
> >>>> problem where cl2_lb1 is not being promoted?
> >>>> Regards
> >>>>
> >>>> On Mon, Mar 16, 2015 at 10:15 AM, NAKAHIRA Kazutomo
> >>>> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> > Notice there are no VIPs; it looks like the VIPs depend on some
> >>>> > other resource to start first?
> >>>>
> >>>> The following constraint means that "master-group" cannot start
> >>>> without a master of the msPostgresql resource:
> >>>>
> >>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
> >>>>
> >>>> After you power off cl1_lb1, msPostgresql on cl2_lb1 is not promoted,
> >>>> so no master exists in your cluster.
> >>>>
> >>>> It means that "master-group" cannot run anywhere.
> >>>>
> >>>> Best regards,
> >>>> Kazutomo NAKAHIRA
> >>>>
> >>>>
> >>>> On 2015/03/16 16:48, Wynand Jansen van Vuuren wrote:
> >>>>
> >>>> Hi
> >>>> When I start out, cl1_lb1 (Cluster 1 load balancer 1) is the master,
> >>>> as below:
> >>>> cl1_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Mon Mar 16 09:44:44 2015
> >>>> Last change: Mon Mar 16 08:06:26 2015 by root via crm_attribute on cl1_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl2_lb1 - partition with quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl1_lb1 cl2_lb1 ]
> >>>>
> >>>> Resource Group: master-group
> >>>>     vip-master (ocf::heartbeat:IPaddr2): Started cl1_lb1
> >>>>     vip-rep (ocf::heartbeat:IPaddr2): Started cl1_lb1
> >>>>     CBC_instance (ocf::heartbeat:cbc): Started cl1_lb1
> >>>>     failover_MailTo (ocf::heartbeat:MailTo): Started cl1_lb1
> >>>> Master/Slave Set: msPostgresql [pgsql]
> >>>> Masters: [ cl1_lb1 ]
> >>>> Slaves: [ cl2_lb1 ]
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl1_lb1:
> >>>>     + master-pgsql : 1000
> >>>>     + pgsql-data-status : LATEST
> >>>>     + pgsql-master-baseline : 00000008B90061F0
> >>>>     + pgsql-status : PRI
> >>>> * Node cl2_lb1:
> >>>>     + master-pgsql : 100
> >>>>     + pgsql-data-status : STREAMING|SYNC
> >>>>     + pgsql-status : HS:sync
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>> * Node cl1_lb1:
> >>>> cl1_lb1:~ #
> >>>>
> >>>> If I then do a power off on cl1_lb1 (master), Postgres moves to
> >>>> cl2_lb1 (Cluster 2 load balancer 1), but the VIP-MASTER and VIP-REP
> >>>> are not pingable from the NEW master (cl2_lb1); it stays like this
> >>>> below:
> >>>> cl2_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Mon Mar 16 07:32:07 2015
> >>>> Last change: Mon Mar 16 07:28:53 2015 by root via crm_attribute on cl1_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl2_lb1 - partition WITHOUT quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl2_lb1 ]
> >>>> OFFLINE: [ cl1_lb1 ]
> >>>>
> >>>> Master/Slave Set: msPostgresql [pgsql]
> >>>>     Slaves: [ cl2_lb1 ]
> >>>>     Stopped: [ pgsql:1 ]
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl2_lb1:
> >>>>     + master-pgsql : -INFINITY
> >>>>     + pgsql-data-status : DISCONNECT
> >>>>     + pgsql-status : HS:alone
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>> cl2_lb1:~ #
> >>>>
> >>>> Notice there are no VIPs; it looks like the VIPs depend on some other
> >>>> resource to start first?
> >>>> Thanks for the reply!
> >>>>
> >>>>
> >>>> On Mon, Mar 16, 2015 at 9:42 AM, NAKAHIRA Kazutomo
> >>>> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> > fine, cl2_lb1 takes over and acts as a slave, but the VIPs do not
> >>>> > come up
> >>>>
> >>>> cl2_lb1 acts as a slave? It is not a master?
> >>>> The VIPs come up with the master msPostgresql resource.
> >>>>
> >>>> If the promote action failed on cl2_lb1, then please send a ha-log
> >>>> and PostgreSQL's log.
> >>>>
> >>>> Best regards,
> >>>> Kazutomo NAKAHIRA
> >>>>
> >>>>
> >>>> On 2015/03/16 16:09, Wynand Jansen van Vuuren wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I have 2 nodes with 2 interfaces each. ETH0 is used for an application,
> >>>> CBC, that's writing to the Postgres DB on the VIP-MASTER 172.28.200.159;
> >>>> ETH1 is used for the Corosync configuration and VIP-REP. Everything
> >>>> works, but if the master, currently on cl1_lb1, has a catastrophic
> >>>> failure, like a power down, the VIPs do not start on the slave. The
> >>>> Postgres part works fine: cl2_lb1 takes over and acts as a slave, but
> >>>> the VIPs do not come up. If I test it manually, i.e. kill the
> >>>> application 3 times on the master, the switchover is smooth; same if I
> >>>> kill Postgres on the master. But when there is a power failure on the
> >>>> master, the VIPs stay down. If I then delete the attributes
> >>>> pgsql-data-status="LATEST" and pgsql-data-status="STREAMING|SYNC" on the
> >>>> slave after power off on the master and restart everything, then the
> >>>> VIPs come up on the slave. Any ideas please?
> >>>> I'm using this setup:
> >>>>
> >>>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
> >>>>
> >>>> With this configuration below:
> >>>> node cl1_lb1 \
> >>>>   attributes pgsql-data-status="LATEST"
> >>>> node cl2_lb1 \
> >>>>   attributes pgsql-data-status="STREAMING|SYNC"
> >>>> primitive CBC_instance ocf:heartbeat:cbc \
> >>>>   op monitor interval="60s" timeout="60s" on-fail="restart" \
> >>>>   op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>   meta target-role="Started" migration-threshold="3" failure-timeout="60s"
> >>>> primitive failover_MailTo ocf:heartbeat:MailTo \
> >>>>   params email="wynandj at rorotika.com" subject="Cluster Status change - " \
> >>>>   op monitor interval="10" timeout="10" dept="0"
> >>>> primitive pgsql ocf:heartbeat:pgsql \
> >>>>   params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl" \
> >>>>     psql="/opt/app/PostgreSQL/9.3/bin/psql" \
> >>>>     config="/opt/app/pgdata/9.3/postgresql.conf" pgdba="postgres" \
> >>>>     pgdata="/opt/app/pgdata/9.3/" start_opt="-p 5432" rep_mode="sync" \
> >>>>     node_list="cl1_lb1 cl2_lb1" \
> >>>>     restore_command="cp /pgtablespace/archive/%f %p" \
> >>>>     primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
> >>>>     master_ip="172.16.0.5" restart_on_promote="false" logfile="/var/log/OCF.log" \
> >>>>   op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>   op monitor interval="4s" timeout="60s" on-fail="restart" \
> >>>>   op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
> >>>>   op promote interval="0s" timeout="60s" on-fail="restart" \
> >>>>   op demote interval="0s" timeout="60s" on-fail="stop" \
> >>>>   op stop interval="0s" timeout="60s" on-fail="block" \
> >>>>   op notify interval="0s" timeout="60s"
> >>>> primitive vip-master ocf:heartbeat:IPaddr2 \
> >>>>   params ip="172.28.200.159" nic="eth0" iflabel="CBC_VIP" cidr_netmask="24" \
> >>>>   op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>   op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>>>   op stop interval="0s" timeout="60s" on-fail="block" \
> >>>>   meta target-role="Started"
> >>>> primitive vip-rep ocf:heartbeat:IPaddr2 \
> >>>>   params ip="172.16.0.5" nic="eth1" iflabel="REP_VIP" cidr_netmask="24" \
> >>>>   meta migration-threshold="0" target-role="Started" \
> >>>>   op start interval="0s" timeout="60s" on-fail="stop" \
> >>>>   op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>>>   op stop interval="0s" timeout="60s" on-fail="restart"
> >>>> group master-group vip-master vip-rep CBC_instance failover_MailTo
> >>>> ms msPostgresql pgsql \
> >>>>   meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> >>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
> >>>> order rsc_order-1 0: msPostgresql:promote master-group:start symmetrical=false
> >>>> order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false
> >>>> property $id="cib-bootstrap-options" \
> >>>>   dc-version="1.1.9-2db99f1" \
> >>>>   cluster-infrastructure="classic openais (with plugin)" \
> >>>>   expected-quorum-votes="2" \
> >>>>   no-quorum-policy="ignore" \
> >>>>   stonith-enabled="false" \
> >>>>   cluster-recheck-interval="1min" \
> >>>>   crmd-transition-delay="0s" \
> >>>>   last-lrm-refresh="1426485983"
> >>>> rsc_defaults $id="rsc-options" \
> >>>>   resource-stickiness="INFINITY" \
> >>>>   migration-threshold="1"
> >>>> #vim:set syntax=pcmk
> >>>>
> >>>> Any ideas please, I'm lost......
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Users mailing list: Users at clusterlabs.org
> >>>> http://clusterlabs.org/mailman/listinfo/users
> >>>>
> >>>> Project Home: http://www.clusterlabs.org
> >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>> Bugs: http://bugs.clusterlabs.org
> >>>>
> >>>> --
> >>>> NTT Open Source Software Center
> >>>> Kazutomo Nakahira
> >>>> TEL: 03-5860-5135 FAX: 03-5463-6490
> >>>> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
> >>>>
> >>>> --
> >>>> NTT Open Source Software Center
> >>>> Kazutomo Nakahira
> >>>> TEL: 03-5860-5135 FAX: 03-5463-6490
> >>>> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
>
>