[ClusterLabs] 4 node, 2 cluster setup with separated applications
Wynand Jansen van Vuuren
esawyja at gmail.com
Thu Mar 19 10:18:34 UTC 2015
Hi
Yes, OK, a single-cluster, 4-node configuration. When I set
symmetric-cluster=false, I get this output:
cl1_lb1:/opt/temp # crm_mon -1 -Af
Last updated: Thu Mar 19 12:16:20 2015
Last change: Thu Mar 19 12:15:22 2015 by hacluster via crmd on cl1_lb1
Stack: classic openais (with plugin)
Current DC: cl1_lb2 - partition with quorum
Version: 1.1.9-2db99f1
4 Nodes configured, 4 expected votes
6 Resources configured.
Online: [ cl1_lb1 cl1_lb2 cl2_lb1 cl2_lb2 ]
Node Attributes:
* Node cl1_lb1:
+ pgsql-data-status : LATEST
* Node cl1_lb2:
* Node cl2_lb1:
+ pgsql-data-status : LATEST
* Node cl2_lb2:
Migration summary:
* Node cl1_lb2:
* Node cl2_lb1:
* Node cl2_lb2:
* Node cl1_lb1:
cl1_lb1:/opt/temp #
No expanded pgsql-data-status anymore? I'm confused!
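Setting symmetric-cluster=false makes the cluster opt-in: nothing is allowed to run anywhere until a location constraint says so, which is why the transient pgsql attributes above are gone; msPostgresql is simply not being started. A minimal sketch of the constraints that would re-enable it on the intended nodes (crm shell syntax; constraint names and scores are made up for illustration):

```
# crm configure sketch; constraint names and scores are illustrative.
# With symmetric-cluster=false, every resource needs an enabling location rule.
property symmetric-cluster="false"
location loc-pgsql-cl1 msPostgresql 200: cl1_lb1
location loc-pgsql-cl2 msPostgresql 100: cl2_lb1
```

Note that in an opt-in cluster master-group and every other resource (e.g. an Apache clone for App2) also need enabling location rules of their own; colocation alone does not grant permission to run.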
On Thu, Mar 19, 2015 at 11:47 AM, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> On Thu, Mar 19, 2015 at 12:31 PM, Wynand Jansen van Vuuren
> <esawyja at gmail.com> wrote:
> > Hi all,
> > I have a different question please. Let's say I have the following:
> > 4 nodes, 2 clusters, 2 nodes per cluster. In the west of the country I have
> > Cluster 1 with cl1_lb1 and cl1_lb2 as the nodes; in the east of the country
> > I have Cluster 2 with cl2_lb1 and cl2_lb2 as the nodes.
> >
>
> According to the output you provided, you have a single cluster consisting
> of 4 nodes, not two clusters of 2 nodes each.
>
> > I have 3 different applications: Postgres, App1 and App2. App1 uses a VIP
> > to write to Postgres; App2 uses Apache2.
> >
> > Can I do the following?
> > cl1_lb1 runs Postgres streaming, with the App1 VIP, in a Master/Slave
> > configuration to cl2_lb1.
> >
> > cl1_lb1, cl1_lb2, cl2_lb1 and cl2_lb2 all run App2 and the VIP round-robins
> > for the Apache page.
> >
> > So my question is actually this: in this configuration, what would the
> > expected_votes setting in the corosync.conf file be, 2 or 4? And can you
> > separate the resources per node? I thought that node_list (rep_mode="sync"
> > node_list="cl1_lb1 cl2_lb1" in the pgsql primitive) would isolate pgsql
> > to run on cl1_lb1 and cl2_lb1 only, but that does not seem to be the case;
> > as soon as I add the other nodes to the corosync configuration, I get this
> > below
> >
> > cl1_lb1:/opt/temp # crm_mon -1 -Af
> > Last updated: Thu Mar 19 11:29:16 2015
> > Last change: Thu Mar 19 11:10:17 2015 by hacluster via crmd on cl1_lb1
> > Stack: classic openais (with plugin)
> > Current DC: cl1_lb1 - partition with quorum
> > Version: 1.1.9-2db99f1
> > 4 Nodes configured, 4 expected votes
> > 6 Resources configured.
> >
> >
> > Online: [ cl1_lb1 cl1_lb2 cl2_lb1 cl2_lb2 ]
> >
> >
> > Node Attributes:
> > * Node cl1_lb1:
> > + master-pgsql : -INFINITY
> > + pgsql-data-status : LATEST
> > + pgsql-status : STOP
> > * Node cl1_lb2:
> > + pgsql-status : UNKNOWN
> > * Node cl2_lb1:
> > + master-pgsql : -INFINITY
> > + pgsql-data-status : LATEST
> > + pgsql-status : STOP
> > * Node cl2_lb2:
> > + pgsql-status : UNKNOWN
> >
> > Migration summary:
> > * Node cl2_lb1:
> >    pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar 19 11:10:18 2015'
> > * Node cl1_lb1:
> >    pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar 19 11:10:18 2015'
> > * Node cl2_lb2:
> > * Node cl1_lb2:
> >
> > Failed actions:
> >    pgsql_start_0 (node=cl2_lb1, call=561, rc=1, status=complete): unknown error
> >    pgsql_start_0 (node=cl1_lb1, call=292, rc=1, status=complete): unknown error
> >    pgsql_start_0 (node=cl2_lb2, call=115, rc=5, status=complete): not installed
> >    pgsql_start_0 (node=cl1_lb2, call=73, rc=5, status=complete): not installed
> > cl1_lb1:/opt/temp #
> >
> > Any suggestions on how I can achieve this please ?
> >
>
> But it does exactly what you want - Postgres won't be started on nodes
> cl1_lb2 and cl2_lb2. If you want to get rid of the probing errors, you need
> to either install Postgres on all nodes (so the agent's probes do not fail)
> or set symmetric-cluster=false.
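The other direction also works: leave the cluster symmetric and explicitly ban msPostgresql from the two nodes that should only run App2. A sketch in crm shell syntax (constraint names are made up):

```
# crm configure sketch; -inf location bans keep pgsql off the App2-only nodes.
# Note: on Pacemaker 1.1.9 this does not suppress the one-off probes, so the
# "not installed" probe errors may still appear unless Postgres is installed.
location ban-pgsql-cl1_lb2 msPostgresql -inf: cl1_lb2
location ban-pgsql-cl2_lb2 msPostgresql -inf: cl2_lb2
```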
>
> > Regards
> >
> >
> >
> > On Wed, Mar 18, 2015 at 7:32 AM, Wynand Jansen van Vuuren
> > <esawyja at gmail.com> wrote:
> >>
> >> Hi
> >> Yes, the problem was solved: the init scripts started Postgres when the
> >> failed server came up again. I disabled the automatic start with
> >> chkconfig and that solved the problem. I will take 172.16.0.5 out of the
> >> conf file.
> >> THANKS SO MUCH for all the help. I will do a blog post on how this is
> >> done on SLES 11 SP3 and Postgres 9.3 and will post the URL for the group,
> >> in case it helps someone out there. Thanks again for all the help!
> >> Regards
> >>
> >> On Wed, Mar 18, 2015 at 3:58 AM, NAKAHIRA Kazutomo
> >> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
> >>>
> >>> Hi,
> >>>
> >>> As Brestan pointed out, an old master not being able to come up as a
> >>> slave is expected behavior.
> >>>
> >>> BTW, this behavior is different from the original problem.
> >>> From the logs, it seems the promote action succeeded on cl2_lb1 after
> >>> cl1_lb1 was powered off.
> >>> Was the original problem resolved?
> >>>
> >>> And cl2_lb1's postgresql.conf has the following problem:
> >>>
> >>> 2015-03-17 07:34:28 SAST DETAIL: The failed archive command was: cp
> >>> pg_xlog/0000001D00000008000000C2
> >>> 172.16.0.5:/pgtablespace/archive/0000001D00000008000000C2
> >>>
> >>> "172.16.0.5" must be eliminated from the archive_command directive in
> >>> the postgresql.conf.
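Assuming /pgtablespace/archive is a path each node can reach locally (if it only exists on another host, an scp- or rsync-based command would be needed instead), the corrected directive could look like:

```
# postgresql.conf sketch: archive locally, without the stray IP prefix
archive_command = 'cp %p /pgtablespace/archive/%f'
```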
> >>>
> >>> Best regards,
> >>> Kazutomo NAKAHIRA
> >>>
> >>> On 2015/03/18 5:00, Rainer Brestan wrote:
> >>>>
> >>>> Yes, that's the expected behaviour.
> >>>> Takatoshi Matsuo describes in his papers why a former master can't
> >>>> come up as a slave without possible data corruption.
> >>>> And you do not get an indication from Postgres that the data on disk
> >>>> is corrupted.
> >>>> Therefore, he created the lock file mechanism to prevent a former
> >>>> master from starting up.
> >>>> Making the base backup from the master discards any possibly wrong
> >>>> data on the slave, and the removed lock file indicates this to the
> >>>> resource agent.
> >>>> To shorten the discussion about "how this can be automated within the
> >>>> resource agent": there is no clean way of handling this with very
> >>>> large databases, for which it can take hours.
> >>>> What you should do is make the base backup in a temporary directory
> >>>> and then rename it to the name the Postgres instance requires after
> >>>> the base backup finishes successfully (yes, this requires twice the
> >>>> hard disk space). Otherwise you might lose everything if your master
> >>>> breaks during the base backup.
> >>>> Rainer
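The rename-on-success pattern described above can be sketched as plain shell; here a dummy copy stands in for pg_basebackup, and all paths are illustrative:

```shell
#!/bin/sh
# Rename-on-success sketch: the live data directory is only replaced after the
# new copy is complete, so a failure mid-backup cannot destroy the old data.
set -e
base=$(mktemp -d)                                       # stand-in for /var/lib/pgsql
mkdir -p "$base/data" && echo old > "$base/data/f"      # existing (possibly stale) datadir
mkdir "$base/data.new" && echo new > "$base/data.new/f" # pg_basebackup would fill this
mv "$base/data" "$base/data.old"    # keep the old copy until the swap succeeds
mv "$base/data.new" "$base/data"    # one rename puts the new datadir live
cat "$base/data/f"                  # prints "new"
```

In the real procedure the PGSQL.lock file is removed only after the rename, and the old directory can be deleted once the slave is streaming again.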
> >>>> *Sent:* Tuesday, 17 March 2015 at 12:16
> >>>> *From:* "Wynand Jansen van Vuuren" <esawyja at gmail.com>
> >>>> *To:* "Cluster Labs - All topics related to open-source clustering
> >>>> welcomed" <users at clusterlabs.org>
> >>>> *Subject:* Re: [ClusterLabs] Postgres streaming VIP-REP not coming up
> >>>> on slave
> >>>>
> >>>> Hi
> >>>> OK, I found this particular problem: when the failed node comes up
> >>>> again, the init system starts Postgres. I have disabled this, and now
> >>>> the VIPs and Postgres remain on the new MASTER, but the failed node
> >>>> does not come up as a slave, i.e. there is no sync between the new
> >>>> master and the slave. Is this the expected behavior? The only way I
> >>>> can get it back into slave mode is to follow the procedure in the
> >>>> wiki:
> >>>>
> >>>> # su - postgres
> >>>> $ rm -rf /var/lib/pgsql/data/
> >>>> $ pg_basebackup -h 192.168.2.3 -U postgres -D /var/lib/pgsql/data -X stream -P
> >>>> $ rm /var/lib/pgsql/tmp/PGSQL.lock
> >>>> $ exit
> >>>> # pcs resource cleanup msPostgresql
> >>>>
> >>>> Looking forward to your reply please
> >>>> Regards
> >>>> On Tue, Mar 17, 2015 at 7:55 AM, Wynand Jansen van Vuuren
> >>>> <esawyja at gmail.com> wrote:
> >>>>
> >>>> Hi Nakahira,
> >>>> I finally got around testing this, below is the initial state
> >>>>
> >>>> cl1_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Tue Mar 17 07:31:58 2015
> >>>> Last change: Tue Mar 17 07:31:12 2015 by root via crm_attribute on cl1_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl1_lb1 - partition with quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl1_lb1 cl2_lb1 ]
> >>>>
> >>>> Resource Group: master-group
> >>>> vip-master (ocf::heartbeat:IPaddr2): Started cl1_lb1
> >>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl1_lb1
> >>>> CBC_instance (ocf::heartbeat:cbc): Started cl1_lb1
> >>>> failover_MailTo (ocf::heartbeat:MailTo): Started cl1_lb1
> >>>> Master/Slave Set: msPostgresql [pgsql]
> >>>> Masters: [ cl1_lb1 ]
> >>>> Slaves: [ cl2_lb1 ]
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl1_lb1:
> >>>> + master-pgsql : 1000
> >>>> + pgsql-data-status : LATEST
> >>>> + pgsql-master-baseline : 00000008BE000000
> >>>> + pgsql-status : PRI
> >>>> * Node cl2_lb1:
> >>>> + master-pgsql : 100
> >>>> + pgsql-data-status : STREAMING|SYNC
> >>>> + pgsql-status : HS:sync
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>> * Node cl1_lb1:
> >>>> cl1_lb1:~ #
> >>>> ###### - I then did an init 0 on the master node, cl1_lb1
> >>>>
> >>>> cl1_lb1:~ # init 0
> >>>> cl1_lb1:~ #
> >>>> Connection closed by foreign host.
> >>>>
> >>>> Disconnected from remote host(cl1_lb1) at 07:36:18.
> >>>>
> >>>> Type `help' to learn how to use Xshell prompt.
> >>>> [c:\~]$
> >>>> ###### - This was OK, as the slave took over and became master
> >>>>
> >>>> cl2_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Tue Mar 17 07:35:04 2015
> >>>> Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on cl2_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl2_lb1 - partition WITHOUT quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl2_lb1 ]
> >>>> OFFLINE: [ cl1_lb1 ]
> >>>>
> >>>> Resource Group: master-group
> >>>> vip-master (ocf::heartbeat:IPaddr2): Started cl2_lb1
> >>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl2_lb1
> >>>> CBC_instance (ocf::heartbeat:cbc): Started cl2_lb1
> >>>> failover_MailTo (ocf::heartbeat:MailTo): Started cl2_lb1
> >>>> Master/Slave Set: msPostgresql [pgsql]
> >>>> Masters: [ cl2_lb1 ]
> >>>> Stopped: [ pgsql:1 ]
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl2_lb1:
> >>>> + master-pgsql : 1000
> >>>> + pgsql-data-status : LATEST
> >>>> + pgsql-master-baseline : 00000008C2000090
> >>>> + pgsql-status : PRI
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>> cl2_lb1:~ #
> >>>> And the logs from Postgres and Corosync are attached
> >>>> ###### - I then restarted the original Master cl1_lb1 and started
> >>>> Corosync manually.
> >>>> Once the original Master cl1_lb1 was up and Corosync running, the
> >>>> status below happened; notice no VIPs and Postgres.
> >>>> ###### - Still working below
> >>>>
> >>>> cl2_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Tue Mar 17 07:36:55 2015
> >>>> Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on cl2_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl2_lb1 - partition WITHOUT quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl2_lb1 ]
> >>>> OFFLINE: [ cl1_lb1 ]
> >>>>
> >>>> Resource Group: master-group
> >>>> vip-master (ocf::heartbeat:IPaddr2): Started cl2_lb1
> >>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl2_lb1
> >>>> CBC_instance (ocf::heartbeat:cbc): Started cl2_lb1
> >>>> failover_MailTo (ocf::heartbeat:MailTo): Started cl2_lb1
> >>>> Master/Slave Set: msPostgresql [pgsql]
> >>>> Masters: [ cl2_lb1 ]
> >>>> Stopped: [ pgsql:1 ]
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl2_lb1:
> >>>> + master-pgsql : 1000
> >>>> + pgsql-data-status : LATEST
> >>>> + pgsql-master-baseline : 00000008C2000090
> >>>> + pgsql-status : PRI
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>>
> >>>> ###### - After the original master is up and Corosync is running on cl1_lb1
> >>>>
> >>>> cl2_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Tue Mar 17 07:37:47 2015
> >>>> Last change: Tue Mar 17 07:37:21 2015 by root via crm_attribute on cl1_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl2_lb1 - partition with quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl1_lb1 cl2_lb1 ]
> >>>>
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl1_lb1:
> >>>> + master-pgsql : -INFINITY
> >>>> + pgsql-data-status : LATEST
> >>>> + pgsql-status : STOP
> >>>> * Node cl2_lb1:
> >>>> + master-pgsql : -INFINITY
> >>>> + pgsql-data-status : DISCONNECT
> >>>> + pgsql-status : STOP
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>> pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue Mar 17 07:37:26 2015'
> >>>> * Node cl1_lb1:
> >>>> pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue Mar 17 07:37:26 2015'
> >>>>
> >>>> Failed actions:
> >>>> pgsql_monitor_4000 (node=cl2_lb1, call=735, rc=7, status=complete): not running
> >>>> pgsql_monitor_4000 (node=cl1_lb1, call=42, rc=7, status=complete): not running
> >>>> cl2_lb1:~ #
> >>>> ##### - No VIPs up
> >>>>
> >>>> cl2_lb1:~ # ping 172.28.200.159
> >>>> PING 172.28.200.159 (172.28.200.159) 56(84) bytes of data.
> >>>> From 172.28.200.168: icmp_seq=1 Destination Host Unreachable
> >>>> From 172.28.200.168 icmp_seq=1 Destination Host Unreachable
> >>>> From 172.28.200.168 icmp_seq=2 Destination Host Unreachable
> >>>> From 172.28.200.168 icmp_seq=3 Destination Host Unreachable
> >>>> ^C
> >>>> --- 172.28.200.159 ping statistics ---
> >>>> 5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4024ms, pipe 3
> >>>> cl2_lb1:~ # ping 172.16.0.5
> >>>> PING 172.16.0.5 (172.16.0.5) 56(84) bytes of data.
> >>>> From 172.16.0.3: icmp_seq=1 Destination Host Unreachable
> >>>>
> >>>> From 172.16.0.3 icmp_seq=1 Destination Host Unreachable
> >>>> From 172.16.0.3 icmp_seq=2 Destination Host Unreachable
> >>>> From 172.16.0.3 icmp_seq=3 Destination Host Unreachable
> >>>> From 172.16.0.3 icmp_seq=5 Destination Host Unreachable
> >>>> From 172.16.0.3 icmp_seq=6 Destination Host Unreachable
> >>>> From 172.16.0.3 icmp_seq=7 Destination Host Unreachable
> >>>> ^C
> >>>> --- 172.16.0.5 ping statistics ---
> >>>> 8 packets transmitted, 0 received, +7 errors, 100% packet loss, time 7015ms, pipe 3
> >>>> cl2_lb1:~ #
> >>>>
> >>>> Any ideas please, or is it a case of recovering the original master
> >>>> manually before starting Corosync, etc.?
> >>>> All logs are attached
> >>>> Regards
> >>>> On Mon, Mar 16, 2015 at 11:01 AM, Wynand Jansen van Vuuren
> >>>> <esawyja at gmail.com> wrote:
> >>>>
> >>>> Thanks for the advice. I have a demo on this now, so I don't want to
> >>>> test it now; I will do so tomorrow and forward the logs. Many thanks!!
> >>>> On Mon, Mar 16, 2015 at 10:54 AM, NAKAHIRA Kazutomo
> >>>> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> > do you suggest that I take it out? or should I look at the problem
> >>>> > where cl2_lb1 is not being promoted?
> >>>>
> >>>> You should look at the problem where cl2_lb1 is not being promoted,
> >>>> and I will look at it if you send me the ha-log and PostgreSQL's log.
> >>>>
> >>>> Best regards,
> >>>> Kazutomo NAKAHIRA
> >>>>
> >>>>
> >>>> On 2015/03/16 17:18, Wynand Jansen van Vuuren wrote:
> >>>>
> >>>> Hi Nakahira,
> >>>> Thanks so much for the info. This setting was as the wiki page
> >>>> suggested; do you suggest that I take it out? Or should I look at the
> >>>> problem where cl2_lb1 is not being promoted?
> >>>> Regards
> >>>>
> >>>> On Mon, Mar 16, 2015 at 10:15 AM, NAKAHIRA Kazutomo
> >>>> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> > Notice there are no VIPs; it looks like the VIPs depend on some
> >>>> > other resource to start first?
> >>>>
> >>>> The following constraint means that "master-group" cannot start
> >>>> without a master of the msPostgresql resource:
> >>>>
> >>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
> >>>>
> >>>> After you power off cl1_lb1, msPostgresql on cl2_lb1 is not promoted,
> >>>> so no master exists in your cluster.
> >>>>
> >>>> It means that "master-group" cannot run anywhere.
> >>>>
> >>>> Best regards,
> >>>> Kazutomo NAKAHIRA
> >>>>
> >>>>
> >>>> On 2015/03/16 16:48, Wynand Jansen van Vuuren wrote:
> >>>>
> >>>> Hi
> >>>> When I start out, cl1_lb1 (Cluster 1 load balancer 1) is the master,
> >>>> as below:
> >>>> cl1_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Mon Mar 16 09:44:44 2015
> >>>> Last change: Mon Mar 16 08:06:26 2015 by root via crm_attribute on cl1_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl2_lb1 - partition with quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl1_lb1 cl2_lb1 ]
> >>>>
> >>>> Resource Group: master-group
> >>>>     vip-master (ocf::heartbeat:IPaddr2): Started cl1_lb1
> >>>>     vip-rep (ocf::heartbeat:IPaddr2): Started cl1_lb1
> >>>>     CBC_instance (ocf::heartbeat:cbc): Started cl1_lb1
> >>>>     failover_MailTo (ocf::heartbeat:MailTo): Started cl1_lb1
> >>>> Master/Slave Set: msPostgresql [pgsql]
> >>>> Masters: [ cl1_lb1 ]
> >>>> Slaves: [ cl2_lb1 ]
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl1_lb1:
> >>>>     + master-pgsql : 1000
> >>>>     + pgsql-data-status : LATEST
> >>>>     + pgsql-master-baseline : 00000008B90061F0
> >>>>     + pgsql-status : PRI
> >>>> * Node cl2_lb1:
> >>>>     + master-pgsql : 100
> >>>>     + pgsql-data-status : STREAMING|SYNC
> >>>>     + pgsql-status : HS:sync
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>> * Node cl1_lb1:
> >>>> cl1_lb1:~ #
> >>>>
> >>>> If I then do a power off on cl1_lb1 (master), Postgres moves to
> >>>> cl2_lb1 (Cluster 2 load balancer 1), but the VIP-MASTER and VIP-REP
> >>>> are not pingable from the NEW master (cl2_lb1); it stays like this
> >>>> below:
> >>>> cl2_lb1:~ # crm_mon -1 -Af
> >>>> Last updated: Mon Mar 16 07:32:07 2015
> >>>> Last change: Mon Mar 16 07:28:53 2015 by root via crm_attribute on cl1_lb1
> >>>> Stack: classic openais (with plugin)
> >>>> Current DC: cl2_lb1 - partition WITHOUT quorum
> >>>> Version: 1.1.9-2db99f1
> >>>> 2 Nodes configured, 2 expected votes
> >>>> 6 Resources configured.
> >>>>
> >>>>
> >>>> Online: [ cl2_lb1 ]
> >>>> OFFLINE: [ cl1_lb1 ]
> >>>>
> >>>> Master/Slave Set: msPostgresql [pgsql]
> >>>>     Slaves: [ cl2_lb1 ]
> >>>>     Stopped: [ pgsql:1 ]
> >>>>
> >>>> Node Attributes:
> >>>> * Node cl2_lb1:
> >>>>     + master-pgsql : -INFINITY
> >>>>     + pgsql-data-status : DISCONNECT
> >>>>     + pgsql-status : HS:alone
> >>>>
> >>>> Migration summary:
> >>>> * Node cl2_lb1:
> >>>> cl2_lb1:~ #
> >>>>
> >>>> Notice there are no VIPs; it looks like the VIPs depend on some other
> >>>> resource to start first?
> >>>> Thanks for the reply!
> >>>>
> >>>>
> >>>> On Mon, Mar 16, 2015 at 9:42 AM, NAKAHIRA Kazutomo
> >>>> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> > fine, cl2_lb1 takes over and acts as a slave, but the VIPs do not
> >>>> > come up
> >>>>
> >>>> cl2_lb1 acts as a slave? It is not a master?
> >>>> The VIPs come up with the master msPostgresql resource.
> >>>>
> >>>> If the promote action failed on cl2_lb1, then please send a ha-log
> >>>> and PostgreSQL's log.
> >>>>
> >>>> Best regards,
> >>>> Kazutomo NAKAHIRA
> >>>>
> >>>>
> >>>> On 2015/03/16 16:09, Wynand Jansen van Vuuren wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I have 2 nodes with 2 interfaces each. ETH0 is used for an application,
> >>>> CBC, that's writing to the Postgres DB on the VIP-MASTER 172.28.200.159;
> >>>> ETH1 is used for the Corosync configuration and VIP-REP. Everything
> >>>> works, but if the master, currently on cl1_lb1, has a catastrophic
> >>>> failure, like a power down, the VIPs do not start on the slave. The
> >>>> Postgres part works fine: cl2_lb1 takes over and acts as a slave, but
> >>>> the VIPs do not come up. If I test it manually, i.e. kill the
> >>>> application 3 times on the master, the switchover is smooth; same if I
> >>>> kill Postgres on the master. But when there is a power failure on the
> >>>> master, the VIPs stay down. If I then delete the attributes
> >>>> pgsql-data-status="LATEST" and pgsql-data-status="STREAMING|SYNC" on the
> >>>> slave after power off on the master and restart everything, then the
> >>>> VIPs come up on the slave. Any ideas please?
> >>>> I'm using this setup:
> >>>>
> >>>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
> >>>>
> >>>> With this configuration below:
> >>>> node cl1_lb1 \
> >>>>   attributes pgsql-data-status="LATEST"
> >>>> node cl2_lb1 \
> >>>>   attributes pgsql-data-status="STREAMING|SYNC"
> >>>> primitive CBC_instance ocf:heartbeat:cbc \
> >>>>   op monitor interval="60s" timeout="60s" on-fail="restart" \
> >>>>   op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>   meta target-role="Started" migration-threshold="3" failure-timeout="60s"
> >>>> primitive failover_MailTo ocf:heartbeat:MailTo \
> >>>>   params email="wynandj at rorotika.com" subject="Cluster Status change - " \
> >>>>   op monitor interval="10" timeout="10" dept="0"
> >>>> primitive pgsql ocf:heartbeat:pgsql \
> >>>>   params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl" \
> >>>>     psql="/opt/app/PostgreSQL/9.3/bin/psql" \
> >>>>     config="/opt/app/pgdata/9.3/postgresql.conf" pgdba="postgres" \
> >>>>     pgdata="/opt/app/pgdata/9.3/" start_opt="-p 5432" rep_mode="sync" \
> >>>>     node_list="cl1_lb1 cl2_lb1" \
> >>>>     restore_command="cp /pgtablespace/archive/%f %p" \
> >>>>     primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
> >>>>     master_ip="172.16.0.5" restart_on_promote="false" logfile="/var/log/OCF.log" \
> >>>>   op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>   op monitor interval="4s" timeout="60s" on-fail="restart" \
> >>>>   op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
> >>>>   op promote interval="0s" timeout="60s" on-fail="restart" \
> >>>>   op demote interval="0s" timeout="60s" on-fail="stop" \
> >>>>   op stop interval="0s" timeout="60s" on-fail="block" \
> >>>>   op notify interval="0s" timeout="60s"
> >>>> primitive vip-master ocf:heartbeat:IPaddr2 \
> >>>>   params ip="172.28.200.159" nic="eth0" iflabel="CBC_VIP" cidr_netmask="24" \
> >>>>   op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>   op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>>>   op stop interval="0s" timeout="60s" on-fail="block" \
> >>>>   meta target-role="Started"
> >>>> primitive vip-rep ocf:heartbeat:IPaddr2 \
> >>>>   params ip="172.16.0.5" nic="eth1" iflabel="REP_VIP" cidr_netmask="24" \
> >>>>   meta migration-threshold="0" target-role="Started" \
> >>>>   op start interval="0s" timeout="60s" on-fail="stop" \
> >>>>   op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>>>   op stop interval="0s" timeout="60s" on-fail="restart"
> >>>> group master-group vip-master vip-rep CBC_instance failover_MailTo
> >>>> ms msPostgresql pgsql \
> >>>>   meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> >>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
> >>>> order rsc_order-1 0: msPostgresql:promote master-group:start symmetrical=false
> >>>> order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false
> >>>> property $id="cib-bootstrap-options" \
> >>>>   dc-version="1.1.9-2db99f1" \
> >>>>   cluster-infrastructure="classic openais (with plugin)" \
> >>>>   expected-quorum-votes="2" \
> >>>>   no-quorum-policy="ignore" \
> >>>>   stonith-enabled="false" \
> >>>>   cluster-recheck-interval="1min" \
> >>>>   crmd-transition-delay="0s" \
> >>>>   last-lrm-refresh="1426485983"
> >>>> rsc_defaults $id="rsc-options" \
> >>>>   resource-stickiness="INFINITY" \
> >>>>   migration-threshold="1"
> >>>> #vim:set syntax=pcmk
> >>>>
> >>>> Any ideas please, I'm lost......
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Users mailing list: Users at clusterlabs.org
> >>>> http://clusterlabs.org/mailman/listinfo/users
> >>>>
> >>>> Project Home: http://www.clusterlabs.org
> >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>> Bugs: http://bugs.clusterlabs.org
> >>>>
> >>>> --
> >>>> NTT Open Source Software Center
> >>>> Kazutomo Nakahira
> >>>> TEL: 03-5860-5135 FAX: 03-5463-6490
> >>>> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
> >>>>
> >>>> --
> >>>> NTT Open Source Software Center
> >>>> Kazutomo Nakahira
> >>>> TEL: 03-5860-5135 FAX: 03-5463-6490
> >>>> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
>
>