[ClusterLabs] 4 node, 2 cluster setup with separated applications

Wynand Jansen van Vuuren esawyja at gmail.com
Thu Mar 19 05:31:59 EDT 2015


Hi all,
I have a different question, please. Let's say I have the following:
4 nodes, 2 clusters, 2 nodes per cluster. In the west of the country I have
Cluster 1 with cl1_lb1 and cl1_lb2 as the nodes; in the east of the country
I have Cluster 2 with cl2_lb1 and cl2_lb2 as the nodes.

I have 3 different applications: Postgres, App1 and App2. App1 uses a VIP
to write to Postgres; App2 uses Apache2.

Can I do the following?
cl1_lb1 runs Postgres streaming replication with the App1 VIP, in a
Master/Slave configuration with cl2_lb1.

cl1_lb1, cl1_lb2, cl2_lb1 and cl2_lb2 all run App2, with the VIP round-robin
for the Apache page.

So my question is actually this: in this configuration, what should the
expected_votes setting in the corosync.conf file be, 2 or 4? And can you
separate the resources per node? I thought that rep_mode="sync"
node_list="cl1_lb1 cl2_lb1" in the pgsql primitive would restrict pgsql to
running on cl1_lb1 and cl2_lb1 only, but that does not seem to be the case;
as soon as I add the other nodes to the corosync configuration, I get the
output below.

cl1_lb1:/opt/temp # crm_mon -1 -Af
Last updated: Thu Mar 19 11:29:16 2015
Last change: Thu Mar 19 11:10:17 2015 by hacluster via crmd on cl1_lb1
Stack: classic openais (with plugin)
Current DC: cl1_lb1 - partition with quorum
Version: 1.1.9-2db99f1
4 Nodes configured, 4 expected votes
6 Resources configured.


Online: [ cl1_lb1 cl1_lb2 cl2_lb1 cl2_lb2 ]


Node Attributes:
* Node cl1_lb1:
    + master-pgsql                        : -INFINITY
    + pgsql-data-status                   : LATEST
    + pgsql-status                        : STOP
* Node cl1_lb2:
    + pgsql-status                        : UNKNOWN
* Node cl2_lb1:
    + master-pgsql                        : -INFINITY
    + pgsql-data-status                   : LATEST
    + pgsql-status                        : STOP
* Node cl2_lb2:
    + pgsql-status                        : UNKNOWN

Migration summary:
* Node cl2_lb1:
   pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar
19 11:10:18 2015'
* Node cl1_lb1:
   pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar
19 11:10:18 2015'
* Node cl2_lb2:
* Node cl1_lb2:

Failed actions:
    pgsql_start_0 (node=cl2_lb1, call=561, rc=1, status=complete): unknown
error
    pgsql_start_0 (node=cl1_lb1, call=292, rc=1, status=complete): unknown
error
    pgsql_start_0 (node=cl2_lb2, call=115, rc=5, status=complete): not
installed
    pgsql_start_0 (node=cl1_lb2, call=73, rc=5, status=complete): not
installed
cl1_lb1:/opt/temp #
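
What I had in mind (untested, and the exact constraint syntax is my guess
rather than something I have verified) is to pin the Postgres pieces to the
two database nodes with location constraints, roughly like this:

location loc-pgsql-not-cl1_lb2 msPostgresql -inf: cl1_lb2
location loc-pgsql-not-cl2_lb2 msPostgresql -inf: cl2_lb2
location loc-mg-not-cl1_lb2 master-group -inf: cl1_lb2
location loc-mg-not-cl2_lb2 master-group -inf: cl2_lb2

and then let App2/Apache run on all four nodes as a clone.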

Any suggestions on how I can achieve this, please?

Regards



On Wed, Mar 18, 2015 at 7:32 AM, Wynand Jansen van Vuuren <esawyja at gmail.com
> wrote:

> Hi
> Yes, the problem was solved. It was the Linux init scripts that started
> Postgres when the failed server came up again; I disabled the automatic
> start with chkconfig and that solved the problem. I will take 172.16.0.5
> out of the conf file.
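> (For reference, that was just the usual chkconfig call, something like
> "chkconfig postgresql off" -- the exact service name depends on how
> Postgres 9.3 was installed here.)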
> THANKS SO MUCH for all the help. I will do a blog post on how this is done
> on SLES 11 SP3 and Postgres 9.3 and will post the URL for the group, in
> case it helps someone out there. Thanks again for all the help!
> Regards
>
> On Wed, Mar 18, 2015 at 3:58 AM, NAKAHIRA Kazutomo <
> nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>
>> Hi,
>>
>> As Brestan pointed out, the old master not coming up as a slave is
>> expected behaviour.
>>
>> BTW, this behaviour is separate from the original problem.
>> From the logs, it seems the promote action succeeded on cl2_lb1 after
>> cl1_lb1 was powered off.
>> Was the original problem resolved?
>>
>> And cl2_lb1's postgresql.conf has the following problem.
>>
>> 2015-03-17 07:34:28 SAST DETAIL:  The failed archive command was: cp
>> pg_xlog/0000001D00000008000000C2 172.16.0.5:/pgtablespace/archive/
>> 0000001D00000008000000C2
>>
>> "172.16.0.5" must be eliminated from the archive_command directive in the
>> postgresql.conf.
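>>
>> For example, with the archive directory mounted locally on both nodes, it
>> would normally look something like this (the path just mirrors the
>> restore_command in your crm configuration):
>>
>>   archive_command = 'cp %p /pgtablespace/archive/%f'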
>>
>> Best regards,
>> Kazutomo NAKAHIRA
>>
>> On 2015/03/18 5:00, Rainer Brestan wrote:
>>
>>> Yes, that's the expected behaviour.
>>> Takatoshi Matsuo describes in his papers why a former master can't come
>>> up as a slave without possible data corruption.
>>> And you do not get an indication from Postgres that the data on disk is
>>> corrupted.
>>> Therefore, he created the lock file mechanism to prevent a former master
>>> from starting up.
>>> Making the base backup from the master discards any possibly wrong data
>>> on the slave, and removing the lock file indicates this to the resource
>>> agent.
>>> To shorten the discussion about "how this can be automated within the
>>> resource agent": there is no clean way of handling this for very large
>>> databases, for which it can take hours.
>>> What you should do is make the base backup in a temporary directory and
>>> then rename it to the name the Postgres instance requires once the base
>>> backup has finished successfully (yes, this requires twice the hard disk
>>> space). Otherwise you might lose everything if your master breaks during
>>> the base backup.
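>>> As a rough sketch only (paths and the master address are placeholders,
>>> adjust to your layout):
>>>
>>>   pg_basebackup -h <master_ip> -U postgres -D /var/lib/pgsql/data.new -X stream -P
>>>   mv /var/lib/pgsql/data /var/lib/pgsql/data.old
>>>   mv /var/lib/pgsql/data.new /var/lib/pgsql/data
>>>
>>> and only remove data.old once the new data directory is confirmed good.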
>>> Rainer
>>> *Gesendet:* Dienstag, 17. März 2015 um 12:16 Uhr
>>> *Von:* "Wynand Jansen van Vuuren" <esawyja at gmail.com>
>>> *An:* "Cluster Labs - All topics related to open-source clustering
>>> welcomed"
>>> <users at clusterlabs.org>
>>> *Betreff:* Re: [ClusterLabs] Postgres streaming VIP-REP not coming up on
>>> slave
>>>
>>> Hi
>>> OK, I found this particular problem: when the failed node comes up again,
>>> the init scripts start Postgres. I have disabled this, and now the VIPs and
>>> Postgres remain on the new MASTER, but the failed node does not come up as
>>> a slave, i.e. there is no sync between the new master and slave. Is this
>>> the expected behaviour? The only way I can get it back into slave mode is
>>> to follow the procedure in the wiki:
>>>
>>> # su - postgres
>>> $ rm -rf /var/lib/pgsql/data/
>>> $ pg_basebackup -h 192.168.2.3 -U postgres -D /var/lib/pgsql/data -X
>>> stream -P
>>> $ rm /var/lib/pgsql/tmp/PGSQL.lock
>>> $ exit
>>> # pcs resource cleanup msPostgresql
>>>
>>> Looking forward to your reply please
>>> Regards
>>> On Tue, Mar 17, 2015 at 7:55 AM, Wynand Jansen van Vuuren <
>>> esawyja at gmail.com>
>>> wrote:
>>>
>>>      Hi Nakahira,
>>>      I finally got around to testing this; below is the initial state
>>>
>>>      cl1_lb1:~ # crm_mon -1 -Af
>>>      Last updated: Tue Mar 17 07:31:58 2015
>>>      Last change: Tue Mar 17 07:31:12 2015 by root via crm_attribute on
>>> cl1_lb1
>>>      Stack: classic openais (with plugin)
>>>      Current DC: cl1_lb1 - partition with quorum
>>>      Version: 1.1.9-2db99f1
>>>      2 Nodes configured, 2 expected votes
>>>      6 Resources configured.
>>>
>>>
>>>      Online: [ cl1_lb1 cl2_lb1 ]
>>>
>>>        Resource Group: master-group
>>>            vip-master    (ocf::heartbeat:IPaddr2):    Started cl1_lb1
>>>            vip-rep    (ocf::heartbeat:IPaddr2):    Started cl1_lb1
>>>            CBC_instance    (ocf::heartbeat:cbc):    Started cl1_lb1
>>>            failover_MailTo    (ocf::heartbeat:MailTo):    Started cl1_lb1
>>>        Master/Slave Set: msPostgresql [pgsql]
>>>            Masters: [ cl1_lb1 ]
>>>            Slaves: [ cl2_lb1 ]
>>>
>>>      Node Attributes:
>>>      * Node cl1_lb1:
>>>           + master-pgsql                        : 1000
>>>           + pgsql-data-status                   : LATEST
>>>           + pgsql-master-baseline               : 00000008BE000000
>>>           + pgsql-status                        : PRI
>>>      * Node cl2_lb1:
>>>           + master-pgsql                        : 100
>>>           + pgsql-data-status                   : STREAMING|SYNC
>>>           + pgsql-status                        : HS:sync
>>>
>>>      Migration summary:
>>>      * Node cl2_lb1:
>>>      * Node cl1_lb1:
>>>      cl1_lb1:~ #
>>>      ###### - I then did an init 0 on the master node, cl1_lb1
>>>
>>>      cl1_lb1:~ # init 0
>>>      cl1_lb1:~ #
>>>      Connection closed by foreign host.
>>>
>>>      Disconnected from remote host(cl1_lb1) at 07:36:18.
>>>
>>>      Type `help' to learn how to use Xshell prompt.
>>>      [c:\~]$
>>>      ###### - This was OK, as the slave took over and became master
>>>
>>>      cl2_lb1:~ # crm_mon -1 -Af
>>>      Last updated: Tue Mar 17 07:35:04 2015
>>>      Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on
>>> cl2_lb1
>>>      Stack: classic openais (with plugin)
>>>      Current DC: cl2_lb1 - partition WITHOUT quorum
>>>      Version: 1.1.9-2db99f1
>>>      2 Nodes configured, 2 expected votes
>>>      6 Resources configured.
>>>
>>>
>>>      Online: [ cl2_lb1 ]
>>>      OFFLINE: [ cl1_lb1 ]
>>>
>>>        Resource Group: master-group
>>>            vip-master    (ocf::heartbeat:IPaddr2):    Started cl2_lb1
>>>            vip-rep    (ocf::heartbeat:IPaddr2):    Started cl2_lb1
>>>            CBC_instance    (ocf::heartbeat:cbc):    Started cl2_lb1
>>>            failover_MailTo    (ocf::heartbeat:MailTo):    Started cl2_lb1
>>>        Master/Slave Set: msPostgresql [pgsql]
>>>            Masters: [ cl2_lb1 ]
>>>            Stopped: [ pgsql:1 ]
>>>
>>>      Node Attributes:
>>>      * Node cl2_lb1:
>>>           + master-pgsql                        : 1000
>>>           + pgsql-data-status                   : LATEST
>>>           + pgsql-master-baseline               : 00000008C2000090
>>>           + pgsql-status                        : PRI
>>>
>>>      Migration summary:
>>>      * Node cl2_lb1:
>>>      cl2_lb1:~ #
>>>      And the logs from Postgres and Corosync are attached
>>>      ###### - I then restarted the original master cl1_lb1 and started
>>>      Corosync manually
>>>      Once the original master cl1_lb1 was up and Corosync was running, the
>>>      status below happened; notice there are no VIPs and no Postgres
>>>      ###### - Still working below
>>>
>>>      cl2_lb1:~ # crm_mon -1 -Af
>>>      Last updated: Tue Mar 17 07:36:55 2015
>>>      Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on
>>> cl2_lb1
>>>      Stack: classic openais (with plugin)
>>>      Current DC: cl2_lb1 - partition WITHOUT quorum
>>>      Version: 1.1.9-2db99f1
>>>      2 Nodes configured, 2 expected votes
>>>      6 Resources configured.
>>>
>>>
>>>      Online: [ cl2_lb1 ]
>>>      OFFLINE: [ cl1_lb1 ]
>>>
>>>        Resource Group: master-group
>>>            vip-master    (ocf::heartbeat:IPaddr2):    Started cl2_lb1
>>>            vip-rep    (ocf::heartbeat:IPaddr2):    Started cl2_lb1
>>>            CBC_instance    (ocf::heartbeat:cbc):    Started cl2_lb1
>>>            failover_MailTo    (ocf::heartbeat:MailTo):    Started cl2_lb1
>>>        Master/Slave Set: msPostgresql [pgsql]
>>>            Masters: [ cl2_lb1 ]
>>>            Stopped: [ pgsql:1 ]
>>>
>>>      Node Attributes:
>>>      * Node cl2_lb1:
>>>           + master-pgsql                        : 1000
>>>           + pgsql-data-status                   : LATEST
>>>           + pgsql-master-baseline               : 00000008C2000090
>>>           + pgsql-status                        : PRI
>>>
>>>      Migration summary:
>>>      * Node cl2_lb1:
>>>
>>>      ###### - After original master is up and Corosync running on cl1_lb1
>>>
>>>      cl2_lb1:~ # crm_mon -1 -Af
>>>      Last updated: Tue Mar 17 07:37:47 2015
>>>      Last change: Tue Mar 17 07:37:21 2015 by root via crm_attribute on
>>> cl1_lb1
>>>      Stack: classic openais (with plugin)
>>>      Current DC: cl2_lb1 - partition with quorum
>>>      Version: 1.1.9-2db99f1
>>>      2 Nodes configured, 2 expected votes
>>>      6 Resources configured.
>>>
>>>
>>>      Online: [ cl1_lb1 cl2_lb1 ]
>>>
>>>
>>>      Node Attributes:
>>>      * Node cl1_lb1:
>>>           + master-pgsql                        : -INFINITY
>>>           + pgsql-data-status                   : LATEST
>>>           + pgsql-status                        : STOP
>>>      * Node cl2_lb1:
>>>           + master-pgsql                        : -INFINITY
>>>           + pgsql-data-status                   : DISCONNECT
>>>           + pgsql-status                        : STOP
>>>
>>>      Migration summary:
>>>      * Node cl2_lb1:
>>>          pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue
>>> Mar 17
>>>      07:37:26 2015'
>>>      * Node cl1_lb1:
>>>          pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue
>>> Mar 17
>>>      07:37:26 2015'
>>>
>>>      Failed actions:
>>>           pgsql_monitor_4000 (node=cl2_lb1, call=735, rc=7,
>>> status=complete): not
>>>      running
>>>           pgsql_monitor_4000 (node=cl1_lb1, call=42, rc=7,
>>> status=complete): not
>>>      running
>>>      cl2_lb1:~ #
>>>      ##### - No VIPs up
>>>
>>>      cl2_lb1:~ # ping 172.28.200.159
>>>      PING 172.28.200.159 (172.28.200.159) 56(84) bytes of data.
>>>      From 172.28.200.168: icmp_seq=1 Destination Host Unreachable
>>>      From 172.28.200.168 icmp_seq=1 Destination Host Unreachable
>>>      From 172.28.200.168 icmp_seq=2 Destination Host Unreachable
>>>      From 172.28.200.168 icmp_seq=3 Destination Host Unreachable
>>>      ^C
>>>      --- 172.28.200.159 ping statistics ---
>>>      5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4024ms, pipe 3
>>>      cl2_lb1:~ # ping 172.16.0.5
>>>      PING 172.16.0.5 (172.16.0.5) 56(84) bytes of data.
>>>      From 172.16.0.3: icmp_seq=1 Destination Host Unreachable
>>>      From 172.16.0.3 icmp_seq=1 Destination Host Unreachable
>>>      From 172.16.0.3 icmp_seq=2 Destination Host Unreachable
>>>      From 172.16.0.3 icmp_seq=3 Destination Host Unreachable
>>>      From 172.16.0.3 icmp_seq=5 Destination Host Unreachable
>>>      From 172.16.0.3 icmp_seq=6 Destination Host Unreachable
>>>      From 172.16.0.3 icmp_seq=7 Destination Host Unreachable
>>>      ^C
>>>      --- 172.16.0.5 ping statistics ---
>>>      8 packets transmitted, 0 received, +7 errors, 100% packet loss, time 7015ms, pipe 3
>>>      cl2_lb1:~ #
>>>
>>>      Any ideas please, or is it a case of recovering the original master
>>>      manually before starting Corosync etc.?
>>>      All logs are attached
>>>      Regards
>>>      On Mon, Mar 16, 2015 at 11:01 AM, Wynand Jansen van Vuuren
>>>      <esawyja at gmail.com> wrote:
>>>
>>>          Thanks for the advice. I have a demo on this now, so I don't
>>>          want to test this now; I will do so tomorrow and forward the
>>>          logs, many thanks!!
>>>          On Mon, Mar 16, 2015 at 10:54 AM, NAKAHIRA Kazutomo
>>>          <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>
>>>              Hi,
>>>
>>>              > do you suggest that I take it out? or should I look at
>>> the problem where
>>>              > cl2_lb1 is not being promoted?
>>>
>>>              You should look at the problem where cl2_lb1 is not being
>>>              promoted.
>>>              And I will look at it if you send me the ha-log and
>>>              PostgreSQL's log.
>>>
>>>              Best regards,
>>>              Kazutomo NAKAHIRA
>>>
>>>
>>>              On 2015/03/16 17:18, Wynand Jansen van Vuuren wrote:
>>>
>>>                  Hi Nakahira,
>>>                  Thanks so much for the info, this setting was as the
>>> wiki page
>>>                  suggested,
>>>                  do you suggest that I take it out? or should I look at
>>> the
>>>                  problem where
>>>                  cl2_lb1 is not being promoted?
>>>                  Regards
>>>
>>>                  On Mon, Mar 16, 2015 at 10:15 AM, NAKAHIRA Kazutomo <
>>>                  nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>
>>>                      Hi,
>>>
>>>                          Notice there are no VIPs; looks like the VIPs
>>>                          depend on some other resource to start first?
>>>
>>>
>>>                      The following constraint means that "master-group"
>>>                      cannot start without a master of the msPostgresql
>>>                      resource:
>>>
>>>                      colocation rsc_colocation-1 inf: master-group
>>>                      msPostgresql:Master
>>>
>>>                      After you power off cl1_lb1, msPostgresql on cl2_lb1
>>>                      is not promoted, and no master exists in your
>>>                      cluster.
>>>
>>>                      It means that "master-group" cannot run anywhere.
>>>
>>>                      Best regards,
>>>                      Kazutomo NAKAHIRA
>>>
>>>
>>>                      On 2015/03/16 16:48, Wynand Jansen van Vuuren wrote:
>>>
>>>                          Hi
>>>                          When I start out, cl1_lb1 (Cluster 1 load
>>>                          balancer 1) is the master, as below:
>>>                          cl1_lb1:~ # crm_mon -1 -Af
>>>                          Last updated: Mon Mar 16 09:44:44 2015
>>>                          Last change: Mon Mar 16 08:06:26 2015 by root
>>> via
>>>                          crm_attribute on cl1_lb1
>>>                          Stack: classic openais (with plugin)
>>>                          Current DC: cl2_lb1 - partition with quorum
>>>                          Version: 1.1.9-2db99f1
>>>                          2 Nodes configured, 2 expected votes
>>>                          6 Resources configured.
>>>
>>>
>>>                          Online: [ cl1_lb1 cl2_lb1 ]
>>>
>>>                              Resource Group: master-group
>>>                                  vip-master    (ocf::heartbeat:IPaddr2):
>>>                          Started cl1_lb1
>>>                                  vip-rep    (ocf::heartbeat:IPaddr2):
>>> Started
>>>                          cl1_lb1
>>>                                  CBC_instance    (ocf::heartbeat:cbc):
>>>   Started
>>>                          cl1_lb1
>>>                                  failover_MailTo
>>> (ocf::heartbeat:MailTo):
>>>                          Started cl1_lb1
>>>                              Master/Slave Set: msPostgresql [pgsql]
>>>                                  Masters: [ cl1_lb1 ]
>>>                                  Slaves: [ cl2_lb1 ]
>>>
>>>                          Node Attributes:
>>>                          * Node cl1_lb1:
>>>                                 + master-pgsql                        :
>>> 1000
>>>                                 + pgsql-data-status                   :
>>> LATEST
>>>                                 + pgsql-master-baseline               :
>>>                          00000008B90061F0
>>>                                 + pgsql-status                        :
>>> PRI
>>>                          * Node cl2_lb1:
>>>                                 + master-pgsql                        :
>>> 100
>>>                                 + pgsql-data-status                   :
>>>                          STREAMING|SYNC
>>>                                 + pgsql-status                        :
>>> HS:sync
>>>
>>>                          Migration summary:
>>>                          * Node cl2_lb1:
>>>                          * Node cl1_lb1:
>>>                          cl1_lb1:~ #
>>>
>>>                          If I then power off cl1_lb1 (master), Postgres
>>>                          moves to cl2_lb1 (Cluster 2 load balancer 1),
>>>                          but the VIP-MASTER and VIP-REP are not pingable
>>>                          from the NEW master (cl2_lb1); it stays like
>>>                          this below:
>>>                          cl2_lb1:~ # crm_mon -1 -Af
>>>                          Last updated: Mon Mar 16 07:32:07 2015
>>>                          Last change: Mon Mar 16 07:28:53 2015 by root
>>> via
>>>                          crm_attribute on cl1_lb1
>>>                          Stack: classic openais (with plugin)
>>>                          Current DC: cl2_lb1 - partition WITHOUT quorum
>>>                          Version: 1.1.9-2db99f1
>>>                          2 Nodes configured, 2 expected votes
>>>                          6 Resources configured.
>>>
>>>
>>>                          Online: [ cl2_lb1 ]
>>>                          OFFLINE: [ cl1_lb1 ]
>>>
>>>                              Master/Slave Set: msPostgresql [pgsql]
>>>                                  Slaves: [ cl2_lb1 ]
>>>                                  Stopped: [ pgsql:1 ]
>>>
>>>                          Node Attributes:
>>>                          * Node cl2_lb1:
>>>                                 + master-pgsql                        :
>>> -INFINITY
>>>                                 + pgsql-data-status                   :
>>> DISCONNECT
>>>                                 + pgsql-status                        :
>>> HS:alone
>>>
>>>                          Migration summary:
>>>                          * Node cl2_lb1:
>>>                          cl2_lb1:~ #
>>>
>>>                          Notice there are no VIPs; looks like the VIPs
>>>                          depend on some other resource to start first?
>>>                          Thanks for the reply!
>>>
>>>
>>>                          On Mon, Mar 16, 2015 at 9:42 AM, NAKAHIRA
>>> Kazutomo <
>>>                          nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>
>>>                             Hi,
>>>
>>>
>>>                                 fine, cl2_lb1 takes over and acts as a
>>> slave, but
>>>                              the VIPs does not come
>>>
>>>
>>>                              cl2_lb1 acts as a slave? It is not a master?
>>>                              The VIPs come up with the master of the
>>>                              msPostgresql resource.
>>>
>>>                              If the promote action failed on cl2_lb1, then
>>>                              please send the ha-log and PostgreSQL's log.
>>>
>>>                              Best regards,
>>>                              Kazutomo NAKAHIRA
>>>
>>>
>>>                              On 2015/03/16 16:09, Wynand Jansen van
>>> Vuuren wrote:
>>>
>>>                                 Hi all,
>>>
>>>
>>>                                  I have 2 nodes, with 2 interfaces each.
>>>                                  ETH0 is used for an application, CBC,
>>>                                  that writes to the Postgres DB on the
>>>                                  VIP-MASTER 172.28.200.159; ETH1 is used
>>>                                  for Corosync and the VIP-REP. Everything
>>>                                  works, but if the master, currently on
>>>                                  cl1_lb1, has a catastrophic failure, like
>>>                                  a power-down, the VIPs do not start on
>>>                                  the slave. The Postgres part works fine:
>>>                                  cl2_lb1 takes over and acts as a slave,
>>>                                  but the VIPs do not come up. If I test it
>>>                                  manually, i.e. kill the application 3
>>>                                  times on the master, the switchover is
>>>                                  smooth, same if I kill Postgres on the
>>>                                  master, but when there is a power failure
>>>                                  on the master, the VIPs stay down. If I
>>>                                  then delete the attributes
>>>                                  pgsql-data-status="LATEST" and
>>>                                  pgsql-data-status="STREAMING|SYNC" on the
>>>                                  slave after the power-off on the master
>>>                                  and restart everything, then the VIPs
>>>                                  come up on the slave. Any ideas please?
>>>                                  I'm using this setup:
>>>                                  http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
>>>
>>>                                  With this configuration below
>>>     node cl1_lb1 \
>>>         attributes pgsql-data-status="LATEST"
>>>     node cl2_lb1 \
>>>         attributes pgsql-data-status="STREAMING|SYNC"
>>>     primitive CBC_instance ocf:heartbeat:cbc \
>>>         op monitor interval="60s" timeout="60s" on-fail="restart" \
>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>         meta target-role="Started" migration-threshold="3" failure-timeout="60s"
>>>     primitive failover_MailTo ocf:heartbeat:MailTo \
>>>         params email="wynandj at rorotika.com" subject="Cluster Status change - " \
>>>         op monitor interval="10" timeout="10" dept="0"
>>>     primitive pgsql ocf:heartbeat:pgsql \
>>>         params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl" \
>>>             psql="/opt/app/PostgreSQL/9.3/bin/psql" \
>>>             config="/opt/app/pgdata/9.3/postgresql.conf" \
>>>             pgdba="postgres" pgdata="/opt/app/pgdata/9.3/" \
>>>             start_opt="-p 5432" rep_mode="sync" \
>>>             node_list="cl1_lb1 cl2_lb1" \
>>>             restore_command="cp /pgtablespace/archive/%f %p" \
>>>             primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
>>>             master_ip="172.16.0.5" restart_on_promote="false" \
>>>             logfile="/var/log/OCF.log" \
>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>         op monitor interval="4s" timeout="60s" on-fail="restart" \
>>>         op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
>>>         op promote interval="0s" timeout="60s" on-fail="restart" \
>>>         op demote interval="0s" timeout="60s" on-fail="stop" \
>>>         op stop interval="0s" timeout="60s" on-fail="block" \
>>>         op notify interval="0s" timeout="60s"
>>>     primitive vip-master ocf:heartbeat:IPaddr2 \
>>>         params ip="172.28.200.159" nic="eth0" iflabel="CBC_VIP" cidr_netmask="24" \
>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>         op stop interval="0s" timeout="60s" on-fail="block" \
>>>         meta target-role="Started"
>>>     primitive vip-rep ocf:heartbeat:IPaddr2 \
>>>         params ip="172.16.0.5" nic="eth1" iflabel="REP_VIP" cidr_netmask="24" \
>>>         meta migration-threshold="0" target-role="Started" \
>>>         op start interval="0s" timeout="60s" on-fail="stop" \
>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>         op stop interval="0s" timeout="60s" on-fail="restart"
>>>     group master-group vip-master vip-rep CBC_instance failover_MailTo
>>>     ms msPostgresql pgsql \
>>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>>     colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>>     order rsc_order-1 0: msPostgresql:promote master-group:start symmetrical=false
>>>     order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false
>>>     property $id="cib-bootstrap-options" \
>>>         dc-version="1.1.9-2db99f1" \
>>>         cluster-infrastructure="classic openais (with plugin)" \
>>>         expected-quorum-votes="2" \
>>>         no-quorum-policy="ignore" \
>>>         stonith-enabled="false" \
>>>         cluster-recheck-interval="1min" \
>>>         crmd-transition-delay="0s" \
>>>         last-lrm-refresh="1426485983"
>>>     rsc_defaults $id="rsc-options" \
>>>         resource-stickiness="INFINITY" \
>>>         migration-threshold="1"
>>>     #vim:set syntax=pcmk
>>>
>>>                                  Any ideas please, I'm lost......
>>>
>>>                      --
>>>                      NTT Open Source Software Center
>>>                      Kazutomo NAKAHIRA
>>>                      TEL: 03-5860-5135 FAX: 03-5463-6490
>>>                      Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp