[ClusterLabs] 4 node, 2 cluster setup with separated applications
Andrei Borzenkov
arvidjaar at gmail.com
Thu Mar 19 09:47:18 UTC 2015
On Thu, Mar 19, 2015 at 12:31 PM, Wynand Jansen van Vuuren
<esawyja at gmail.com> wrote:
> Hi all,
> I have a different question please, let say I have the following
> 4 - nodes, 2 clusters, 2 nodes per cluster, so I have in the west of the
> country Cluster 1 with cl1_lb1 and cl1_lb2 as the nodes, in the east of the
> country I have Cluster 2 with cl2_lb1 and cl2_lb2 as the nodes
>
According to the output you provided, you have a single cluster consisting of
four nodes, not two clusters of two nodes each.
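With all four nodes in one membership, the vote count should follow the full
node count. As a minimal sketch (assuming a corosync 2.x votequorum stack; on
the classic plugin stack shown in your output, the equivalent knob is the
expected-quorum-votes cluster property):

quorum {
    provider: corosync_votequorum
    expected_votes: 4
}

# plugin-based stack equivalent:
# crm configure property expected-quorum-votes=4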
> I have 3 different applications: Postgres, App1 and App2. App1 uses a VIP to
> write to Postgres, and App2 uses Apache2.
>
> Can I do the following?
> cl1_lb1 runs Postgres streaming replication with the App1 VIP in a
> Master/Slave configuration to cl2_lb1.
>
> cl1_lb1, cl1_lb2, cl2_lb1 and cl2_lb2 all run App2 and the VIP round-robin
> for the Apache page.
>
> So my question is actually this: in this configuration, what would the
> expected_votes setting in corosync.conf be, 2 or 4? And can you separate the
> resources per node? I thought that rep_mode="sync" node_list="cl1_lb1 cl2_lb1"
> in the pgsql primitive would restrict pgsql to running on cl1_lb1 and cl2_lb1
> only, but that does not seem to be the case; as soon as I add the other nodes
> to the Corosync configuration, I get the output below.
>
> cl1_lb1:/opt/temp # crm_mon -1 -Af
> Last updated: Thu Mar 19 11:29:16 2015
> Last change: Thu Mar 19 11:10:17 2015 by hacluster via crmd on cl1_lb1
> Stack: classic openais (with plugin)
> Current DC: cl1_lb1 - partition with quorum
> Version: 1.1.9-2db99f1
> 4 Nodes configured, 4 expected votes
> 6 Resources configured.
>
>
> Online: [ cl1_lb1 cl1_lb2 cl2_lb1 cl2_lb2 ]
>
>
> Node Attributes:
> * Node cl1_lb1:
> + master-pgsql : -INFINITY
> + pgsql-data-status : LATEST
> + pgsql-status : STOP
> * Node cl1_lb2:
> + pgsql-status : UNKNOWN
> * Node cl2_lb1:
> + master-pgsql : -INFINITY
> + pgsql-data-status : LATEST
> + pgsql-status : STOP
> * Node cl2_lb2:
> + pgsql-status : UNKNOWN
>
> Migration summary:
> * Node cl2_lb1:
> pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar
> 19 11:10:18 2015'
> * Node cl1_lb1:
> pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar
> 19 11:10:18 2015'
> * Node cl2_lb2:
> * Node cl1_lb2:
>
> Failed actions:
> pgsql_start_0 (node=cl2_lb1, call=561, rc=1, status=complete): unknown
> error
> pgsql_start_0 (node=cl1_lb1, call=292, rc=1, status=complete): unknown
> error
> pgsql_start_0 (node=cl2_lb2, call=115, rc=5, status=complete): not
> installed
> pgsql_start_0 (node=cl1_lb2, call=73, rc=5, status=complete): not
> installed
> cl1_lb1:/opt/temp #
>
> Any suggestions on how I can achieve this, please?
>
But it does exactly what you want - Postgres won't be started on nodes
cl1_lb2 and cl2_lb2. If you want to get rid of the probing errors, you need
to either install Postgres on all nodes (so the resource agent probes do not
fail) or set symmetric-cluster=false.
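A minimal crm shell sketch of that opt-in approach, in case it helps (the
Apache/App2 clone name "cl-apache" is a hypothetical placeholder, not taken
from your configuration):

crm configure property symmetric-cluster="false"
# pgsql may only run on the two database nodes
crm configure location loc-pgsql-cl1 msPostgresql 200: cl1_lb1
crm configure location loc-pgsql-cl2 msPostgresql 200: cl2_lb1
# the Apache clone may run on all four nodes
crm configure location loc-app2-cl1a cl-apache 100: cl1_lb1
crm configure location loc-app2-cl1b cl-apache 100: cl1_lb2
crm configure location loc-app2-cl2a cl-apache 100: cl2_lb1
crm configure location loc-app2-cl2b cl-apache 100: cl2_lb2

With symmetric-cluster=false a resource is only eligible to run on nodes that
have a positive location score, so nothing needs to be banned explicitly.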
> Regards
>
>
>
> On Wed, Mar 18, 2015 at 7:32 AM, Wynand Jansen van Vuuren
> <esawyja at gmail.com> wrote:
>>
>> Hi
>> Yes, the problem was solved. It was the system init scripts (not the
>> cluster) that started Postgres when the failed server came up again; I
>> disabled the automatic start with chkconfig and that solved the problem. I
>> will take out 172.16.0.5 from the conf file.
>> THANKS SO MUCH for all the help. I will do a blog post on how this is done
>> on SLES 11 SP3 and Postgres 9.3 and will post the URL to the group, in case
>> it helps someone out there. Thanks again for all the help!
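>> (For reference, the runlevel change described above looks roughly like this
>> on SLES 11; the init script name "postgresql" is an assumption and may
>> differ depending on how 9.3 was installed:)
>>
>> chkconfig postgresql off       # stop the init script from starting Postgres at boot
>> chkconfig --list postgresql    # verify every runlevel now shows "off"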
>> Regards
>>
>> On Wed, Mar 18, 2015 at 3:58 AM, NAKAHIRA Kazutomo
>> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>
>>> Hi,
>>>
>>> As Brestan pointed out, the old master not coming back up as a slave is
>>> the expected behaviour.
>>>
>>> BTW, this behaviour is different from the original problem.
>>> It seems from the logs that the promote action succeeded on cl2_lb1 after
>>> cl1_lb1 was powered off.
>>> Was the original problem resolved?
>>>
>>> And cl2_lb1's postgresql.conf has the following problem.
>>>
>>> 2015-03-17 07:34:28 SAST DETAIL: The failed archive command was: cp
>>> pg_xlog/0000001D00000008000000C2
>>> 172.16.0.5:/pgtablespace/archive/0000001D00000008000000C2
>>>
>>> "172.16.0.5" must be eliminated from the archive_command directive in the
>>> postgresql.conf.
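>>> (For example, something along these lines in postgresql.conf, archiving to
>>> the locally mounted archive directory used by restore_command; the exact
>>> path is an assumption based on the configuration quoted further down this
>>> thread:)
>>>
>>> archive_mode = on
>>> archive_command = 'cp %p /pgtablespace/archive/%f'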
>>>
>>> Best regards,
>>> Kazutomo NAKAHIRA
>>>
>>> On 2015/03/18 5:00, Rainer Brestan wrote:
>>>>
>>>> Yes, that's the expected behaviour.
>>>> Takatoshi Matsuo describes in his papers why a former master can't come up
>>>> as a slave without possible data corruption.
>>>> And you do not get an indication from Postgres that the data on disk is
>>>> corrupted.
>>>> Therefore, he created the lock file mechanism to prevent a former master
>>>> from starting up.
>>>> Making the base backup from the master discards any possibly wrong data on
>>>> the slave, and the removed lock file indicates this to the resource agent.
>>>> To shorten the discussion about "how this can be automated within the
>>>> resource agent": there is no clean way of handling this for very large
>>>> databases, for which it can take hours.
>>>> What you should do is make the base backup in a temporary directory and
>>>> then rename it to the directory name the Postgres instance requires once
>>>> the base backup has finished successfully (yes, this requires twice the
>>>> hard disk space). Otherwise you might lose everything if your master
>>>> breaks during the base backup.
>>>> Rainer
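>>>> (A rough sketch of that sequence on the standby, based on the wiki
>>>> procedure quoted further down; the paths, master address and lock file
>>>> location are illustrative and must match your own setup:)
>>>>
>>>> su - postgres
>>>> pg_basebackup -h 192.168.2.3 -U postgres -D /var/lib/pgsql/data.tmp -X stream -P
>>>> # only swap the directories once the base backup has completed successfully
>>>> mv /var/lib/pgsql/data /var/lib/pgsql/data.old
>>>> mv /var/lib/pgsql/data.tmp /var/lib/pgsql/data
>>>> rm /var/lib/pgsql/tmp/PGSQL.lock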
>>>> *Sent:* Tuesday, 17 March 2015 at 12:16
>>>> *From:* "Wynand Jansen van Vuuren" <esawyja at gmail.com>
>>>> *To:* "Cluster Labs - All topics related to open-source clustering
>>>> welcomed"
>>>> <users at clusterlabs.org>
>>>> *Subject:* Re: [ClusterLabs] Postgres streaming VIP-REP not coming up on
>>>> slave
>>>>
>>>> Hi
>>>> OK, I found this particular problem: when the failed node comes up again,
>>>> the init scripts start Postgres. I have disabled this, and now the VIPs and
>>>> Postgres remain on the new MASTER, but the failed node does not come up as
>>>> a slave, i.e. there is no sync between the new master and the slave. Is
>>>> this the expected behaviour? The only way I can get it back into slave
>>>> mode is to follow the procedure in the wiki:
>>>>
>>>> # su - postgres
>>>> $ rm -rf /var/lib/pgsql/data/
>>>> $ pg_basebackup -h 192.168.2.3 -U postgres -D /var/lib/pgsql/data -X
>>>> stream -P
>>>> $ rm /var/lib/pgsql/tmp/PGSQL.lock
>>>> $ exit
>>>> # pcs resource cleanup msPostgresql
>>>>
>>>> Looking forward to your reply please
>>>> Regards
>>>> On Tue, Mar 17, 2015 at 7:55 AM, Wynand Jansen van Vuuren
>>>> <esawyja at gmail.com>
>>>> wrote:
>>>>
>>>> Hi Nakahira,
>>>> I finally got around to testing this; below is the initial state:
>>>>
>>>> cl1_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Tue Mar 17 07:31:58 2015
>>>> Last change: Tue Mar 17 07:31:12 2015 by root via crm_attribute on
>>>> cl1_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl1_lb1 - partition with quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl1_lb1 cl2_lb1 ]
>>>>
>>>> Resource Group: master-group
>>>> vip-master (ocf::heartbeat:IPaddr2): Started cl1_lb1
>>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl1_lb1
>>>> CBC_instance (ocf::heartbeat:cbc): Started cl1_lb1
>>>> failover_MailTo (ocf::heartbeat:MailTo): Started
>>>> cl1_lb1
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Masters: [ cl1_lb1 ]
>>>> Slaves: [ cl2_lb1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl1_lb1:
>>>> + master-pgsql : 1000
>>>> + pgsql-data-status : LATEST
>>>> + pgsql-master-baseline : 00000008BE000000
>>>> + pgsql-status : PRI
>>>> * Node cl2_lb1:
>>>> + master-pgsql : 100
>>>> + pgsql-data-status : STREAMING|SYNC
>>>> + pgsql-status : HS:sync
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> * Node cl1_lb1:
>>>> cl1_lb1:~ #
>>>> ###### - I then did an init 0 on the master node, cl1_lb1
>>>>
>>>> cl1_lb1:~ # init 0
>>>> cl1_lb1:~ #
>>>> Connection closed by foreign host.
>>>>
>>>> Disconnected from remote host(cl1_lb1) at 07:36:18.
>>>>
>>>> ###### - This was OK, as the slave took over and became master
>>>>
>>>> cl2_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Tue Mar 17 07:35:04 2015
>>>> Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on
>>>> cl2_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition WITHOUT quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl2_lb1 ]
>>>> OFFLINE: [ cl1_lb1 ]
>>>>
>>>> Resource Group: master-group
>>>> vip-master (ocf::heartbeat:IPaddr2): Started cl2_lb1
>>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl2_lb1
>>>> CBC_instance (ocf::heartbeat:cbc): Started cl2_lb1
>>>> failover_MailTo (ocf::heartbeat:MailTo): Started
>>>> cl2_lb1
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Masters: [ cl2_lb1 ]
>>>> Stopped: [ pgsql:1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl2_lb1:
>>>> + master-pgsql : 1000
>>>> + pgsql-data-status : LATEST
>>>> + pgsql-master-baseline : 00000008C2000090
>>>> + pgsql-status : PRI
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> cl2_lb1:~ #
>>>> And the logs from Postgres and Corosync are attached
>>>> ###### - I then restarted the original master cl1_lb1 and started Corosync
>>>> manually.
>>>> Once the original master cl1_lb1 was up and Corosync running, the status
>>>> below happened; notice there are no VIPs and no Postgres.
>>>> ###### - Still working below
>>>>
>>>> cl2_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Tue Mar 17 07:36:55 2015
>>>> Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on
>>>> cl2_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition WITHOUT quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl2_lb1 ]
>>>> OFFLINE: [ cl1_lb1 ]
>>>>
>>>> Resource Group: master-group
>>>> vip-master (ocf::heartbeat:IPaddr2): Started cl2_lb1
>>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl2_lb1
>>>> CBC_instance (ocf::heartbeat:cbc): Started cl2_lb1
>>>> failover_MailTo (ocf::heartbeat:MailTo): Started
>>>> cl2_lb1
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Masters: [ cl2_lb1 ]
>>>> Stopped: [ pgsql:1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl2_lb1:
>>>> + master-pgsql : 1000
>>>> + pgsql-data-status : LATEST
>>>> + pgsql-master-baseline : 00000008C2000090
>>>> + pgsql-status : PRI
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>>
>>>> ###### - After the original master is up and Corosync is running on cl1_lb1
>>>>
>>>> cl2_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Tue Mar 17 07:37:47 2015
>>>> Last change: Tue Mar 17 07:37:21 2015 by root via crm_attribute on
>>>> cl1_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition with quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl1_lb1 cl2_lb1 ]
>>>>
>>>>
>>>> Node Attributes:
>>>> * Node cl1_lb1:
>>>> + master-pgsql : -INFINITY
>>>> + pgsql-data-status : LATEST
>>>> + pgsql-status : STOP
>>>> * Node cl2_lb1:
>>>> + master-pgsql : -INFINITY
>>>> + pgsql-data-status : DISCONNECT
>>>> + pgsql-status : STOP
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue
>>>> Mar 17
>>>> 07:37:26 2015'
>>>> * Node cl1_lb1:
>>>> pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue
>>>> Mar 17
>>>> 07:37:26 2015'
>>>>
>>>> Failed actions:
>>>> pgsql_monitor_4000 (node=cl2_lb1, call=735, rc=7,
>>>> status=complete): not
>>>> running
>>>> pgsql_monitor_4000 (node=cl1_lb1, call=42, rc=7,
>>>> status=complete): not
>>>> running
>>>> cl2_lb1:~ #
>>>> ##### - No VIPs up
>>>>
>>>> cl2_lb1:~ # ping 172.28.200.159
>>>> PING 172.28.200.159 (172.28.200.159) 56(84) bytes of data.
>>>> From 172.28.200.168: icmp_seq=1 Destination Host Unreachable
>>>> From 172.28.200.168 icmp_seq=1 Destination Host Unreachable
>>>> From 172.28.200.168 icmp_seq=2 Destination Host Unreachable
>>>> From 172.28.200.168 icmp_seq=3 Destination Host Unreachable
>>>> ^C
>>>> --- 172.28.200.159 ping statistics ---
>>>> 5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4024ms, pipe 3
>>>> cl2_lb1:~ # ping 172.16.0.5
>>>> PING 172.16.0.5 (172.16.0.5) 56(84) bytes of data.
>>>> From 172.16.0.3: icmp_seq=1 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=1 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=2 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=3 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=5 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=6 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=7 Destination Host Unreachable
>>>> ^C
>>>> --- 172.16.0.5 ping statistics ---
>>>> 8 packets transmitted, 0 received, +7 errors, 100% packet loss, time 7015ms, pipe 3
>>>> cl2_lb1:~ #
>>>>
>>>> Any ideas please, or is it a case of recovering the original master
>>>> manually before starting Corosync etc.?
>>>> All logs are attached
>>>> Regards
>>>> On Mon, Mar 16, 2015 at 11:01 AM, Wynand Jansen van Vuuren
>>>> <esawyja at gmail.com> wrote:
>>>>
>>>> Thanks for the advice. I have a demo on this now, so I don't want to test
>>>> this right now; I will do so tomorrow and forward the logs, many thanks!!
>>>> On Mon, Mar 16, 2015 at 10:54 AM, NAKAHIRA Kazutomo
>>>> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>>
>>>> Hi,
>>>>
>>>> > do you suggest that I take it out? or should I look at the problem where
>>>> > cl2_lb1 is not being promoted?
>>>>
>>>> You should look at the problem where cl2_lb1 is not being promoted.
>>>> And I will look into it if you send me the ha-log and PostgreSQL's log.
>>>>
>>>> Best regards,
>>>> Kazutomo NAKAHIRA
>>>>
>>>>
>>>> On 2015/03/16 17:18, Wynand Jansen van Vuuren wrote:
>>>>
>>>> Hi Nakahira,
>>>> Thanks so much for the info. This setting was as the wiki page suggested;
>>>> do you suggest that I take it out, or should I look at the problem where
>>>> cl2_lb1 is not being promoted?
>>>> Regards
>>>>
>>>> On Mon, Mar 16, 2015 at 10:15 AM, NAKAHIRA Kazutomo <
>>>> nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Notice there are no VIPs; it looks like the VIPs depend on some other
>>>> resource to start first?
>>>>
>>>>
>>>> The following constraint means that "master-group" cannot start without a
>>>> master of the msPostgresql resource:
>>>>
>>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>>>
>>>> After you power off cl1_lb1, msPostgresql on cl2_lb1 is not promoted and
>>>> no master exists in your cluster.
>>>>
>>>> It means that "master-group" cannot run anywhere.
>>>>
>>>> Best regards,
>>>> Kazutomo NAKAHIRA
>>>>
>>>>
>>>> On 2015/03/16 16:48, Wynand Jansen van Vuuren
>>>> wrote:
>>>>
>>>> Hi
>>>> When I start out, cl1_lb1 (Cluster 1 load balancer 1) is the master, as
>>>> below:
>>>> cl1_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Mon Mar 16 09:44:44 2015
>>>> Last change: Mon Mar 16 08:06:26 2015 by root
>>>> via
>>>> crm_attribute on cl1_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition with quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl1_lb1 cl2_lb1 ]
>>>>
>>>> Resource Group: master-group
>>>> vip-master (ocf::heartbeat:IPaddr2):
>>>> Started cl1_lb1
>>>> vip-rep (ocf::heartbeat:IPaddr2):
>>>> Started
>>>> cl1_lb1
>>>> CBC_instance (ocf::heartbeat:cbc):
>>>> Started
>>>> cl1_lb1
>>>> failover_MailTo
>>>> (ocf::heartbeat:MailTo):
>>>> Started cl1_lb1
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Masters: [ cl1_lb1 ]
>>>> Slaves: [ cl2_lb1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl1_lb1:
>>>> + master-pgsql :
>>>> 1000
>>>> + pgsql-data-status :
>>>> LATEST
>>>> + pgsql-master-baseline :
>>>> 00000008B90061F0
>>>> + pgsql-status :
>>>> PRI
>>>> * Node cl2_lb1:
>>>> + master-pgsql :
>>>> 100
>>>> + pgsql-data-status :
>>>> STREAMING|SYNC
>>>> + pgsql-status :
>>>> HS:sync
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> * Node cl1_lb1:
>>>> cl1_lb1:~ #
>>>>
>>>> If I then do a power off on cl1_lb1 (master), Postgres moves to cl2_lb1
>>>> (Cluster 2 load balancer 1), but the VIP-MASTER and VIP-REP are not
>>>> pingable from the NEW master (cl2_lb1); it stays like this below:
>>>> cl2_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Mon Mar 16 07:32:07 2015
>>>> Last change: Mon Mar 16 07:28:53 2015 by root
>>>> via
>>>> crm_attribute on cl1_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition WITHOUT quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl2_lb1 ]
>>>> OFFLINE: [ cl1_lb1 ]
>>>>
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Slaves: [ cl2_lb1 ]
>>>> Stopped: [ pgsql:1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl2_lb1:
>>>> + master-pgsql :
>>>> -INFINITY
>>>> + pgsql-data-status :
>>>> DISCONNECT
>>>> + pgsql-status :
>>>> HS:alone
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> cl2_lb1:~ #
>>>>
>>>> Notice there are no VIPs; it looks like the VIPs depend on some other
>>>> resource to start first?
>>>> Thanks for the reply!
>>>>
>>>>
>>>> On Mon, Mar 16, 2015 at 9:42 AM, NAKAHIRA
>>>> Kazutomo <
>>>> nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> fine, cl2_lb1 takes over and acts as a slave, but the VIPs does not come
>>>>
>>>>
>>>> cl2_lb1 acts as a slave? It is not a master?
>>>> The VIPs come up with the master msPostgresql resource.
>>>>
>>>> If the promote action failed on cl2_lb1, then please send the ha-log and
>>>> PostgreSQL's log.
>>>>
>>>> Best regards,
>>>> Kazutomo NAKAHIRA
>>>>
>>>>
>>>> On 2015/03/16 16:09, Wynand Jansen van
>>>> Vuuren wrote:
>>>>
>>>> Hi all,
>>>>
>>>>
>>>> I have 2 nodes, with 2 interfaces each. ETH0 is used for an application,
>>>> CBC, that's writing to the Postgres DB on the VIP-MASTER 172.28.200.159;
>>>> ETH1 is used for the Corosync configuration and VIP-REP. Everything works,
>>>> but if the master, currently on cl1_lb1, has a catastrophic failure, like
>>>> a power down, the VIPs do not start on the slave. The Postgres part works
>>>> fine: cl2_lb1 takes over and acts as a slave, but the VIPs do not come up.
>>>> If I test it manually, i.e. kill the application 3 times on the master,
>>>> the switchover is smooth, and the same if I kill Postgres on the master,
>>>> but when there is a power failure on the master, the VIPs stay down. If I
>>>> then delete the attributes pgsql-data-status="LATEST" and
>>>> pgsql-data-status="STREAMING|SYNC" on the slave after the power off on the
>>>> master and restart everything, then the VIPs come up on the slave. Any
>>>> ideas please?
>>>> I'm using this setup:
>>>>
>>>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
>>>>
>>>> With this configuration below
>>>> node cl1_lb1 \
>>>>         attributes pgsql-data-status="LATEST"
>>>> node cl2_lb1 \
>>>>         attributes pgsql-data-status="STREAMING|SYNC"
>>>> primitive CBC_instance ocf:heartbeat:cbc \
>>>>         op monitor interval="60s" timeout="60s" on-fail="restart" \
>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>         meta target-role="Started" migration-threshold="3" failure-timeout="60s"
>>>> primitive failover_MailTo ocf:heartbeat:MailTo \
>>>>         params email="wynandj at rorotika.com" subject="Cluster Status change - " \
>>>>         op monitor interval="10" timeout="10" dept="0"
>>>> primitive pgsql ocf:heartbeat:pgsql \
>>>>         params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl" \
>>>>                psql="/opt/app/PostgreSQL/9.3/bin/psql" \
>>>>                config="/opt/app/pgdata/9.3/postgresql.conf" \
>>>>                pgdba="postgres" pgdata="/opt/app/pgdata/9.3/" \
>>>>                start_opt="-p 5432" rep_mode="sync" node_list="cl1_lb1 cl2_lb1" \
>>>>                restore_command="cp /pgtablespace/archive/%f %p" \
>>>>                primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
>>>>                master_ip="172.16.0.5" restart_on_promote="false" \
>>>>                logfile="/var/log/OCF.log" \
>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>         op monitor interval="4s" timeout="60s" on-fail="restart" \
>>>>         op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
>>>>         op promote interval="0s" timeout="60s" on-fail="restart" \
>>>>         op demote interval="0s" timeout="60s" on-fail="stop" \
>>>>         op stop interval="0s" timeout="60s" on-fail="block" \
>>>>         op notify interval="0s" timeout="60s"
>>>> primitive vip-master ocf:heartbeat:IPaddr2 \
>>>>         params ip="172.28.200.159" nic="eth0" iflabel="CBC_VIP" cidr_netmask="24" \
>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>         op stop interval="0s" timeout="60s" on-fail="block" \
>>>>         meta target-role="Started"
>>>> primitive vip-rep ocf:heartbeat:IPaddr2 \
>>>>         params ip="172.16.0.5" nic="eth1" iflabel="REP_VIP" cidr_netmask="24" \
>>>>         meta migration-threshold="0" target-role="Started" \
>>>>         op start interval="0s" timeout="60s" on-fail="stop" \
>>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>         op stop interval="0s" timeout="60s" on-fail="restart"
>>>> group master-group vip-master vip-rep CBC_instance failover_MailTo
>>>> ms msPostgresql pgsql \
>>>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>>> order rsc_order-1 0: msPostgresql:promote master-group:start symmetrical=false
>>>> order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false
>>>> property $id="cib-bootstrap-options" \
>>>>         dc-version="1.1.9-2db99f1" \
>>>>         cluster-infrastructure="classic openais (with plugin)" \
>>>>         expected-quorum-votes="2" \
>>>>         no-quorum-policy="ignore" \
>>>>         stonith-enabled="false" \
>>>>         cluster-recheck-interval="1min" \
>>>>         crmd-transition-delay="0s" \
>>>>         last-lrm-refresh="1426485983"
>>>> rsc_defaults $id="rsc-options" \
>>>>         resource-stickiness="INFINITY" \
>>>>         migration-threshold="1"
>>>> #vim:set syntax=pcmk
>>>>
>>>> Any ideas please, I'm lost......
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
NTT Open Source Software Center
Kazutomo Nakahira
>>>> TEL: 03-5860-5135 FAX: 03-5463-6490
>>>> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
NTT Open Source Software Center
Kazutomo Nakahira
>>>> TEL: 03-5860-5135 FAX: 03-5463-6490
>>>> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>