[ClusterLabs] 4 node, 2 cluster setup with separated applications
Andrei Borzenkov
arvidjaar at gmail.com
Thu Mar 19 09:47:18 UTC 2015
On Thu, Mar 19, 2015 at 12:31 PM, Wynand Jansen van Vuuren
<esawyja at gmail.com> wrote:
> Hi all,
> I have a different question please, let say I have the following
> 4 - nodes, 2 clusters, 2 nodes per cluster, so I have in the west of the
> country Cluster 1 with cl1_lb1 and cl1_lb2 as the nodes, in the east of the
> country I have Cluster 2 with cl2_lb1 and cl2_lb2 as the nodes
>
According to the output you provided, you have a single cluster consisting of
four nodes, not two clusters of two nodes each.
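With all four nodes in one membership, the vote count should follow the full
node count. As a minimal sketch (assuming a corosync 2.x votequorum stack; on
the classic plugin stack shown in your output, the equivalent knob is the
expected-quorum-votes cluster property):

quorum {
    provider: corosync_votequorum
    expected_votes: 4
}

# plugin-based stack equivalent:
# crm configure property expected-quorum-votes=4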
> I have 3 different applications: Postgres, App1 and App2. App1 uses a VIP to
> write to Postgres, and App2 uses Apache2.
>
> Can I do the following?
> cl1_lb1 runs Postgres streaming replication with the App1 VIP in a
> Master/Slave configuration to cl2_lb1.
>
> cl1_lb1, cl1_lb2, cl2_lb1 and cl2_lb2 all run App2 and the VIP round-robin
> for the Apache page.
>
> So my question is actually this: in this configuration, what would the
> expected_votes setting in corosync.conf be, 2 or 4? And can you separate the
> resources per node? I thought that rep_mode="sync" node_list="cl1_lb1 cl2_lb1"
> in the pgsql primitive would restrict pgsql to running on cl1_lb1 and cl2_lb1
> only, but that does not seem to be the case; as soon as I add the other nodes
> to the Corosync configuration, I get the output below.
>
> cl1_lb1:/opt/temp # crm_mon -1 -Af
> Last updated: Thu Mar 19 11:29:16 2015
> Last change: Thu Mar 19 11:10:17 2015 by hacluster via crmd on cl1_lb1
> Stack: classic openais (with plugin)
> Current DC: cl1_lb1 - partition with quorum
> Version: 1.1.9-2db99f1
> 4 Nodes configured, 4 expected votes
> 6 Resources configured.
>
>
> Online: [ cl1_lb1 cl1_lb2 cl2_lb1 cl2_lb2 ]
>
>
> Node Attributes:
> * Node cl1_lb1:
> + master-pgsql : -INFINITY
> + pgsql-data-status : LATEST
> + pgsql-status : STOP
> * Node cl1_lb2:
> + pgsql-status : UNKNOWN
> * Node cl2_lb1:
> + master-pgsql : -INFINITY
> + pgsql-data-status : LATEST
> + pgsql-status : STOP
> * Node cl2_lb2:
> + pgsql-status : UNKNOWN
>
> Migration summary:
> * Node cl2_lb1:
> pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar
> 19 11:10:18 2015'
> * Node cl1_lb1:
> pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar
> 19 11:10:18 2015'
> * Node cl2_lb2:
> * Node cl1_lb2:
>
> Failed actions:
> pgsql_start_0 (node=cl2_lb1, call=561, rc=1, status=complete): unknown
> error
> pgsql_start_0 (node=cl1_lb1, call=292, rc=1, status=complete): unknown
> error
> pgsql_start_0 (node=cl2_lb2, call=115, rc=5, status=complete): not
> installed
> pgsql_start_0 (node=cl1_lb2, call=73, rc=5, status=complete): not
> installed
> cl1_lb1:/opt/temp #
>
> Any suggestions on how I can achieve this, please?
>
But it does exactly what you want - Postgres won't be started on nodes
cl1_lb2 and cl2_lb2. If you want to get rid of the probing errors, you need
to either install Postgres on all nodes (so the resource agent probes do not
fail) or set symmetric-cluster=false.
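A minimal crm shell sketch of that opt-in approach, in case it helps (the
Apache/App2 clone name "cl-apache" is a hypothetical placeholder, not taken
from your configuration):

crm configure property symmetric-cluster="false"
# pgsql may only run on the two database nodes
crm configure location loc-pgsql-cl1 msPostgresql 200: cl1_lb1
crm configure location loc-pgsql-cl2 msPostgresql 200: cl2_lb1
# the Apache clone may run on all four nodes
crm configure location loc-app2-cl1a cl-apache 100: cl1_lb1
crm configure location loc-app2-cl1b cl-apache 100: cl1_lb2
crm configure location loc-app2-cl2a cl-apache 100: cl2_lb1
crm configure location loc-app2-cl2b cl-apache 100: cl2_lb2

With symmetric-cluster=false a resource is only eligible to run on nodes that
have a positive location score, so nothing needs to be banned explicitly.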
> Regards
>
>
>
> On Wed, Mar 18, 2015 at 7:32 AM, Wynand Jansen van Vuuren
> <esawyja at gmail.com> wrote:
>>
>> Hi
>> Yes, the problem was solved. It was the system init scripts (not the
>> cluster) that started Postgres when the failed server came up again; I
>> disabled the automatic start with chkconfig and that solved the problem. I
>> will take out 172.16.0.5 from the conf file.
>> THANKS SO MUCH for all the help. I will do a blog post on how this is done
>> on SLES 11 SP3 and Postgres 9.3 and will post the URL to the group, in case
>> it helps someone out there. Thanks again for all the help!
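>> (For reference, the runlevel change described above looks roughly like this
>> on SLES 11; the init script name "postgresql" is an assumption and may
>> differ depending on how 9.3 was installed:)
>>
>> chkconfig postgresql off       # stop the init script from starting Postgres at boot
>> chkconfig --list postgresql    # verify every runlevel now shows "off"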
>> Regards
>>
>> On Wed, Mar 18, 2015 at 3:58 AM, NAKAHIRA Kazutomo
>> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>
>>> Hi,
>>>
>>> As Brestan pointed out, the old master not coming back up as a slave is
>>> the expected behaviour.
>>>
>>> BTW, this behaviour is different from the original problem.
>>> It seems from the logs that the promote action succeeded on cl2_lb1 after
>>> cl1_lb1 was powered off.
>>> Was the original problem resolved?
>>>
>>> And cl2_lb1's postgresql.conf has the following problem.
>>>
>>> 2015-03-17 07:34:28 SAST DETAIL: The failed archive command was: cp
>>> pg_xlog/0000001D00000008000000C2
>>> 172.16.0.5:/pgtablespace/archive/0000001D00000008000000C2
>>>
>>> "172.16.0.5" must be eliminated from the archive_command directive in the
>>> postgresql.conf.
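>>> (For example, something along these lines in postgresql.conf, archiving to
>>> the locally mounted archive directory used by restore_command; the exact
>>> path is an assumption based on the configuration quoted further down this
>>> thread:)
>>>
>>> archive_mode = on
>>> archive_command = 'cp %p /pgtablespace/archive/%f'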
>>>
>>> Best regards,
>>> Kazutomo NAKAHIRA
>>>
>>> On 2015/03/18 5:00, Rainer Brestan wrote:
>>>>
>>>> Yes, that's the expected behaviour.
>>>> Takatoshi Matsuo describes in his papers why a former master can't come up
>>>> as a slave without possible data corruption.
>>>> And you do not get an indication from Postgres that the data on disk is
>>>> corrupted.
>>>> Therefore, he created the lock file mechanism to prevent a former master
>>>> from starting up.
>>>> Making the base backup from the master discards any possibly wrong data on
>>>> the slave, and the removed lock file indicates this to the resource agent.
>>>> To shorten the discussion about "how this can be automated within the
>>>> resource agent": there is no clean way of handling this for very large
>>>> databases, for which it can take hours.
>>>> What you should do is make the base backup in a temporary directory and
>>>> then rename it to the directory name the Postgres instance requires once
>>>> the base backup has finished successfully (yes, this requires twice the
>>>> hard disk space). Otherwise you might lose everything if your master
>>>> breaks during the base backup.
>>>> Rainer
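>>>> (A rough sketch of that sequence on the standby, based on the wiki
>>>> procedure quoted further down; the paths, master address and lock file
>>>> location are illustrative and must match your own setup:)
>>>>
>>>> su - postgres
>>>> pg_basebackup -h 192.168.2.3 -U postgres -D /var/lib/pgsql/data.tmp -X stream -P
>>>> # only swap the directories once the base backup has completed successfully
>>>> mv /var/lib/pgsql/data /var/lib/pgsql/data.old
>>>> mv /var/lib/pgsql/data.tmp /var/lib/pgsql/data
>>>> rm /var/lib/pgsql/tmp/PGSQL.lock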
>>>> *Sent:* Tuesday, 17 March 2015 at 12:16
>>>> *From:* "Wynand Jansen van Vuuren" <esawyja at gmail.com>
>>>> *To:* "Cluster Labs - All topics related to open-source clustering
>>>> welcomed"
>>>> <users at clusterlabs.org>
>>>> *Subject:* Re: [ClusterLabs] Postgres streaming VIP-REP not coming up on
>>>> slave
>>>>
>>>> Hi
>>>> OK, I found this particular problem: when the failed node comes up again,
>>>> the init scripts start Postgres. I have disabled this, and now the VIPs and
>>>> Postgres remain on the new MASTER, but the failed node does not come up as
>>>> a slave, i.e. there is no sync between the new master and the slave. Is
>>>> this the expected behaviour? The only way I can get it back into slave
>>>> mode is to follow the procedure in the wiki:
>>>>
>>>> # su - postgres
>>>> $ rm -rf /var/lib/pgsql/data/
>>>> $ pg_basebackup -h 192.168.2.3 -U postgres -D /var/lib/pgsql/data -X
>>>> stream -P
>>>> $ rm /var/lib/pgsql/tmp/PGSQL.lock
>>>> $ exit
>>>> # pcs resource cleanup msPostgresql
>>>>
>>>> Looking forward to your reply please
>>>> Regards
>>>> On Tue, Mar 17, 2015 at 7:55 AM, Wynand Jansen van Vuuren
>>>> <esawyja at gmail.com>
>>>> wrote:
>>>>
>>>> Hi Nakahira,
>>>> I finally got around to testing this; below is the initial state:
>>>>
>>>> cl1_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Tue Mar 17 07:31:58 2015
>>>> Last change: Tue Mar 17 07:31:12 2015 by root via crm_attribute on
>>>> cl1_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl1_lb1 - partition with quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl1_lb1 cl2_lb1 ]
>>>>
>>>> Resource Group: master-group
>>>> vip-master (ocf::heartbeat:IPaddr2): Started cl1_lb1
>>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl1_lb1
>>>> CBC_instance (ocf::heartbeat:cbc): Started cl1_lb1
>>>> failover_MailTo (ocf::heartbeat:MailTo): Started
>>>> cl1_lb1
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Masters: [ cl1_lb1 ]
>>>> Slaves: [ cl2_lb1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl1_lb1:
>>>> + master-pgsql : 1000
>>>> + pgsql-data-status : LATEST
>>>> + pgsql-master-baseline : 00000008BE000000
>>>> + pgsql-status : PRI
>>>> * Node cl2_lb1:
>>>> + master-pgsql : 100
>>>> + pgsql-data-status : STREAMING|SYNC
>>>> + pgsql-status : HS:sync
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> * Node cl1_lb1:
>>>> cl1_lb1:~ #
>>>> ###### - I then did an init 0 on the master node, cl1_lb1
>>>>
>>>> cl1_lb1:~ # init 0
>>>> cl1_lb1:~ #
>>>> Connection closed by foreign host.
>>>>
>>>> Disconnected from remote host(cl1_lb1) at 07:36:18.
>>>>
>>>> ###### - This was OK, as the slave took over and became master
>>>>
>>>> cl2_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Tue Mar 17 07:35:04 2015
>>>> Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on
>>>> cl2_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition WITHOUT quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl2_lb1 ]
>>>> OFFLINE: [ cl1_lb1 ]
>>>>
>>>> Resource Group: master-group
>>>> vip-master (ocf::heartbeat:IPaddr2): Started cl2_lb1
>>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl2_lb1
>>>> CBC_instance (ocf::heartbeat:cbc): Started cl2_lb1
>>>> failover_MailTo (ocf::heartbeat:MailTo): Started
>>>> cl2_lb1
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Masters: [ cl2_lb1 ]
>>>> Stopped: [ pgsql:1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl2_lb1:
>>>> + master-pgsql : 1000
>>>> + pgsql-data-status : LATEST
>>>> + pgsql-master-baseline : 00000008C2000090
>>>> + pgsql-status : PRI
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> cl2_lb1:~ #
>>>> And the logs from Postgres and Corosync are attached
>>>> ###### - I then restarted the original master cl1_lb1 and started Corosync
>>>> manually.
>>>> Once the original master cl1_lb1 was up and Corosync running, the status
>>>> below happened; notice there are no VIPs and no Postgres.
>>>> ###### - Still working below
>>>>
>>>> cl2_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Tue Mar 17 07:36:55 2015
>>>> Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on
>>>> cl2_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition WITHOUT quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl2_lb1 ]
>>>> OFFLINE: [ cl1_lb1 ]
>>>>
>>>> Resource Group: master-group
>>>> vip-master (ocf::heartbeat:IPaddr2): Started cl2_lb1
>>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl2_lb1
>>>> CBC_instance (ocf::heartbeat:cbc): Started cl2_lb1
>>>> failover_MailTo (ocf::heartbeat:MailTo): Started
>>>> cl2_lb1
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Masters: [ cl2_lb1 ]
>>>> Stopped: [ pgsql:1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl2_lb1:
>>>> + master-pgsql : 1000
>>>> + pgsql-data-status : LATEST
>>>> + pgsql-master-baseline : 00000008C2000090
>>>> + pgsql-status : PRI
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>>
>>>> ###### - After the original master is up and Corosync is running on cl1_lb1
>>>>
>>>> cl2_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Tue Mar 17 07:37:47 2015
>>>> Last change: Tue Mar 17 07:37:21 2015 by root via crm_attribute on
>>>> cl1_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition with quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl1_lb1 cl2_lb1 ]
>>>>
>>>>
>>>> Node Attributes:
>>>> * Node cl1_lb1:
>>>> + master-pgsql : -INFINITY
>>>> + pgsql-data-status : LATEST
>>>> + pgsql-status : STOP
>>>> * Node cl2_lb1:
>>>> + master-pgsql : -INFINITY
>>>> + pgsql-data-status : DISCONNECT
>>>> + pgsql-status : STOP
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue
>>>> Mar 17
>>>> 07:37:26 2015'
>>>> * Node cl1_lb1:
>>>> pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue
>>>> Mar 17
>>>> 07:37:26 2015'
>>>>
>>>> Failed actions:
>>>> pgsql_monitor_4000 (node=cl2_lb1, call=735, rc=7,
>>>> status=complete): not
>>>> running
>>>> pgsql_monitor_4000 (node=cl1_lb1, call=42, rc=7,
>>>> status=complete): not
>>>> running
>>>> cl2_lb1:~ #
>>>> ##### - No VIPs up
>>>>
>>>> cl2_lb1:~ # ping 172.28.200.159
>>>> PING 172.28.200.159 (172.28.200.159) 56(84) bytes of data.
>>>> From 172.28.200.168: icmp_seq=1 Destination Host Unreachable
>>>> From 172.28.200.168 icmp_seq=1 Destination Host Unreachable
>>>> From 172.28.200.168 icmp_seq=2 Destination Host Unreachable
>>>> From 172.28.200.168 icmp_seq=3 Destination Host Unreachable
>>>> ^C
>>>> --- 172.28.200.159 ping statistics ---
>>>> 5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4024ms, pipe 3
>>>> cl2_lb1:~ # ping 172.16.0.5
>>>> PING 172.16.0.5 (172.16.0.5) 56(84) bytes of data.
>>>> From 172.16.0.3: icmp_seq=1 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=1 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=2 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=3 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=5 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=6 Destination Host Unreachable
>>>> From 172.16.0.3 icmp_seq=7 Destination Host Unreachable
>>>> ^C
>>>> --- 172.16.0.5 ping statistics ---
>>>> 8 packets transmitted, 0 received, +7 errors, 100% packet loss, time 7015ms, pipe 3
>>>> cl2_lb1:~ #
>>>>
>>>> Any ideas please, or is it a case of recovering the original master
>>>> manually before starting Corosync etc.?
>>>> All logs are attached
>>>> Regards
>>>> On Mon, Mar 16, 2015 at 11:01 AM, Wynand Jansen van Vuuren
>>>> <esawyja at gmail.com> wrote:
>>>>
>>>> Thanks for the advice. I have a demo on this now, so I don't want to test
>>>> this right now; I will do so tomorrow and forward the logs, many thanks!!
>>>> On Mon, Mar 16, 2015 at 10:54 AM, NAKAHIRA Kazutomo
>>>> <nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>>
>>>> Hi,
>>>>
>>>> > do you suggest that I take it out? or should I look at the problem where
>>>> > cl2_lb1 is not being promoted?
>>>>
>>>> You should look at the problem where cl2_lb1 is not being promoted.
>>>> And I will look into it if you send me the ha-log and PostgreSQL's log.
>>>>
>>>> Best regards,
>>>> Kazutomo NAKAHIRA
>>>>
>>>>
>>>> On 2015/03/16 17:18, Wynand Jansen van Vuuren wrote:
>>>>
>>>> Hi Nakahira,
>>>> Thanks so much for the info. This setting was as the wiki page suggested;
>>>> do you suggest that I take it out, or should I look at the problem where
>>>> cl2_lb1 is not being promoted?
>>>> Regards
>>>>
>>>> On Mon, Mar 16, 2015 at 10:15 AM, NAKAHIRA Kazutomo <
>>>> nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Notice there are no VIPs; it looks like the VIPs depend on some other
>>>> resource to start first?
>>>>
>>>>
>>>> The following constraint means that "master-group" cannot start without a
>>>> master of the msPostgresql resource:
>>>>
>>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>>>
>>>> After you power off cl1_lb1, msPostgresql on cl2_lb1 is not promoted and
>>>> no master exists in your cluster.
>>>>
>>>> It means that "master-group" cannot run anywhere.
>>>>
>>>> Best regards,
>>>> Kazutomo NAKAHIRA
>>>>
>>>>
>>>> On 2015/03/16 16:48, Wynand Jansen van Vuuren
>>>> wrote:
>>>>
>>>> Hi
>>>> When I start out, cl1_lb1 (Cluster 1 load balancer 1) is the master, as
>>>> below:
>>>> cl1_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Mon Mar 16 09:44:44 2015
>>>> Last change: Mon Mar 16 08:06:26 2015 by root
>>>> via
>>>> crm_attribute on cl1_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition with quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl1_lb1 cl2_lb1 ]
>>>>
>>>> Resource Group: master-group
>>>> vip-master (ocf::heartbeat:IPaddr2):
>>>> Started cl1_lb1
>>>> vip-rep (ocf::heartbeat:IPaddr2):
>>>> Started
>>>> cl1_lb1
>>>> CBC_instance (ocf::heartbeat:cbc):
>>>> Started
>>>> cl1_lb1
>>>> failover_MailTo
>>>> (ocf::heartbeat:MailTo):
>>>> Started cl1_lb1
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Masters: [ cl1_lb1 ]
>>>> Slaves: [ cl2_lb1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl1_lb1:
>>>> + master-pgsql :
>>>> 1000
>>>> + pgsql-data-status :
>>>> LATEST
>>>> + pgsql-master-baseline :
>>>> 00000008B90061F0
>>>> + pgsql-status :
>>>> PRI
>>>> * Node cl2_lb1:
>>>> + master-pgsql :
>>>> 100
>>>> + pgsql-data-status :
>>>> STREAMING|SYNC
>>>> + pgsql-status :
>>>> HS:sync
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> * Node cl1_lb1:
>>>> cl1_lb1:~ #
>>>>
>>>> If I then do a power off on cl1_lb1 (master), Postgres moves to cl2_lb1
>>>> (Cluster 2 load balancer 1), but the VIP-MASTER and VIP-REP are not
>>>> pingable from the NEW master (cl2_lb1); it stays like this below:
>>>> cl2_lb1:~ # crm_mon -1 -Af
>>>> Last updated: Mon Mar 16 07:32:07 2015
>>>> Last change: Mon Mar 16 07:28:53 2015 by root
>>>> via
>>>> crm_attribute on cl1_lb1
>>>> Stack: classic openais (with plugin)
>>>> Current DC: cl2_lb1 - partition WITHOUT quorum
>>>> Version: 1.1.9-2db99f1
>>>> 2 Nodes configured, 2 expected votes
>>>> 6 Resources configured.
>>>>
>>>>
>>>> Online: [ cl2_lb1 ]
>>>> OFFLINE: [ cl1_lb1 ]
>>>>
>>>> Master/Slave Set: msPostgresql [pgsql]
>>>> Slaves: [ cl2_lb1 ]
>>>> Stopped: [ pgsql:1 ]
>>>>
>>>> Node Attributes:
>>>> * Node cl2_lb1:
>>>> + master-pgsql :
>>>> -INFINITY
>>>> + pgsql-data-status :
>>>> DISCONNECT
>>>> + pgsql-status :
>>>> HS:alone
>>>>
>>>> Migration summary:
>>>> * Node cl2_lb1:
>>>> cl2_lb1:~ #
>>>>
>>>> Notice there are no VIPs; it looks like the VIPs depend on some other
>>>> resource to start first?
>>>> Thanks for the reply!
>>>>
>>>>
>>>> On Mon, Mar 16, 2015 at 9:42 AM, NAKAHIRA
>>>> Kazutomo <
>>>> nakahira_kazutomo_b1 at lab.ntt.co.jp> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> fine, cl2_lb1 takes over and acts as a slave, but the VIPs does not come
>>>>
>>>>
>>>> cl2_lb1 acts as a slave? It is not a master?
>>>> The VIPs come up with the master msPostgresql resource.
>>>>
>>>> If the promote action failed on cl2_lb1, then please send the ha-log and
>>>> PostgreSQL's log.
>>>>
>>>> Best regards,
>>>> Kazutomo NAKAHIRA
>>>>
>>>>
>>>> On 2015/03/16 16:09, Wynand Jansen van
>>>> Vuuren wrote:
>>>>
>>>> Hi all,
>>>>
>>>>
>>>> I have 2 nodes, with 2 interfaces each. ETH0 is used for an application,
>>>> CBC, that's writing to the Postgres DB on the VIP-MASTER 172.28.200.159;
>>>> ETH1 is used for the Corosync configuration and VIP-REP. Everything works,
>>>> but if the master, currently on cl1_lb1, has a catastrophic failure, like
>>>> a power down, the VIPs do not start on the slave. The Postgres part works
>>>> fine: cl2_lb1 takes over and acts as a slave, but the VIPs do not come up.
>>>> If I test it manually, i.e. kill the application 3 times on the master,
>>>> the switchover is smooth, and the same if I kill Postgres on the master,
>>>> but when there is a power failure on the master, the VIPs stay down. If I
>>>> then delete the attributes pgsql-data-status="LATEST" and
>>>> pgsql-data-status="STREAMING|SYNC" on the slave after the power off on the
>>>> master and restart everything, then the VIPs come up on the slave. Any
>>>> ideas please?
>>>> I'm using this setup:
>>>>
>>>> http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster
>>>>
>>>> With this configuration below
>>>> node cl1_lb1 \
>>>>         attributes pgsql-data-status="LATEST"
>>>> node cl2_lb1 \
>>>>         attributes pgsql-data-status="STREAMING|SYNC"
>>>> primitive CBC_instance ocf:heartbeat:cbc \
>>>>         op monitor interval="60s" timeout="60s" on-fail="restart" \
>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>         meta target-role="Started" migration-threshold="3" failure-timeout="60s"
>>>> primitive failover_MailTo ocf:heartbeat:MailTo \
>>>>         params email="wynandj at rorotika.com" subject="Cluster Status change - " \
>>>>         op monitor interval="10" timeout="10" dept="0"
>>>> primitive pgsql ocf:heartbeat:pgsql \
>>>>         params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl" \
>>>>                psql="/opt/app/PostgreSQL/9.3/bin/psql" \
>>>>                config="/opt/app/pgdata/9.3/postgresql.conf" \
>>>>                pgdba="postgres" pgdata="/opt/app/pgdata/9.3/" \
>>>>                start_opt="-p 5432" rep_mode="sync" node_list="cl1_lb1 cl2_lb1" \
>>>>                restore_command="cp /pgtablespace/archive/%f %p" \
>>>>                primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
>>>>                master_ip="172.16.0.5" restart_on_promote="false" \
>>>>                logfile="/var/log/OCF.log" \
>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>         op monitor interval="4s" timeout="60s" on-fail="restart" \
>>>>         op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
>>>>         op promote interval="0s" timeout="60s" on-fail="restart" \
>>>>         op demote interval="0s" timeout="60s" on-fail="stop" \
>>>>         op stop interval="0s" timeout="60s" on-fail="block" \
>>>>         op notify interval="0s" timeout="60s"
>>>> primitive vip-master ocf:heartbeat:IPaddr2 \
>>>>         params ip="172.28.200.159" nic="eth0" iflabel="CBC_VIP" cidr_netmask="24" \
>>>>         op start interval="0s" timeout="60s" on-fail="restart" \
>>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>         op stop interval="0s" timeout="60s" on-fail="block" \
>>>>         meta target-role="Started"
>>>> primitive vip-rep ocf:heartbeat:IPaddr2 \
>>>>         params ip="172.16.0.5" nic="eth1" iflabel="REP_VIP" cidr_netmask="24" \
>>>>         meta migration-threshold="0" target-role="Started" \
>>>>         op start interval="0s" timeout="60s" on-fail="stop" \
>>>>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>         op stop interval="0s" timeout="60s" on-fail="restart"
>>>> group master-group vip-master vip-rep CBC_instance failover_MailTo
>>>> ms msPostgresql pgsql \
>>>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master
>>>> order rsc_order-1 0: msPostgresql:promote master-group:start symmetrical=false
>>>> order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false
>>>> property $id="cib-bootstrap-options" \
>>>>         dc-version="1.1.9-2db99f1" \
>>>>         cluster-infrastructure="classic openais (with plugin)" \
>>>>         expected-quorum-votes="2" \
>>>>         no-quorum-policy="ignore" \
>>>>         stonith-enabled="false" \
>>>>         cluster-recheck-interval="1min" \
>>>>         crmd-transition-delay="0s" \
>>>>         last-lrm-refresh="1426485983"
>>>> rsc_defaults $id="rsc-options" \
>>>>         resource-stickiness="INFINITY" \
>>>>         migration-threshold="1"
>>>> #vim:set syntax=pcmk
>>>>
>>>> Any ideas please, I'm lost......
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
NTT Open Source Software Center
Kazutomo Nakahira
>>>> TEL: 03-5860-5135 FAX: 03-5463-6490
>>>> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
NTT Open Source Software Center
Kazutomo Nakahira
>>>> TEL: 03-5860-5135 FAX: 03-5463-6490
>>>> Mail: nakahira_kazutomo_b1 at lab.ntt.co.jp
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>