<div dir="ltr"><div><div>Hi<br></div>Yes ok a one cluster, 4 nodes configuration, when I set symmetric-cluster=false, I get this output??!!<br>cl1_lb1:/opt/temp # crm_mon -1 -Af<br>Last updated: Thu Mar 19 12:16:20 2015<br>Last change: Thu Mar 19 12:15:22 2015 by hacluster via crmd on cl1_lb1<br>Stack: classic openais (with plugin)<br>Current DC: cl1_lb2 - partition with quorum<br>Version: 1.1.9-2db99f1<br>4 Nodes configured, 4 expected votes<br>6 Resources configured.<br><br><br>Online: [ cl1_lb1 cl1_lb2 cl2_lb1 cl2_lb2 ]<br><br><br>Node Attributes:<br>* Node cl1_lb1:<br> + pgsql-data-status : LATEST <br>* Node cl1_lb2:<br>* Node cl2_lb1:<br> + pgsql-data-status : LATEST <br>* Node cl2_lb2:<br><br>Migration summary:<br>* Node cl1_lb2: <br>* Node cl2_lb1: <br>* Node cl2_lb2: <br>* Node cl1_lb1: <br>cl1_lb1:/opt/temp # <br><br></div>No expanded pgsql-data-status anymore? I'm confused!<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 19, 2015 at 11:47 AM, Andrei Borzenkov <span dir="ltr"><<a href="mailto:arvidjaar@gmail.com" target="_blank">arvidjaar@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Thu, Mar 19, 2015 at 12:31 PM, Wynand Jansen van Vuuren<br>
<<a href="mailto:esawyja@gmail.com">esawyja@gmail.com</a>> wrote:<br>
> Hi all,<br>
> I have a different question please. Let's say I have the following:<br>
> 4 nodes, 2 clusters, 2 nodes per cluster. In the west of the<br>
> country I have Cluster 1 with cl1_lb1 and cl1_lb2 as the nodes; in the east of the<br>
> country I have Cluster 2 with cl2_lb1 and cl2_lb2 as the nodes.<br>
><br>
<br>
</span>According to the output you provided, you have a single cluster consisting of<br>
4 nodes, not two clusters of 2 nodes each.<br>
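(For what it's worth, that also answers the expected_votes question: with a single 4-node membership the expectation is 4, which is what crm_mon already reports. A minimal sketch, assuming the crm shell as used elsewhere in this thread, of keeping the Pacemaker property in line with that:<br>
<br>
crm configure property expected-quorum-votes=4<br>
<br>
Note that the configuration quoted further down still carries expected-quorum-votes="2" from the original 2-node setup.)<br>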
<div><div class="h5"><br>
> I have 3 different applications: Postgres, App1 and App2. App1 uses a VIP to<br>
> write to Postgres; App2 uses Apache2.<br>
><br>
> Can I do the following:<br>
> cl1_lb1 runs Postgres streaming with the App1 VIP in a Master/Slave configuration<br>
> to cl2_lb1<br>
><br>
> cl1_lb1, cl1_lb2, cl2_lb1 and cl2_lb2 all run App2 and the VIP round robin<br>
> for the Apache page<br>
><br>
> So my question is actually this: in this configuration, in the corosync.conf<br>
> file, what would the expected_votes setting be, 2 or 4? And can you separate<br>
> the resources per node? I thought that rep_mode="sync"<br>
> node_list="cl1_lb1 cl2_lb1" in the pgsql primitive would restrict pgsql<br>
> to running on cl1_lb1 and cl2_lb1 only, but that does not seem to be the case; as<br>
> soon as I add the other nodes to the corosync configuration, I get this<br>
> below:<br>
><br>
> cl1_lb1:/opt/temp # crm_mon -1 -Af<br>
> Last updated: Thu Mar 19 11:29:16 2015<br>
> Last change: Thu Mar 19 11:10:17 2015 by hacluster via crmd on cl1_lb1<br>
> Stack: classic openais (with plugin)<br>
> Current DC: cl1_lb1 - partition with quorum<br>
> Version: 1.1.9-2db99f1<br>
> 4 Nodes configured, 4 expected votes<br>
> 6 Resources configured.<br>
><br>
><br>
> Online: [ cl1_lb1 cl1_lb2 cl2_lb1 cl2_lb2 ]<br>
><br>
><br>
> Node Attributes:<br>
> * Node cl1_lb1:<br>
> + master-pgsql : -INFINITY<br>
> + pgsql-data-status : LATEST<br>
> + pgsql-status : STOP<br>
> * Node cl1_lb2:<br>
> + pgsql-status : UNKNOWN<br>
> * Node cl2_lb1:<br>
> + master-pgsql : -INFINITY<br>
> + pgsql-data-status : LATEST<br>
> + pgsql-status : STOP<br>
> * Node cl2_lb2:<br>
> + pgsql-status : UNKNOWN<br>
><br>
> Migration summary:<br>
> * Node cl2_lb1:<br>
> pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar<br>
> 19 11:10:18 2015'<br>
> * Node cl1_lb1:<br>
> pgsql:0: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar<br>
> 19 11:10:18 2015'<br>
> * Node cl2_lb2:<br>
> * Node cl1_lb2:<br>
><br>
> Failed actions:<br>
> pgsql_start_0 (node=cl2_lb1, call=561, rc=1, status=complete): unknown<br>
> error<br>
> pgsql_start_0 (node=cl1_lb1, call=292, rc=1, status=complete): unknown<br>
> error<br>
> pgsql_start_0 (node=cl2_lb2, call=115, rc=5, status=complete): not<br>
> installed<br>
> pgsql_start_0 (node=cl1_lb2, call=73, rc=5, status=complete): not<br>
> installed<br>
> cl1_lb1:/opt/temp #<br>
><br>
> Any suggestions on how I can achieve this, please?<br>
><br>
<br>
</div></div>But it does exactly what you want - Postgres won't be started on nodes<br>
cl1_lb2 and cl2_lb2. If you want to get rid of the probing errors, you need<br>
to either install Postgres on all nodes (so the agent probes do not fail) or set<br>
symmetric-cluster=false.<br>
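(A minimal sketch of the second option, assuming the crm shell; the constraint names and scores below are illustrative, not taken from this thread. With symmetric-cluster=false the cluster becomes opt-in, so every resource then needs an explicit location constraint or it will not run anywhere:<br>
<br>
crm configure property symmetric-cluster=false<br>
# opt-in: allow the pgsql master/slave set only on the two database nodes<br>
crm configure location loc-pgsql-cl1 msPostgresql 200: cl1_lb1<br>
crm configure location loc-pgsql-cl2 msPostgresql 200: cl2_lb1<br>
# master-group, the App2/Apache resources, etc. need their own location<br>
# constraints as well, otherwise they will stay stopped<br>
<br>
This is consistent with the follow-up at the top of this thread, where nothing runs after symmetric-cluster=false is set without such constraints.)<br>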
<div class="HOEnZb"><div class="h5"><br>
> Regards<br>
><br>
><br>
><br>
> On Wed, Mar 18, 2015 at 7:32 AM, Wynand Jansen van Vuuren<br>
> <<a href="mailto:esawyja@gmail.com">esawyja@gmail.com</a>> wrote:<br>
>><br>
>> Hi<br>
>> Yes, the problem was solved. It was the system init scripts that started Postgres<br>
>> when the failed server came up again; I disabled the automatic start with<br>
>> chkconfig and that solved the problem. I will take 172.16.0.5 out of the<br>
>> conf file.<br>
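>> (A sketch of how that auto-start can be disabled on SLES 11 with SysV init; the<br>
>> init script name depends on how PostgreSQL 9.3 was installed, so "postgresql-9.3"<br>
>> below is an assumption.)<br>
>><br>
>> # show whether the init script is enabled in the current runlevels<br>
>> chkconfig --list postgresql-9.3<br>
>> # disable it so that only Pacemaker starts and stops PostgreSQL<br>
>> chkconfig postgresql-9.3 off<br>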
>> THANKS SO MUCH for all the help. I will do a blog post on how this is done<br>
>> on SLES 11 SP3 and Postgres 9.3 and will post the URL to the group, in case<br>
>> it helps someone out there. Thanks again for all the help!<br>
>> Regards<br>
>><br>
>> On Wed, Mar 18, 2015 at 3:58 AM, NAKAHIRA Kazutomo<br>
>> <<a href="mailto:nakahira_kazutomo_b1@lab.ntt.co.jp">nakahira_kazutomo_b1@lab.ntt.co.jp</a>> wrote:<br>
>>><br>
>>> Hi,<br>
>>><br>
>>> As Brestan pointed out, the old master not coming up as a slave is expected<br>
>>> behaviour.<br>
>>><br>
>>> BTW, this behaviour is different from the original problem.<br>
>>> It seems from the logs that the promote action succeeded on cl2_lb1 after<br>
>>> powering off cl1_lb1.<br>
>>> Was the original problem resolved?<br>
>>><br>
>>> And cl2_lb1's postgresql.conf has the following problem.<br>
>>><br>
>>> 2015-03-17 07:34:28 SAST DETAIL: The failed archive command was: cp<br>
>>> pg_xlog/0000001D00000008000000C2<br>
>>> 172.16.0.5:/pgtablespace/archive/0000001D00000008000000C2<br>
>>><br>
>>> "172.16.0.5" must be eliminated from the archive_command directive in the<br>
>>> postgresql.conf.<br>
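>>> (Presumably the intent is a plain local copy into the shared archive directory,<br>
>>> mirroring the restore_command already in the pgsql primitive. A sketch of what the<br>
>>> corrected postgresql.conf line might look like, assuming /pgtablespace is reachable<br>
>>> as a local path on both nodes:<br>
>>><br>
>>> archive_mode = on<br>
>>> archive_command = 'cp %p /pgtablespace/archive/%f'<br>
>>><br>
>>> where %p is the path of the WAL file to archive and %f is its file name.)<br>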
>>><br>
>>> Best regards,<br>
>>> Kazutomo NAKAHIRA<br>
>>><br>
>>> On 2015/03/18 5:00, Rainer Brestan wrote:<br>
>>>><br>
>>>> Yes, that's the expected behaviour.<br>
>>>> Takatoshi Matsuo describes in his papers why a former master can't come up as a<br>
>>>> slave without possible data corruption.<br>
>>>> And you do not get an indication from Postgres that the data on disk is<br>
>>>> corrupted.<br>
>>>> Therefore, he created the lock file mechanism to prevent a former master from<br>
>>>> starting up.<br>
>>>> Making the base backup from the master discards any possibly wrong data on the<br>
>>>> slave, and the removed lock file indicates this to the resource agent.<br>
>>>> To shorten the discussion about "how this can be automated within the resource<br>
>>>> agent": there is no clean way of handling this with very large databases, for<br>
>>>> which this can take hours.<br>
>>>> What you should do is make the base backup in a temporary directory and then<br>
>>>> rename it to the name the Postgres instance requires after the base backup<br>
>>>> finishes successfully (yes, this requires twice the hard disk space). Otherwise<br>
>>>> you might lose everything if your master breaks during the base backup.<br>
>>>> Rainer<br>
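>>>> (A rough sketch of that temporary-directory variant of the resync procedure from<br>
>>>> the wiki quoted further down; the paths follow the wiki example, the replication<br>
>>>> VIP 172.16.0.5 is taken from this cluster's config, and pgdata in this cluster is<br>
>>>> actually /opt/app/pgdata/9.3/, so adjust accordingly:<br>
>>>><br>
>>>> # su - postgres<br>
>>>> $ pg_basebackup -h 172.16.0.5 -U postgres -D /var/lib/pgsql/data.new -X stream -P<br>
>>>> $ mv /var/lib/pgsql/data /var/lib/pgsql/data.old<br>
>>>> $ mv /var/lib/pgsql/data.new /var/lib/pgsql/data<br>
>>>> $ rm /var/lib/pgsql/tmp/PGSQL.lock<br>
>>>> $ exit<br>
>>>> # crm resource cleanup msPostgresql<br>
>>>><br>
>>>> This keeps the old data directory intact until the base backup has finished,<br>
>>>> at the cost of the extra disk space Rainer mentions.)<br>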
>>>> *Sent:* Tuesday, 17 March 2015 at 12:16<br>
>>>> *From:* "Wynand Jansen van Vuuren" <<a href="mailto:esawyja@gmail.com">esawyja@gmail.com</a>><br>
>>>> *To:* "Cluster Labs - All topics related to open-source clustering<br>
>>>> welcomed"<br>
>>>> <<a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a>><br>
>>>> *Subject:* Re: [ClusterLabs] Postgres streaming VIP-REP not coming up on<br>
>>>> slave<br>
>>>><br>
>>>> Hi<br>
>>>> OK, I found this particular problem: when the failed node comes up again,<br>
>>>> the init scripts start Postgres. I have disabled this and now the VIPs and<br>
>>>> Postgres remain on the new MASTER, but the failed node does not come up as<br>
>>>> a slave, i.e. there is no sync between the new master and the slave. Is this<br>
>>>> the expected behaviour? The only way I can get it back into slave mode is to<br>
>>>> follow the procedure in the wiki:<br>
>>>><br>
>>>> # su - postgres<br>
>>>> $ rm -rf /var/lib/pgsql/data/<br>
>>>> $ pg_basebackup -h 192.168.2.3 -U postgres -D /var/lib/pgsql/data -X<br>
>>>> stream -P<br>
>>>> $ rm /var/lib/pgsql/tmp/PGSQL.lock<br>
>>>> $ exit<br>
>>>> # pcs resource cleanup msPostgresql<br>
>>>><br>
>>>> Looking forward to your reply please<br>
>>>> Regards<br>
>>>> On Tue, Mar 17, 2015 at 7:55 AM, Wynand Jansen van Vuuren<br>
>>>> <<a href="mailto:esawyja@gmail.com">esawyja@gmail.com</a>><br>
>>>> wrote:<br>
>>>><br>
>>>> Hi Nakahira,<br>
>>>>     I finally got around to testing this; below is the initial state:<br>
>>>><br>
>>>> cl1_lb1:~ # crm_mon -1 -Af<br>
>>>> Last updated: Tue Mar 17 07:31:58 2015<br>
>>>> Last change: Tue Mar 17 07:31:12 2015 by root via crm_attribute on<br>
>>>> cl1_lb1<br>
>>>> Stack: classic openais (with plugin)<br>
>>>> Current DC: cl1_lb1 - partition with quorum<br>
>>>> Version: 1.1.9-2db99f1<br>
>>>> 2 Nodes configured, 2 expected votes<br>
>>>> 6 Resources configured.<br>
>>>><br>
>>>><br>
>>>> Online: [ cl1_lb1 cl2_lb1 ]<br>
>>>><br>
>>>> Resource Group: master-group<br>
>>>> vip-master (ocf::heartbeat:IPaddr2): Started cl1_lb1<br>
>>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl1_lb1<br>
>>>> CBC_instance (ocf::heartbeat:cbc): Started cl1_lb1<br>
>>>> failover_MailTo (ocf::heartbeat:MailTo): Started<br>
>>>> cl1_lb1<br>
>>>> Master/Slave Set: msPostgresql [pgsql]<br>
>>>> Masters: [ cl1_lb1 ]<br>
>>>> Slaves: [ cl2_lb1 ]<br>
>>>><br>
>>>> Node Attributes:<br>
>>>> * Node cl1_lb1:<br>
>>>> + master-pgsql : 1000<br>
>>>> + pgsql-data-status : LATEST<br>
>>>> + pgsql-master-baseline : 00000008BE000000<br>
>>>> + pgsql-status : PRI<br>
>>>> * Node cl2_lb1:<br>
>>>> + master-pgsql : 100<br>
>>>> + pgsql-data-status : STREAMING|SYNC<br>
>>>> + pgsql-status : HS:sync<br>
>>>><br>
>>>> Migration summary:<br>
>>>> * Node cl2_lb1:<br>
>>>> * Node cl1_lb1:<br>
>>>> cl1_lb1:~ #<br>
>>>>     ###### - I then did an init 0 on the master node, cl1_lb1<br>
>>>><br>
>>>> cl1_lb1:~ # init 0<br>
>>>> cl1_lb1:~ #<br>
>>>> Connection closed by foreign host.<br>
>>>><br>
>>>> Disconnected from remote host(cl1_lb1) at 07:36:18.<br>
>>>><br>
>>>> Type `help' to learn how to use Xshell prompt.<br>
>>>> [c:\~]$<br>
>>>>     ###### - This was OK as the slave took over and became master<br>
>>>><br>
>>>> cl2_lb1:~ # crm_mon -1 -Af<br>
>>>> Last updated: Tue Mar 17 07:35:04 2015<br>
>>>> Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on<br>
>>>> cl2_lb1<br>
>>>> Stack: classic openais (with plugin)<br>
>>>> Current DC: cl2_lb1 - partition WITHOUT quorum<br>
>>>> Version: 1.1.9-2db99f1<br>
>>>> 2 Nodes configured, 2 expected votes<br>
>>>> 6 Resources configured.<br>
>>>><br>
>>>><br>
>>>> Online: [ cl2_lb1 ]<br>
>>>> OFFLINE: [ cl1_lb1 ]<br>
>>>><br>
>>>> Resource Group: master-group<br>
>>>> vip-master (ocf::heartbeat:IPaddr2): Started cl2_lb1<br>
>>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl2_lb1<br>
>>>> CBC_instance (ocf::heartbeat:cbc): Started cl2_lb1<br>
>>>> failover_MailTo (ocf::heartbeat:MailTo): Started<br>
>>>> cl2_lb1<br>
>>>> Master/Slave Set: msPostgresql [pgsql]<br>
>>>> Masters: [ cl2_lb1 ]<br>
>>>> Stopped: [ pgsql:1 ]<br>
>>>><br>
>>>> Node Attributes:<br>
>>>> * Node cl2_lb1:<br>
>>>> + master-pgsql : 1000<br>
>>>> + pgsql-data-status : LATEST<br>
>>>> + pgsql-master-baseline : 00000008C2000090<br>
>>>> + pgsql-status : PRI<br>
>>>><br>
>>>> Migration summary:<br>
>>>> * Node cl2_lb1:<br>
>>>> cl2_lb1:~ #<br>
>>>> And the logs from Postgres and Corosync are attached<br>
>>>>     ###### - I then restarted the original master cl1_lb1 and started Corosync<br>
>>>>     manually. Once the original master cl1_lb1 was up and Corosync running, the<br>
>>>>     status below happened; notice there are no VIPs and no Postgres.<br>
>>>> ###### - Still working below<br>
>>>><br>
>>>> cl2_lb1:~ # crm_mon -1 -Af<br>
>>>> Last updated: Tue Mar 17 07:36:55 2015<br>
>>>> Last change: Tue Mar 17 07:34:29 2015 by root via crm_attribute on<br>
>>>> cl2_lb1<br>
>>>> Stack: classic openais (with plugin)<br>
>>>> Current DC: cl2_lb1 - partition WITHOUT quorum<br>
>>>> Version: 1.1.9-2db99f1<br>
>>>> 2 Nodes configured, 2 expected votes<br>
>>>> 6 Resources configured.<br>
>>>><br>
>>>><br>
>>>> Online: [ cl2_lb1 ]<br>
>>>> OFFLINE: [ cl1_lb1 ]<br>
>>>><br>
>>>> Resource Group: master-group<br>
>>>> vip-master (ocf::heartbeat:IPaddr2): Started cl2_lb1<br>
>>>> vip-rep (ocf::heartbeat:IPaddr2): Started cl2_lb1<br>
>>>> CBC_instance (ocf::heartbeat:cbc): Started cl2_lb1<br>
>>>> failover_MailTo (ocf::heartbeat:MailTo): Started<br>
>>>> cl2_lb1<br>
>>>> Master/Slave Set: msPostgresql [pgsql]<br>
>>>> Masters: [ cl2_lb1 ]<br>
>>>> Stopped: [ pgsql:1 ]<br>
>>>><br>
>>>> Node Attributes:<br>
>>>> * Node cl2_lb1:<br>
>>>> + master-pgsql : 1000<br>
>>>> + pgsql-data-status : LATEST<br>
>>>> + pgsql-master-baseline : 00000008C2000090<br>
>>>> + pgsql-status : PRI<br>
>>>><br>
>>>> Migration summary:<br>
>>>> * Node cl2_lb1:<br>
>>>><br>
>>>> ###### - After original master is up and Corosync running on<br>
>>>> cl1_lb1<br>
>>>><br>
>>>> cl2_lb1:~ # crm_mon -1 -Af<br>
>>>> Last updated: Tue Mar 17 07:37:47 2015<br>
>>>> Last change: Tue Mar 17 07:37:21 2015 by root via crm_attribute on<br>
>>>> cl1_lb1<br>
>>>> Stack: classic openais (with plugin)<br>
>>>> Current DC: cl2_lb1 - partition with quorum<br>
>>>> Version: 1.1.9-2db99f1<br>
>>>> 2 Nodes configured, 2 expected votes<br>
>>>> 6 Resources configured.<br>
>>>><br>
>>>><br>
>>>> Online: [ cl1_lb1 cl2_lb1 ]<br>
>>>><br>
>>>><br>
>>>> Node Attributes:<br>
>>>> * Node cl1_lb1:<br>
>>>> + master-pgsql : -INFINITY<br>
>>>> + pgsql-data-status : LATEST<br>
>>>> + pgsql-status : STOP<br>
>>>> * Node cl2_lb1:<br>
>>>> + master-pgsql : -INFINITY<br>
>>>> + pgsql-data-status : DISCONNECT<br>
>>>> + pgsql-status : STOP<br>
>>>><br>
>>>> Migration summary:<br>
>>>> * Node cl2_lb1:<br>
>>>> pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue<br>
>>>> Mar 17<br>
>>>> 07:37:26 2015'<br>
>>>> * Node cl1_lb1:<br>
>>>> pgsql:0: migration-threshold=1 fail-count=2 last-failure='Tue<br>
>>>> Mar 17<br>
>>>> 07:37:26 2015'<br>
>>>><br>
>>>> Failed actions:<br>
>>>> pgsql_monitor_4000 (node=cl2_lb1, call=735, rc=7,<br>
>>>> status=complete): not<br>
>>>> running<br>
>>>> pgsql_monitor_4000 (node=cl1_lb1, call=42, rc=7,<br>
>>>> status=complete): not<br>
>>>> running<br>
>>>> cl2_lb1:~ #<br>
>>>> ##### - No VIPs up<br>
>>>><br>
>>>> cl2_lb1:~ # ping 172.28.200.159<br>
>>>> PING 172.28.200.159 (172.28.200.159) 56(84) bytes of data.<br>
>>>>             From 172.28.200.168: icmp_seq=1 Destination Host Unreachable<br>
>>>>             From 172.28.200.168 icmp_seq=1 Destination Host Unreachable<br>
>>>>             From 172.28.200.168 icmp_seq=2 Destination Host Unreachable<br>
>>>>             From 172.28.200.168 icmp_seq=3 Destination Host Unreachable<br>
>>>> ^C<br>
>>>> --- 172.28.200.159 ping statistics ---<br>
>>>> 5 packets transmitted, 0 received, +4 errors, 100% packet loss,<br>
>>>> time 4024ms<br>
>>>> , pipe 3<br>
>>>> cl2_lb1:~ # ping 172.16.0.5<br>
>>>> PING 172.16.0.5 (172.16.0.5) 56(84) bytes of data.<br>
>>>>             From 172.16.0.3: icmp_seq=1 Destination Host Unreachable<br>
>>>><br>
>>>>             From 172.16.0.3 icmp_seq=1 Destination Host Unreachable<br>
>>>>             From 172.16.0.3 icmp_seq=2 Destination Host Unreachable<br>
>>>>             From 172.16.0.3 icmp_seq=3 Destination Host Unreachable<br>
>>>>             From 172.16.0.3 icmp_seq=5 Destination Host Unreachable<br>
>>>>             From 172.16.0.3 icmp_seq=6 Destination Host Unreachable<br>
>>>>             From 172.16.0.3 icmp_seq=7 Destination Host Unreachable<br>
>>>> ^C<br>
>>>> --- 172.16.0.5 ping statistics ---<br>
>>>> 8 packets transmitted, 0 received, +7 errors, 100% packet loss,<br>
>>>> time 7015ms<br>
>>>> , pipe 3<br>
>>>> cl2_lb1:~ #<br>
>>>><br>
>>>>             Any ideas please, or is it a case of recovering the original master<br>
>>>>             manually before starting Corosync etc.?<br>
>>>> All logs are attached<br>
>>>> Regards<br>
>>>> On Mon, Mar 16, 2015 at 11:01 AM, Wynand Jansen van Vuuren<br>
>>>> <<a href="mailto:esawyja@gmail.com">esawyja@gmail.com</a>> wrote:<br>
>>>><br>
>>>>             Thanks for the advice. I have a demo on this now, so I don't want to<br>
>>>>             test this now; I will do so tomorrow and forward the logs,<br>
>>>>             many thanks!!<br>
>>>> On Mon, Mar 16, 2015 at 10:54 AM, NAKAHIRA Kazutomo<br>
>>>> <<a href="mailto:nakahira_kazutomo_b1@lab.ntt.co.jp">nakahira_kazutomo_b1@lab.ntt.co.jp</a>> wrote:<br>
>>>><br>
>>>> Hi,<br>
>>>><br>
>>>> > do you suggest that I take it out? or should I look at<br>
>>>> the problem where<br>
>>>> > cl2_lb1 is not being promoted?<br>
>>>><br>
>>>> You should look at the problem where cl2_lb1 is not being<br>
>>>> promoted.<br>
>>>>             And I will look at it if you send me the ha-log and PostgreSQL's log.<br>
>>>><br>
>>>> Best regards,<br>
>>>> Kazutomo NAKAHIRA<br>
>>>><br>
>>>><br>
>>>> On 2015/03/16 17:18, Wynand Jansen van Vuuren wrote:<br>
>>>><br>
>>>> Hi Nakahira,<br>
>>>>                     Thanks so much for the info. This setting was as the wiki page<br>
>>>>                     suggested; do you suggest that I take it out, or should I look at<br>
>>>>                     the problem where cl2_lb1 is not being promoted?<br>
>>>> Regards<br>
>>>><br>
>>>> On Mon, Mar 16, 2015 at 10:15 AM, NAKAHIRA Kazutomo <<br>
>>>> <a href="mailto:nakahira_kazutomo_b1@lab.ntt.co.jp">nakahira_kazutomo_b1@lab.ntt.co.jp</a>> wrote:<br>
>>>><br>
>>>> Hi,<br>
>>>><br>
>>>>                         Notice there are no VIPs; it looks like the VIPs depend on<br>
>>>>                         some other resource to start first?<br>
>>>><br>
>>>><br>
>>>><br>
>>>>                 The following constraint means that "master-group" cannot<br>
>>>>                 start without a master of the msPostgresql resource:<br>
>>>><br>
>>>>                 colocation rsc_colocation-1 inf: master-group msPostgresql:Master<br>
>>>><br>
>>>>                 After you power off cl1_lb1, msPostgresql on cl2_lb1 is not promoted<br>
>>>>                 and no master exists in your cluster.<br>
>>>><br>
>>>>                 It means that "master-group" cannot run anywhere.<br>
>>>><br>
>>>> Best regards,<br>
>>>> Kazutomo NAKAHIRA<br>
>>>><br>
>>>><br>
>>>> On 2015/03/16 16:48, Wynand Jansen van Vuuren<br>
>>>> wrote:<br>
>>>><br>
>>>> Hi<br>
>>>>                             When I start out, cl1_lb1 (Cluster 1 load balancer 1) is<br>
>>>>                             the master, as below:<br>
>>>> cl1_lb1:~ # crm_mon -1 -Af<br>
>>>> Last updated: Mon Mar 16 09:44:44 2015<br>
>>>> Last change: Mon Mar 16 08:06:26 2015 by root<br>
>>>> via<br>
>>>> crm_attribute on cl1_lb1<br>
>>>> Stack: classic openais (with plugin)<br>
>>>> Current DC: cl2_lb1 - partition with quorum<br>
>>>> Version: 1.1.9-2db99f1<br>
>>>> 2 Nodes configured, 2 expected votes<br>
>>>> 6 Resources configured.<br>
>>>><br>
>>>><br>
>>>> Online: [ cl1_lb1 cl2_lb1 ]<br>
>>>><br>
>>>> Resource Group: master-group<br>
>>>> vip-master (ocf::heartbeat:IPaddr2):<br>
>>>> Started cl1_lb1<br>
>>>> vip-rep (ocf::heartbeat:IPaddr2):<br>
>>>> Started<br>
>>>> cl1_lb1<br>
>>>> CBC_instance (ocf::heartbeat:cbc):<br>
>>>> Started<br>
>>>> cl1_lb1<br>
>>>> failover_MailTo<br>
>>>> (ocf::heartbeat:MailTo):<br>
>>>> Started cl1_lb1<br>
>>>> Master/Slave Set: msPostgresql [pgsql]<br>
>>>> Masters: [ cl1_lb1 ]<br>
>>>> Slaves: [ cl2_lb1 ]<br>
>>>><br>
>>>> Node Attributes:<br>
>>>> * Node cl1_lb1:<br>
>>>> + master-pgsql :<br>
>>>> 1000<br>
>>>> + pgsql-data-status :<br>
>>>> LATEST<br>
>>>> + pgsql-master-baseline :<br>
>>>> 00000008B90061F0<br>
>>>> + pgsql-status :<br>
>>>> PRI<br>
>>>> * Node cl2_lb1:<br>
>>>> + master-pgsql :<br>
>>>> 100<br>
>>>> + pgsql-data-status :<br>
>>>> STREAMING|SYNC<br>
>>>> + pgsql-status :<br>
>>>> HS:sync<br>
>>>><br>
>>>> Migration summary:<br>
>>>> * Node cl2_lb1:<br>
>>>> * Node cl1_lb1:<br>
>>>> cl1_lb1:~ #<br>
>>>><br>
>>>>                             If I then do a power off on cl1_lb1 (master), Postgres<br>
>>>>                             moves to cl2_lb1 (Cluster 2 load balancer 1), but the<br>
>>>>                             VIP-MASTER and VIP-REP are not pingable from the new<br>
>>>>                             master (cl2_lb1); it stays like this below:<br>
>>>> cl2_lb1:~ # crm_mon -1 -Af<br>
>>>> Last updated: Mon Mar 16 07:32:07 2015<br>
>>>> Last change: Mon Mar 16 07:28:53 2015 by root<br>
>>>> via<br>
>>>> crm_attribute on cl1_lb1<br>
>>>> Stack: classic openais (with plugin)<br>
>>>> Current DC: cl2_lb1 - partition WITHOUT quorum<br>
>>>> Version: 1.1.9-2db99f1<br>
>>>> 2 Nodes configured, 2 expected votes<br>
>>>> 6 Resources configured.<br>
>>>><br>
>>>><br>
>>>> Online: [ cl2_lb1 ]<br>
>>>> OFFLINE: [ cl1_lb1 ]<br>
>>>><br>
>>>> Master/Slave Set: msPostgresql [pgsql]<br>
>>>> Slaves: [ cl2_lb1 ]<br>
>>>> Stopped: [ pgsql:1 ]<br>
>>>><br>
>>>> Node Attributes:<br>
>>>> * Node cl2_lb1:<br>
>>>> + master-pgsql :<br>
>>>> -INFINITY<br>
>>>> + pgsql-data-status :<br>
>>>> DISCONNECT<br>
>>>> + pgsql-status :<br>
>>>> HS:alone<br>
>>>><br>
>>>> Migration summary:<br>
>>>> * Node cl2_lb1:<br>
>>>> cl2_lb1:~ #<br>
>>>><br>
>>>>                             Notice there are no VIPs; it looks like the VIPs depend on<br>
>>>>                             some other resource to start first?<br>
>>>> Thanks for the reply!<br>
>>>><br>
>>>><br>
>>>> On Mon, Mar 16, 2015 at 9:42 AM, NAKAHIRA<br>
>>>> Kazutomo <<br>
>>>> <a href="mailto:nakahira_kazutomo_b1@lab.ntt.co.jp">nakahira_kazutomo_b1@lab.ntt.co.jp</a>> wrote:<br>
>>>><br>
>>>> Hi,<br>
>>>><br>
>>>><br>
>>>> fine, cl2_lb1 takes over and acts as a<br>
>>>> slave, but<br>
>>>>                             the VIPs do not come<br>
>>>><br>
>>>><br>
>>>> cl2_lb1 acts as a slave? It is not a<br>
>>>> master?<br>
>>>>                         The VIPs come up with the master of the msPostgresql<br>
>>>>                         resource.<br>
>>>><br>
>>>>                         If the promote action failed on cl2_lb1, then<br>
>>>>                         please send the ha-log and PostgreSQL's log.<br>
>>>><br>
>>>> Best regards,<br>
>>>> Kazutomo NAKAHIRA<br>
>>>><br>
>>>><br>
>>>> On 2015/03/16 16:09, Wynand Jansen van<br>
>>>> Vuuren wrote:<br>
>>>><br>
>>>> Hi all,<br>
>>>><br>
>>>><br>
>>>>                             I have 2 nodes with 2 interfaces each. ETH0 is used for an<br>
>>>>                             application, CBC, that writes to the Postgres DB on the<br>
>>>>                             VIP-MASTER 172.28.200.159; ETH1 is used for the Corosync<br>
>>>>                             configuration and VIP-REP. Everything works, but if the<br>
>>>>                             master, currently on cl1_lb1, has a catastrophic failure<br>
>>>>                             like a power-down, the VIPs do not start on the slave. The<br>
>>>>                             Postgres part works fine: cl2_lb1 takes over and acts as a<br>
>>>>                             slave, but the VIPs do not come up. If I test it manually,<br>
>>>>                             i.e. kill the application 3 times on the master, the<br>
>>>>                             switchover is smooth, and the same if I kill Postgres on<br>
>>>>                             the master, but when there is a power failure on the master<br>
>>>>                             the VIPs stay down. If I then delete the attributes<br>
>>>>                             pgsql-data-status="LATEST" and<br>
>>>>                             pgsql-data-status="STREAMING|SYNC" on the slave after power<br>
>>>>                             off on the master and restart everything, then the VIPs<br>
>>>>                             come up on the slave. Any ideas please?<br>
>>>>                             I'm using this setup:<br>
>>>><br>
>>>> <a href="http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster" target="_blank">http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster</a><br>
>>>><br>
>>>> With this configuration below<br>
>>>> node cl1_lb1 \<br>
>>>>     attributes pgsql-data-status="LATEST"<br>
>>>> node cl2_lb1 \<br>
>>>>     attributes pgsql-data-status="STREAMING|SYNC"<br>
>>>> primitive CBC_instance ocf:heartbeat:cbc \<br>
>>>>     op monitor interval="60s" timeout="60s" on-fail="restart" \<br>
>>>>     op start interval="0s" timeout="60s" on-fail="restart" \<br>
>>>>     meta target-role="Started" migration-threshold="3" failure-timeout="60s"<br>
>>>> primitive failover_MailTo ocf:heartbeat:MailTo \<br>
>>>>     params email="wynandj@rorotika.com" subject="Cluster Status change - " \<br>
>>>>     op monitor interval="10" timeout="10" dept="0"<br>
>>>> primitive pgsql ocf:heartbeat:pgsql \<br>
>>>>     params pgctl="/opt/app/PostgreSQL/9.3/bin/pg_ctl" \<br>
>>>>         psql="/opt/app/PostgreSQL/9.3/bin/psql" \<br>
>>>>         config="/opt/app/pgdata/9.3/postgresql.conf" \<br>
>>>>         pgdba="postgres" pgdata="/opt/app/pgdata/9.3/" start_opt="-p 5432" \<br>
>>>>         rep_mode="sync" node_list="cl1_lb1 cl2_lb1" \<br>
>>>>         restore_command="cp /pgtablespace/archive/%f %p" \<br>
>>>>         primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \<br>
>>>>         master_ip="172.16.0.5" restart_on_promote="false" logfile="/var/log/OCF.log" \<br>
>>>>     op start interval="0s" timeout="60s" on-fail="restart" \<br>
>>>>     op monitor interval="4s" timeout="60s" on-fail="restart" \<br>
>>>>     op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \<br>
>>>>     op promote interval="0s" timeout="60s" on-fail="restart" \<br>
>>>>     op demote interval="0s" timeout="60s" on-fail="stop" \<br>
>>>>     op stop interval="0s" timeout="60s" on-fail="block" \<br>
>>>>     op notify interval="0s" timeout="60s"<br>
>>>> primitive vip-master ocf:heartbeat:IPaddr2 \<br>
>>>>     params ip="172.28.200.159" nic="eth0" iflabel="CBC_VIP" cidr_netmask="24" \<br>
>>>>     op start interval="0s" timeout="60s" on-fail="restart" \<br>
>>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \<br>
>>>>     op stop interval="0s" timeout="60s" on-fail="block" \<br>
>>>>     meta target-role="Started"<br>
>>>> primitive vip-rep ocf:heartbeat:IPaddr2 \<br>
>>>>     params ip="172.16.0.5" nic="eth1" iflabel="REP_VIP" cidr_netmask="24" \<br>
>>>>     meta migration-threshold="0" target-role="Started" \<br>
>>>>     op start interval="0s" timeout="60s" on-fail="stop" \<br>
>>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \<br>
>>>>     op stop interval="0s" timeout="60s" on-fail="restart"<br>
>>>> group master-group vip-master vip-rep CBC_instance failover_MailTo<br>
>>>> ms msPostgresql pgsql \<br>
>>>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"<br>
>>>> colocation rsc_colocation-1 inf: master-group msPostgresql:Master<br>
>>>> order rsc_order-1 0: msPostgresql:promote master-group:start symmetrical=false<br>
>>>> order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false<br>
>>>> property $id="cib-bootstrap-options" \<br>
>>>>     dc-version="1.1.9-2db99f1" \<br>
>>>>     cluster-infrastructure="classic openais (with plugin)" \<br>
>>>>     expected-quorum-votes="2" \<br>
>>>>     no-quorum-policy="ignore" \<br>
>>>>     stonith-enabled="false" \<br>
>>>>     cluster-recheck-interval="1min" \<br>
>>>>     crmd-transition-delay="0s" \<br>
>>>>     last-lrm-refresh="1426485983"<br>
>>>> rsc_defaults $id="rsc-options" \<br>
>>>>     resource-stickiness="INFINITY" \<br>
>>>>     migration-threshold="1"<br>
>>>> #vim:set syntax=pcmk<br>
>>>><br>
>>>> Any ideas please, I'm lost......<br>
>>>><br>
>>>><br>
>>>><br>
>>>><br>
>>>> _______________________________________________<br>
>>>> Users mailing list:<br>
>>>> <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
>>>><br>
>>>> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
>>>><br>
>>>> Project Home:<br>
>>>> <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
>>>> Getting started:<br>
>>>> <a href="http://www.clusterlabs.org/" target="_blank">http://www.clusterlabs.org/</a><br>
>>>> doc/Cluster_from_Scratch.pdf<br>
>>>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
>>>><br>
>>>><br>
>>>><br>
>>>><br>
>>>> _______________________________________________<br>
>>>> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
>>>><br>
>>>> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
>>>><br>
>>>> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
>>>> Getting started:<br>
>>>><br>
>>>> <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
>>>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
>>>><br>
>>>><br>
>>>><br>
>>>> _______________________________________________<br>
>>>> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
>>>> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
>>>><br>
>>>> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
>>>> Getting started:<br>
>>>><br>
>>>> <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
>>>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
>>>><br>
>>>><br>
>>>> --<br>
>>>> NTT Open Source Software Center<br>
>>>> Kazutomo Nakahira<br>
>>>> TEL: 03-5860-5135 FAX: 03-5463-6490<br>
>>>> Mail: <a href="mailto:nakahira_kazutomo_b1@lab.ntt.co.jp">nakahira_kazutomo_b1@lab.ntt.co.jp</a><br>
>>>><br>
>>>><br>
>>>><br>
>>>> _______________________________________________<br>
>>>> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
>>>> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
>>>><br>
>>>> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
>>>> Getting started:<br>
>>>><br>
>>>> <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
>>>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
>>>><br>
>>>><br>
>>>><br>
>>>><br>
>>>> _______________________________________________<br>
>>>> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
>>>> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
>>>><br>
>>>> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
>>>> Getting started:<br>
>>>> <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
>>>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
>>>><br>
>>>><br>
>>>><br>
>>>> --<br>
>>>> NTT Open Source Software Center<br>
>>>> Kazutomo Nakahira<br>
>>>> TEL: 03-5860-5135 FAX: 03-5463-6490<br>
>>>> Mail: <a href="mailto:nakahira_kazutomo_b1@lab.ntt.co.jp">nakahira_kazutomo_b1@lab.ntt.co.jp</a><br>
>>>><br>
>>>><br>
>>>> _______________________________________________<br>
>>>> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
>>>> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
>>>><br>
>>>> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
>>>> Getting started:<br>
>>>> <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
>>>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
>>>><br>
>>>> _______________________________________________ Users mailing list:<br>
>>>> <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
>>>> Project<br>
>>>> Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a> Getting started:<br>
>>>> <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a> Bugs:<br>
>>>> <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
>>>><br>
>>>><br>
>>>><br>
>>>> _______________________________________________<br>
>>>> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
>>>> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
>>>><br>
>>>> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
>>>> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
>>>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
>>>><br>
>>><br>
>>><br>
>>><br>
>>> _______________________________________________<br>
>>> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
>>> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
>>><br>
>>> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
>>> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
>>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
>><br>
>><br>
><br>
><br>
> _______________________________________________<br>
> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
><br>
> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
><br>
<br>
_______________________________________________<br>
Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
<a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
<br>
Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
</div></div></blockquote></div><br></div>