[Pacemaker] Postgresql streaming replication failover - RA needed
Attila Megyeri
amegyeri at minerva-soft.com
Fri Nov 25 14:15:18 UTC 2011
Hi Yoshiharu,
-----Original Message-----
From: Yoshiharu Mori [mailto:y-mori at sraoss.co.jp]
Sent: 2011. november 25. 14:17
To: The Pacemaker cluster resource manager
Cc: Attila Megyeri
Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
Hi Attila
> A quick snippet from the corosync.log
>
> Nov 23 05:43:05 psql1 pgsql[2845]: DEBUG: Checking right of master.
> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: My data status=.
> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql1 xlog location : 000000000D000000
> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql2 xlog location : 0000000008000000
>
> As you can see, "My data status" is an empty string.
My log is the same, but it works.
Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Master is not exist.
Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Checking right of master.
Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: My data status=.
Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm01 xlog location : 0000000005000020
Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm02 xlog location : 0000000005000000
In my log, the following message is output after the xlog location has been checked (3 times), and then the node is started as master.
Nov 18 19:29:39 osspc24-1 pgsql[18720]: INFO: I have a master right.
Please show us more of the corosync.log.
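As a rough illustration of the check described above, here is a minimal Python sketch. It assumes that the 16-hex-digit xlog locations printed in the log can be compared as integers, and that a node claims the master right only after its location has been the newest for a fixed number of consecutive monitor runs. The threshold of 3 is taken from the "3 times" observation above; the names `xlog_to_int`, `has_newest_xlog` and `MasterRightChecker` are hypothetical and not part of the pgsql RA.

```python
# Hypothetical sketch, not the pgsql RA's actual code: the names and the
# check threshold are illustrative assumptions.

def xlog_to_int(location: str) -> int:
    """Parse a 16-hex-digit xlog location such as '000000000D000000'."""
    return int(location, 16)


def has_newest_xlog(my_node: str, locations: dict[str, str]) -> bool:
    """True if my_node's xlog location is >= every other known node's."""
    mine = xlog_to_int(locations[my_node])
    return all(
        mine >= xlog_to_int(loc)
        for node, loc in locations.items()
        if node != my_node
    )


class MasterRightChecker:
    """Grant the 'master right' only after N consecutive successful checks."""

    def __init__(self, required_checks: int = 3) -> None:
        self.required_checks = required_checks  # assumed threshold ("3 times")
        self.count = 0

    def monitor(self, my_node: str, locations: dict[str, str]) -> bool:
        if has_newest_xlog(my_node, locations):
            self.count += 1
        else:
            self.count = 0  # another node is ahead: start counting again
        return self.count >= self.required_checks


if __name__ == "__main__":
    # xlog locations taken from the psql1/psql2 corosync.log snippet above
    locations = {"psql1": "000000000D000000", "psql2": "0000000008000000"}
    checker = MasterRightChecker()
    for run in range(1, 4):
        print(run, checker.monitor("psql1", locations))  # True only on run 3
```

With only psql1 online, this sketch has no other location to compare against, so the comparison by itself cannot block promotion.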
===
I can leave it running forever, but it never shows "I have a master right".
To be honest, I have no idea what should promote the node to master.
What is it that the RA checks, and what could be wrong? I just cannot find where the problem is.
Right now I am running corosync on node 1 only, as I expect that this way it will have the most recent xlog and start as a master.
But it never starts.
Here is the output of crm_mon -A:
============
Last updated: Fri Nov 25 13:52:58 2011
Stack: openais
Current DC: psql1 - partition WITHOUT quorum
Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ psql1 ]
OFFLINE: [ psql2 ]
Master/Slave Set: msPostgresql [postgresql]
Slaves: [ psql1 ]
Stopped: [ postgresql:1 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ psql1 ]
Stopped: [ pingCheck:1 ]
Node Attributes:
* Node psql1:
+ default_ping_set : 100
+ master-postgresql:0 : -INFINITY
+ pgsql-status : HS:alone
+ pgsql-xlog-loc : 0000000012000000
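Takatoshi's rule, quoted further down in this thread, is that a node needs pgsql-data-status="LATEST" or "STREAMING|SYNC" to become master. The hypothetical Python sketch below parses the Node Attributes lines of the crm_mon -A output (the "+ name : value" format is assumed from the listing above) and applies that rule. It is an illustration only, not code from the RA; `parse_node_attributes` and `can_become_master` are made-up names.

```python
# Hypothetical helper: parses "+ name : value" lines as printed by crm_mon -A
# above and applies the promotion rule described in this thread.
PROMOTABLE_STATUSES = {"LATEST", "STREAMING|SYNC"}


def parse_node_attributes(text: str) -> dict[str, str]:
    attrs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("+"):
            continue  # skip "Node Attributes:" and "* Node ..." lines
        # Split on " : " so names like "master-postgresql:0" stay intact.
        name, sep, value = line[1:].partition(" : ")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs


def can_become_master(attrs: dict[str, str]) -> bool:
    return attrs.get("pgsql-data-status") in PROMOTABLE_STATUSES


CRM_MON_OUTPUT = """\
Node Attributes:
* Node psql1:
+ default_ping_set : 100
+ master-postgresql:0 : -INFINITY
+ pgsql-status : HS:alone
+ pgsql-xlog-loc : 0000000012000000
"""

if __name__ == "__main__":
    attrs = parse_node_attributes(CRM_MON_OUTPUT)
    print(attrs.get("pgsql-data-status"))  # None: the attribute is missing
    print(can_become_master(attrs))        # False
```

Under this assumed rule, the missing pgsql-data-status attribute alone is enough to keep psql1 from being promoted.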
I sent you the log privately so as not to overload the list. I did a "resource stop msPostgresql" and "resource start msPostgresql" around 13:52.
You will see some extra debug messages starting with "ATT" - I added them to the RA to help my troubleshooting.
Thank you for your help,
Attila
>
>
> -----Original Message-----
> From: Attila Megyeri [mailto:amegyeri at minerva-soft.com]
> Sent: 2011. november 25. 9:28
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Postgresql streaming replication failover -
> RA needed
>
> Hi Takatoshi,
>
> I have restored PSQL to run without corosync, so I cannot send you the crm_mon output now.
>
> What I can tell for sure:
> - The RA never promoted either node, no matter what the status was. It did not promote the node even when it was the only one running.
> - I believe the issue is in the comparison of the xlog locations. How could I troubleshoot that? I see from the logs that crm NEVER tried to invoke pgsql with "promote".
> - I had previously tried the crm_mon -A option, but there was never a "pgsql-data-status" attribute. The other attributes were there, including HS:alone.
> - In the corosync log, the only relevant RA message I see is "Master is not exist." I never saw a message like "My data is out-of-date".
>
> Thank you!
>
> Attila
>
>
> -----Original Message-----
> From: Takatoshi MATSUO [mailto:matsuo.tak at gmail.com]
> Sent: 2011. november 25. 8:56
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Postgresql streaming replication failover -
> RA needed
>
> Hi Attila
>
> 2011/11/24 Attila Megyeri <amegyeri at minerva-soft.com>:
> > Hi Takatoshi, All,
> >
> > Thanks for your reply.
> > I see that you have invested significant effort in the development of the RA. I spent the last day trying to set up the RA, but without much success.
> >
> > My infrastructure is very similar to yours, except for the fact that currently I am testing with a single network adapter.
> >
> > Replication works nicely when I start the databases manually, not using corosync.
> >
> > When I try to start using corosync, I see that the ping resources start normally, but msPostgresql starts on both nodes in slave mode, and I see "HS:alone".
>
> Seeing "HS:alone" is normal.
> The RA then compares the xlog locations and promotes the PostgreSQL instance that has the newest data.
>
> > In the Wiki you state that if I start on a single node only, PSQL should start in Master mode (PRI), but this is not the case.
>
> If the data is old, the node can't be master.
> To become master, a node needs pgsql-data-status="LATEST" or "STREAMING|SYNC".
> Please check it using "crm_mon -A".
>
> Also, becoming master from the stopped state takes a few minutes, because the RA compares xlog locations during monitor operations.
>
>
> > The recovery.conf file is created immediately, and from the logs I see no attempt at all to promote the node.
> > In the postgres logs I see that node1, which is supposed to be a master, tries to connect to the vip-rep IP address, which is NOT brought up, because it depends on the Master role...
> >
> > Do you have any idea?
>
> Please check the HA log.
> My RA writes "My data is out-of-date. status=********" to the log if the data is old.
>
> Regards,
> Takatoshi MATSUO
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
--
Yoshiharu Mori <y-mori at sraoss.co.jp>
SRA OSS, Inc Japan http://www.sraoss.co.jp
TEL: 03-5979-2701
FAX: 03-5979-2702