[Pacemaker] Postgresql streaming replication failover - RA needed

Sun Nov 27 16:31:27 UTC 2011

Hi Takatoshi,

I don't think it is inconsistency problem - for me it looks like some RA bug.
I think so, because postgres starts properly outside pacemaker.

When pacemaker starts node psql1 I see only:

postgresql:0_start_0 (node=psql1, call=9, rc=1, status=complete): unknown error

and the postgres log is empty - so I suppose that it does not even try to start it.

What I tested was:
- I had a stable cluster, where psql1 was the master, psql2 was the slave
- I put psql1 into standby mode. ("node psql1 standby") to test failover
- After a while psql2 became the PRI, which is very good
- When I put psql1 back online, postgres wouldn't start anymore from pacemaker (unknown error).

I tried to start postgres manually from the shell it worked fine, even the monitor was able to see that it became in SYNC (obviously the master/slave group was showing improper state as psql was started outside pacemaker.

I don't think data inconsistency is the case, partially because there are no clients connected, partially because psql starts properly outside pacemaker.

Here is what is relevant from the log:

Nov 27 16:02:50 psql1 pgsql[11021]: DEBUG: PostgreSQL is running as a primary.
Nov 27 16:02:51 psql1 pgsql[11021]: DEBUG: node=psql2, state=STREAMING, sync_state=SYNC
Nov 27 16:02:53 psql1 pgsql[11142]: DEBUG: PostgreSQL is running as a primary.
Nov 27 16:02:53 psql1 pgsql[11142]: DEBUG: node=psql2, state=STREAMING, sync_state=SYNC
Nov 27 16:02:55 psql1 pgsql[11272]: DEBUG: PostgreSQL is running as a primary.
Nov 27 16:02:55 psql1 pgsql[11272]: DEBUG: node=psql2, state=STREAMING, sync_state=SYNC
Nov 27 16:02:57 psql1 pgsql[11368]: DEBUG: PostgreSQL is running as a primary.
Nov 27 16:02:57 psql1 pgsql[11368]: DEBUG: node=psql2, state=STREAMING, sync_state=SYNC
Nov 27 16:03:00 psql1 pgsql[11463]: DEBUG: PostgreSQL is running as a primary.
Nov 27 16:03:00 psql1 pgsql[11463]: DEBUG: node=psql2, state=STREAMING, sync_state=SYNC
Nov 27 16:03:00 psql1 pgsql[11556]: DEBUG: notify: pre for demote
Nov 27 16:03:00 psql1 pgsql[11590]: INFO: Stopping PostgreSQL on demote.
Nov 27 16:03:02 psql1 pgsql[11590]: INFO: waiting for server to shut down..... done server stopped
Nov 27 16:03:02 psql1 pgsql[11590]: INFO: Removing /var/lib/pgsql/PGSQL.lock.
Nov 27 16:03:02 psql1 pgsql[11590]: INFO: PostgreSQL is down
Nov 27 16:03:02 psql1 pgsql[11590]: INFO: Changing pgsql-status on psql1 : PRI->STOP.
Nov 27 16:03:02 psql1 pgsql[11590]: DEBUG: Created recovery.conf. host=10.12.1.28, user=postgres
Nov 27 16:03:02 psql1 pgsql[11590]: INFO: Setup all nodes as an async.
Nov 27 16:03:02 psql1 pgsql[11732]: DEBUG: notify: post for demote
Nov 27 16:03:02 psql1 pgsql[11732]: DEBUG: post-demote called. Demote uname is psql1
Nov 27 16:03:02 psql1 pgsql[11732]: INFO: My Timeline ID and Checkpoint : 14:0000000023000020
Nov 27 16:03:02 psql1 pgsql[11732]: WARNING: Can't get psql2 master baseline. Waiting...
Nov 27 16:03:03 psql1 pgsql[11732]: INFO: psql2 master baseline : 14:0000000023000000
Nov 27 16:03:03 psql1 pgsql[11732]: ERROR: My data is inconsistent.
Nov 27 16:03:03 psql1 pgsql[11867]: DEBUG: notify: pre for stop
Nov 27 16:03:03 psql1 pgsql[11969]: INFO: PostgreSQL is already stopped.
Nov 27 16:03:12 psql1 pgsql[12053]: INFO: Don't check /var/lib/postgresql/9.1/main during probe
Nov 27 16:03:12 psql1 pgsql[12053]: INFO: PostgreSQL is down
Nov 27 16:03:27 psql1 pgsql[12204]: INFO: Changing pgsql-status on psql1 : ->STOP.
Nov 27 16:03:27 psql1 pgsql[12204]: DEBUG: Created recovery.conf. host=10.12.1.28, user=postgres
Nov 27 16:03:27 psql1 pgsql[12204]: INFO: Setup all nodes as an async.
Nov 27 16:03:27 psql1 pgsql[12204]: INFO: My Timeline ID and Checkpoint : 14:0000000023000020
Nov 27 16:03:27 psql1 pgsql[12204]: INFO: psql2 master baseline : 14:0000000023000000
Nov 27 16:03:27 psql1 pgsql[12204]: ERROR: My data is inconsistent.
Nov 27 16:03:27 psql1 pgsql[12339]: DEBUG: notify: post for start
Nov 27 16:03:27 psql1 pgsql[12373]: DEBUG: notify: pre for stop
Nov 27 16:03:27 psql1 pgsql[12407]: INFO: PostgreSQL is already stopped.

Thanks,

Attila

-----Original Message-----
From: Takatoshi MATSUO [mailto:matsuo.tak at gmail.com]
Sent: 2011. november 27. 11:07
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed

Hi Attila

2011/11/27 Attila Megyeri <amegyeri at minerva-soft.com>:
> Hi Takatoshi,
>
> You were right, changing the shell to bash resolved the problem.
> The cluster now started in sync mode - thank you very much.

You're very welcome.

> I will be testing it in the next couple of days. I did just a very quick test - it seems that psql
> master failed over to psql2 properly, but when I tried to move it back to psql1 there was some
> problems starting psql on node 1.

If master(psql1) is failed, its data may be inconsistency.
A PostgreSQL developer says that it's a feature.
Therefore my RA prevent it from starting automatically if data is inconsistency.
Please backup psql2' data and restore it to psql1, and remove
/var/lib/pgsql/PGSQL.lock file
before clearing failcount.

I use rsync to backup and restore in the following way.
-----
# psql -h 192.168.2.114 -U postgres -c "SELECT pg_start_backup('label')"
# rsync -avr --delete --exclude=postmaster.pid
192.168.2.114:/var/lib/pgsql/9.1/data/ /var/lib/pgsql/9.1/data/
# psql -h 192.168.2.114 -U postgres -c "SELECT pg_stop_backup()"
-----

BTW I fixed some bugs 2 days ago.
Please use the newest version.

Thanks,
Takatoshi MATSUO

>
> Does it work fine for you in  both directions?
>
> Thank you very much.
>
> Have a nice weekend,
>
> Attila
>
>
>
> -----Original Message-----
> From: Takatoshi MATSUO [mailto:matsuo.tak at gmail.com]
> Sent: 2011. november 27. 6:12
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>
> Hi Attila
>
> 2011/11/27 Attila Megyeri <amegyeri at minerva-soft.com>:
>> Hi Takatoshi,
>>
>> Thank you for coming back to me so quickly.
>>
>> In the /var/lib/pgsql there are the following files:
>>
>> PSQL1:
>> =====
>> root at psql1:/var/lib/pgsql# ls -la
>> total 16
>> drwxr-xr-x  2 postgres postgres 4096 Nov 26 18:04 .
>> drwxr-xr-x 35 root     root     4096 Nov 25 22:21 ..
>> -rw-r--r--  1 postgres postgres    1 Nov 26 00:17 rep_mode.conf
>> -rw-r--r--  1 root     root       49 Nov 26 18:04 xlog_note.0
>>
>> root at psql1:/var/lib/pgsql# cat xlog_note.0 -e psql1 0000000019000000
>> psql2 0000000019000000
>> root at psql1:/var/lib/pgsql#
>>
>> PSQL2:
>> =======
>> root at psql2:/var/lib/pgsql# ls -la
>> total 16
>> drwxr-xr-x  2 postgres postgres 4096 Nov 26 18:05 .
>> drwxr-xr-x 33 root     root     4096 Nov 26 00:10 ..
>> -rw-r--r--  1 postgres postgres    1 Nov 26 00:24 rep_mode.conf
>> -rw-r--r--  1 root     root       49 Nov 26 18:05 xlog_note.0
>> root at psql2:/var/lib/pgsql# cat xlog_note.0 -e psql1 0000000019000000
>> psql2 0000000019000000
>> root at psql2:/var/lib/pgsql#
>
> It seems that dash's bultin echo command is used because echo with "-e" option dose not function.
>
> Perhaps my RA also depends on bash.
> Can you use a bash instead of a dash?
>
>> BTW, postgres is installed under /var/lib/postgresql , but I noticed that some parts of the RA are referring to the  /var/lib/pgsql directory, so I created that directory and i keep some of the files there.
>
> It's no ploblem.
> If you want to change this path, please specify it using "tmpdir" parameter.
>
> Regards,
> Takatoshi MATSUO
>
>>
>> Thanks,
>> Attila
>>
>>
>>
>> -----Original Message-----
>> From: Takatoshi MATSUO [mailto:matsuo.tak at gmail.com]
>> Sent: 2011. november 26. 18:27
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Postgresql streaming replication failover -
>> RA needed
>>
>> Hi Attila
>>
>> 1. Are there /var/lib/pgsql/xlog_note.0 , xlog_note.1, xlog_note.2 .... files?
>>   These files are created while checking a xlog location on monitor.
>>
>> 2. Do these files include lines as below?
>> ---------
>> pgsql1  0000000019000000
>> pgsql2  0000000019000000
>> ---------
>>
>> Regards.
>> Takatoshi MATSUO
>>
>>
>> 2011年11月26日22:44 Attila Megyeri <amegyeri at minerva-soft.com>:
>>> Hi Yoshiharu, Takatoshi,
>>>
>>> Spent another day, without success. :(
>>>
>>> I started from scratch and synchronous replications works nicely when nodes are started outside pacemaker.
>>> My PostgreSQL version is 9.1.1.
>>>
>>> When I start from pacemaker, after a while it gets into the following state:
>>>
>>> Online: [ psql1 psql2 ]
>>>
>>>  Master/Slave Set: msPostgresql [postgresql]
>>>     Slaves: [ psql1 psql2 ]
>>>  Clone Set: clnPingCheck [pingCheck]
>>>     Started: [ psql1 psql2 ]
>>>
>>> Node Attributes:
>>> * Node psql1:
>>>    + default_ping_set                  : 100
>>>    + master-postgresql:0               : -INFINITY
>>>    + pgsql-status                      : HS:alone
>>>    + pgsql-xlog-loc                    : 0000000019000000
>>> * Node psql2:
>>>    + default_ping_set                  : 100
>>>    + master-postgresql:1               : -INFINITY
>>>    + pgsql-status                      : HS:alone
>>>    + pgsql-xlog-loc                    : 0000000019000000
>>>
>>>
>>> The psql status queries return the following:
>>>
>>> PSQL1
>>> ======
>>> postgres at psql1:/root$ psql  -c "select application_name,upper(state),upper(sync_state) from pg_stat_replication"
>>> application_name | upper | upper
>>> ------------------+-------+-------
>>> (0 rows)
>>>
>>> postgres at psql1:/root$ psql  -Atc "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
>>> 0/19000020|0/19000000
>>>
>>> PSQL2
>>> ======
>>> postgres at psql2:~$  psql  -c "select application_name,upper(state),upper(sync_state) from pg_stat_replication"
>>>  application_name | upper | upper
>>> ------------------+-------+-------
>>> (0 rows)
>>>
>>> postgres at psql2:~$ psql  -Atc "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
>>> 0/19000000|0/19000000
>>>
>>>
>>> Neither server can connect (obviously) to the master, as the vip_repl Is not brought up.
>>>
>>>
>>> Could you help me understand WHAT is the action/state/event that sould promote one of the nodes? I see that pacemaker monitors the servers every X seconds, but nothing else happens.
>>>
>>> In the log (limited to pgsql) the following sequence is repeated
>>> forewer
>>>
>>> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: Master is not exist.
>>> Nov 26 13:36:19 psql1 pgsql[19829]: DEBUG: Checking right of master.
>>> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: My data status=.
>>> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: psql1 xlog location :
>>> 0000000019000000 Nov 26 13:36:19 psql1 pgsql[19829]: INFO: psql2 xlog
>>> location : 0000000019000000 Nov 26 13:36:26 psql1 pgsql[19993]: DEBUG: PostgreSQL is running as a hot standby.
>>> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: Master is not exist.
>>> Nov 26 13:36:26 psql1 pgsql[19993]: DEBUG: Checking right of master.
>>> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: My data status=.
>>> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: psql1 xlog location :
>>> 0000000019000000 Nov 26 13:36:26 psql1 pgsql[19993]: INFO: psql2 xlog
>>> location : 0000000019000000 Nov 26 13:36:33 psql1 pgsql[20176]: DEBUG: PostgreSQL is running as a hot standby.
>>> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: Master is not exist.
>>> Nov 26 13:36:33 psql1 pgsql[20176]: DEBUG: Checking right of master.
>>> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: My data status=.
>>> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: psql1 xlog location :
>>> 0000000019000000 Nov 26 13:36:33 psql1 pgsql[20176]: INFO: psql2 xlog
>>> location : 0000000019000000 Nov 26 13:36:41 psql1 pgsql[20343]: DEBUG: PostgreSQL is running as a hot standby.
>>>
>>>
>>> Any help is appreciated!
>>>
>>> Regards,
>>> Attila
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Yoshiharu Mori [mailto:y-mori at sraoss.co.jp]
>>> Sent: 2011. november 25. 14:17
>>> To: The Pacemaker cluster resource manager
>>> Cc: Attila Megyeri
>>> Subject: Re: [Pacemaker] Postgresql streaming replication failover -
>>> RA needed
>>>
>>> Hi　Attila
>>>
>>>> A quick snippet from the corosync.log
>>>>
>>>> Nov 23 05:43:05 psql1 pgsql[2845]: DEBUG: Checking right of master.
>>>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: My data status=.
>>>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql1 xlog location :
>>>> 000000000D000000 Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql2 xlog
>>>> location : 0000000008000000
>>>>
>>>> As you see, the "my data status" returns an empty string.
>>>
>>> My log is same. but it works.
>>>
>>> Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Master is not exist.
>>> Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Checking right of master.
>>> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: My data status=.
>>> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm01 xlog location :
>>> 0000000005000020 Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm02
>>> xlog location : 0000000005000000
>>>
>>> In my log, the following logs are outputted and started after checking xlog location(3 times).
>>>
>>> Nov 18 19:29:39 osspc24-1 pgsql[18720]: INFO: I have a master right.
>>>
>>> Please show us more corosync.log.
>>>
>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Attila Megyeri [mailto:amegyeri at minerva-soft.com]
>>>> Sent: 2011. november 25. 9:28
>>>> To: The Pacemaker cluster resource manager
>>>> Subject: Re: [Pacemaker] Postgresql streaming replication failover -
>>>> RA needed
>>>>
>>>> Hi Takatoshi,
>>>>
>>>> I have restored the PSQL to run without corosync so I cannot send you the crm_mon output now.
>>>>
>>>> What I can tell for sure:
>>>> - RA never promoted any of the nodes, no matter what the status was. It also did not promote the node, when it was the only one.
>>>> - I believe the issue is in the comparison of the xlogs. How could I troubleshoot that? I see from the logs that crm NEVER tried to invoke pgsql with "promote"
>>>> - I tried previously the crm_mon -A option, but there was never a "
>>>> pgsql-data-status" attribute. The other attribs were there,
>>>> including the HS:alone
>>>> - In the corosync log the only relevant RA message I see is " Master is not exist. " I never saw a message like  "My data is out-of-date"
>>>>
>>>> Thank you!
>>>>
>>>> Attila
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Takatoshi MATSUO [mailto:matsuo.tak at gmail.com]
>>>> Sent: 2011. november 25. 8:56
>>>> To: The Pacemaker cluster resource manager
>>>> Subject: Re: [Pacemaker] Postgresql streaming replication failover -
>>>> RA needed
>>>>
>>>> Hi Attila
>>>>
>>>> 2011/11/24 Attila Megyeri <amegyeri at minerva-soft.com>:
>>>> > Hi Takatoshi, All,
>>>> >
>>>> > Thanks for your reply.
>>>> > I see that you have invested significant effort in the development of the RA. I spent the last day trying to set up the RA, but without much success.
>>>> >
>>>> > My infrastructure is very similar to yours, except for the fact that currently I am testing with a single network adapter.
>>>> >
>>>> > Replication works nicely when I start the databases manually, not using corosync.
>>>> >
>>>> > When I try to start using corosync,I see that the ping resources start normally, but the msPostgresql starts on both nodes in slave mode, and I see "HS:alone"
>>>>
>>>> To see "HS:alone" is normal.
>>>> And RA compares xlog locations and promote the postgresql having new data.
>>>>
>>>> > In the Wiki you state, the if I start on a signle node only, PSQL should start in Master mode (PRI), but this is not the case.
>>>>
>>>> If the data is old, the node can't be master.
>>>> To be master needs pgsql-data-status="LATEST" or "STREAMING|SYNC".
>>>> Plese check it using "crm_mon -A".
>>>>
>>>>
>>>>
>>>>
>>>> And to become a master from stopped takes a few minutes because the RA compares xlog location on monitor.
>>>>
>>>>
>>>> > The recovery.conf file is created immediately, and from the logs I see no attempt at all to promote the node.
>>>> > In the postgres logs I see that node1, which is supposed to be a master, tries to connect to the vip-rep IP address, which is NOT brought up, because it depends on the Master role...
>>>> >
>>>> > Do you have any idea?
>>>>
>>>> Please check HA log.
>>>> My RA outputs "My data is out-of-date. status=********" to log if the data is old.
>>>>
>>>> Regards,
>>>> Takatoshi MATSUO
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>
>>>
>>> --
>>> Yoshiharu Mori <y-mori at sraoss.co.jp>
>>> SRA OSS, Inc Japan http://www.sraoss.co.jp
>>> TEL: 03-5979-2701
>>> FAX: 03-5979-2702
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org Getting started:
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org