[ClusterLabs] PAF fails to promote slave: Can not get current node LSN location
Tiemen Ruiten
t.ruiten at tech-lab.io
Thu Jul 4 05:38:05 EDT 2019
Hello,
Yesterday, my three-node cluster (CentOS 7, PostgreSQL with the PAF
resource agent) went down. For an as yet unknown reason, the master
(ph-sql-04) did not report to the rest of the cluster and was fenced. (I'll
now take the advice given earlier to set up an rsyslog server...)
Unfortunately, the cluster failed to promote one of the slaves (ph-sql-03),
so that node was fenced too. Then quorum was lost and the stop action for
the pgsqld resource on the last node (ph-sql-05) was executed. Although
it timed out (see my earlier post on this list), the PostgreSQL daemon was
eventually stopped, leaving all nodes down.
The error message on ph-sql-03 was:
pgsqlms(pgsqld)[5006]: Jul 03 19:32:38 ERROR: Can not get current node LSN location
Jul 03 19:32:38 [30148] ph-sql-03.prod.ams.i.rdmedia.com lrmd: notice: operation_finished: pgsqld_promote_0:5006:stderr [ ocf-exit-reason:Can not get current node LSN location ]
Jul 03 19:32:38 [30148] ph-sql-03.prod.ams.i.rdmedia.com lrmd: info: log_finished: finished - rsc:pgsqld action:promote call_id:87 pid:5006 exit-code:1 exec-time:237ms queue-time:0ms
Jul 03 19:32:38 [30151] ph-sql-03.prod.ams.i.rdmedia.com crmd: notice: process_lrm_event: Result of promote operation for pgsqld on ph-sql-03: 1 (unknown error) | call=87 key=pgsqld_promote_0 confirmed=true cib-update=8309
Jul 03 19:32:38 [30151] ph-sql-03.prod.ams.i.rdmedia.com crmd: notice: process_lrm_event: ph-sql-03-pgsqld_promote_0:87 [ ocf-exit-reason:Can not get current node LSN location\n ]
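For anyone who wants to pull failed actions like this one out of a longer log, here is a minimal Python sketch. It assumes the lrmd "log_finished" line has exactly the field layout shown in the excerpt above, and that lines wrapped by the mail client have been joined back together first. The function name failed_actions is only illustrative and is not part of PAF or Pacemaker.

```python
import re

# Matches the fields of the lrmd "log_finished" line as it appears in the
# excerpt above. The layout is assumed from that single line only.
FINISHED = re.compile(
    r"rsc:(?P<rsc>\S+)\s+action:(?P<action>\S+)\s+call_id:(?P<call_id>\d+)\s+"
    r"pid:(?P<pid>\d+)\s+exit-code:(?P<exit_code>\d+)"
)


def failed_actions(lines):
    """Yield (resource, action, exit_code) for each finished action that exited non-zero."""
    for line in lines:
        m = FINISHED.search(line)
        if m and int(m.group("exit_code")) != 0:
            yield m.group("rsc"), m.group("action"), int(m.group("exit_code"))


# The log_finished line from the excerpt, re-joined into a single line.
sample = [
    "Jul 03 19:32:38 [30148] ph-sql-03.prod.ams.i.rdmedia.com lrmd: "
    "info: log_finished: finished - rsc:pgsqld action:promote call_id:87 "
    "pid:5006 exit-code:1 exec-time:237ms queue-time:0ms",
]

print(list(failed_actions(sample)))  # [('pgsqld', 'promote', 1)]
```

The same function can be fed an open file object for a whole corosync log, since it only iterates over lines.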
I've seen some PAF GitHub issues that mention this error, but I'm not sure
they apply to my situation. Is this a bug, or is there something wrong with
my setup?
I've attached the corosync logs from the relevant time period
(19:28-19:34).
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ph-sql-04-corosync.log-20190704
Type: application/octet-stream
Size: 7078 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20190704/fc0ecab4/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ph-sql-03-corosync.log-20190704
Type: application/octet-stream
Size: 156103 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20190704/fc0ecab4/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ph-sql-05-corosync.log-20190704
Type: application/octet-stream
Size: 152028 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20190704/fc0ecab4/attachment-0005.obj>