[ClusterLabs] PAF / pgSQL fails after OS/system shutdown - FIX
lejeczek
peljasz at yahoo.co.uk
Fri Nov 10 06:27:24 EST 2023
On 07/11/2023 17:57, lejeczek via Users wrote:
> hi guys
>
> I have a 3-node pgSQL cluster with PAF. When all three
> systems are shut down at virtually the same time, PAF
> fails to start once the HA cluster is operational again.
>
> from status:
> ...
> Migration Summary:
> * Node: ubusrv2 (2):
> * PGSQL-PAF-5433: migration-threshold=1000000
> fail-count=1000000 last-failure='Tue Nov 7 17:52:38 2023'
> * Node: ubusrv3 (3):
> * PGSQL-PAF-5433: migration-threshold=1000000
> fail-count=1000000 last-failure='Tue Nov 7 17:52:38 2023'
> * Node: ubusrv1 (1):
> * PGSQL-PAF-5433: migration-threshold=1000000
> fail-count=1000000 last-failure='Tue Nov 7 17:52:38 2023'
>
> Failed Resource Actions:
> * PGSQL-PAF-5433_stop_0 on ubusrv2 'error' (1): call=90,
> status='complete', exitreason='Unexpected state for
> instance "PGSQL-PAF-5433" (returned 1)',
> last-rc-change='Tue Nov 7 17:52:38 2023', queued=0ms,
> exec=84ms
> * PGSQL-PAF-5433_stop_0 on ubusrv3 'error' (1): call=82,
> status='complete', exitreason='Unexpected state for
> instance "PGSQL-PAF-5433" (returned 1)',
> last-rc-change='Tue Nov 7 17:52:38 2023', queued=0ms,
> exec=82ms
> * PGSQL-PAF-5433_stop_0 on ubusrv1 'error' (1): call=86,
> status='complete', exitreason='Unexpected state for
> instance "PGSQL-PAF-5433" (returned 1)',
> last-rc-change='Tue Nov 7 17:52:38 2023', queued=0ms,
> exec=108ms
>
> and all three pgSQLs show virtually identical logs:
> ...
> 2023-11-07 16:54:45.532 UTC [24936] LOG: starting
> PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on
> x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
> 2023-11-07 16:54:45.532 UTC [24936] LOG: listening on
> IPv4 address "0.0.0.0", port 5433
> 2023-11-07 16:54:45.532 UTC [24936] LOG: listening on
> IPv6 address "::", port 5433
> 2023-11-07 16:54:45.535 UTC [24936] LOG: listening on
> Unix socket "/var/run/postgresql/.s.PGSQL.5433"
> 2023-11-07 16:54:45.547 UTC [24938] LOG: database system
> was interrupted while in recovery at log time 2023-11-07
> 15:30:56 UTC
> 2023-11-07 16:54:45.547 UTC [24938] HINT: If this has
> occurred more than once some data might be corrupted and
> you might need to choose an earlier recovery target.
> 2023-11-07 16:54:45.819 UTC [24938] LOG: entering standby
> mode
> 2023-11-07 16:54:45.824 UTC [24938] FATAL: could not open
> directory "/var/run/postgresql/14-paf.pg_stat_tmp": No
> such file or directory
> 2023-11-07 16:54:45.825 UTC [24936] LOG: startup process
> (PID 24938) exited with exit code 1
> 2023-11-07 16:54:45.825 UTC [24936] LOG: aborting startup
> due to startup process failure
> 2023-11-07 16:54:45.826 UTC [24936] LOG: database system
> is shut down
>
> Is the result of this "test" case, as shown above, expected?
> It reproduces every time.
> If not, what might I be missing?
>
> many thanks, L.
>
To share my "fix" for this: perhaps the problem was
introduced by OS/package updates (Ubuntu 22) as opposed to
the resource agent itself.
As the logs point out, the pg_stat_tmp directory is missing,
and from what I can see it is only the master, within the
cluster, that writes those stats.
That option appeared on all nodes (I say "appeared" because
I did not put it into the configs myself).
fix = do not use the pg_stat_tmp directive/option
(presumably stats_temp_directory in postgresql.conf) at all.
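
A minimal sketch of that fix, assuming the option in question
is PostgreSQL 14's stats_temp_directory setting in
postgresql.conf (the log above shows it pointing at
/var/run/postgresql/14-paf.pg_stat_tmp). The function name
disable_stats_temp_directory is hypothetical; it only rewrites
the config text and leaves reading/writing the file, and
reloading PostgreSQL, to the admin.

```python
import re

# Assumption: the "pg_stat_tmp directive" is the stats_temp_directory
# parameter of PostgreSQL 14. Matches only active (uncommented) lines.
_DIRECTIVE = re.compile(r"^(\s*)(stats_temp_directory\b.*)$", re.IGNORECASE)


def disable_stats_temp_directory(conf_text: str) -> str:
    """Return conf_text with any active stats_temp_directory line commented out."""
    out = []
    for line in conf_text.splitlines():
        m = _DIRECTIVE.match(line)
        # Keep the original indentation and prefix the setting with '#'.
        out.append(f"{m.group(1)}#{m.group(2)}" if m else line)
    # Preserve the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if conf_text.endswith("\n") else "")


if __name__ == "__main__":
    sample = (
        "port = 5433\n"
        "stats_temp_directory = '/var/run/postgresql/14-paf.pg_stat_tmp'\n"
    )
    print(disable_stats_temp_directory(sample), end="")
```

With the line commented out, PostgreSQL 14 falls back to its
default, pg_stat_tmp inside the data directory, which does not
disappear when /var/run is cleared on reboot.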