[ClusterLabs] PAF / pgSQL fails after OS/system shutdown

Tue Nov 7 11:57:11 EST 2023

Hi guys,

I have a 3-node pgSQL cluster with PAF. When all three 
systems are shut down at virtually the same time, PAF 
fails to start once the HA cluster is operational again.

From the cluster status:
...
Migration Summary:
   * Node: ubusrv2 (2):
     * PGSQL-PAF-5433: migration-threshold=1000000 fail-count=1000000 last-failure='Tue Nov  7 17:52:38 2023'
   * Node: ubusrv3 (3):
     * PGSQL-PAF-5433: migration-threshold=1000000 fail-count=1000000 last-failure='Tue Nov  7 17:52:38 2023'
   * Node: ubusrv1 (1):
     * PGSQL-PAF-5433: migration-threshold=1000000 fail-count=1000000 last-failure='Tue Nov  7 17:52:38 2023'

Failed Resource Actions:
   * PGSQL-PAF-5433_stop_0 on ubusrv2 'error' (1): call=90, status='complete', exitreason='Unexpected state for instance "PGSQL-PAF-5433" (returned 1)', last-rc-change='Tue Nov  7 17:52:38 2023', queued=0ms, exec=84ms
   * PGSQL-PAF-5433_stop_0 on ubusrv3 'error' (1): call=82, status='complete', exitreason='Unexpected state for instance "PGSQL-PAF-5433" (returned 1)', last-rc-change='Tue Nov  7 17:52:38 2023', queued=0ms, exec=82ms
   * PGSQL-PAF-5433_stop_0 on ubusrv1 'error' (1): call=86, status='complete', exitreason='Unexpected state for instance "PGSQL-PAF-5433" (returned 1)', last-rc-change='Tue Nov  7 17:52:38 2023', queued=0ms, exec=108ms

All three pgSQL instances show virtually identical logs:
...
2023-11-07 16:54:45.532 UTC [24936] LOG:  starting PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-11-07 16:54:45.532 UTC [24936] LOG:  listening on IPv4 address "0.0.0.0", port 5433
2023-11-07 16:54:45.532 UTC [24936] LOG:  listening on IPv6 address "::", port 5433
2023-11-07 16:54:45.535 UTC [24936] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2023-11-07 16:54:45.547 UTC [24938] LOG:  database system was interrupted while in recovery at log time 2023-11-07 15:30:56 UTC
2023-11-07 16:54:45.547 UTC [24938] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-11-07 16:54:45.819 UTC [24938] LOG:  entering standby mode
2023-11-07 16:54:45.824 UTC [24938] FATAL:  could not open directory "/var/run/postgresql/14-paf.pg_stat_tmp": No such file or directory
2023-11-07 16:54:45.825 UTC [24936] LOG:  startup process (PID 24938) exited with exit code 1
2023-11-07 16:54:45.825 UTC [24936] LOG:  aborting startup due to startup process failure
2023-11-07 16:54:45.826 UTC [24936] LOG:  database system is shut down
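
The only FATAL entry in the log above is the missing pg_stat_tmp directory. 
For anyone trying to reproduce this, here is a minimal Python sketch that 
pulls such paths out of a PostgreSQL log and checks whether they exist on 
the node after a reboot. The helper name missing_directories is my own 
invention for illustration; it is not part of PAF or PostgreSQL, and the 
script only reports, it does not fix anything.

```python
import os
import re

# Matches PostgreSQL's 'could not open directory "<path>"' FATAL message.
MISSING_DIR_RE = re.compile(r'FATAL:\s+could not open\s+directory\s+"([^"]+)"')


def missing_directories(log_text):
    """Return the paths named in 'could not open directory' FATAL entries.

    Hypothetical helper for illustration only.
    """
    # Collapse all whitespace first, so entries that a mail client
    # wrapped across several lines still match the pattern.
    flat = " ".join(log_text.split())
    return MISSING_DIR_RE.findall(flat)


if __name__ == "__main__":
    # Sample taken from the log in this post, including the mail wrapping.
    sample = (
        '2023-11-07 16:54:45.824 UTC [24938] FATAL:  could not open \n'
        'directory "/var/run/postgresql/14-paf.pg_stat_tmp": No such \n'
        'file or directory\n'
    )
    for path in missing_directories(sample):
        state = "exists" if os.path.isdir(path) else "is missing"
        print(f"{path} {state}")
```

Running it on each node right after boot, before the cluster tries to 
start the resource, would show whether the directory is already gone at 
that point.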

Is the result of this "test" case, as shown above, expected? 
It reproduces every time.
If not, what might I be missing?

many thanks, L.