[ClusterLabs] Antw: Re: Replicated PGSQL woes

Fri Oct 14 06:10:04 EDT 2016

On Fri, 14 Oct 2016 09:59:04 +0200
"Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:

> >>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 13.10.2016 at
> >>> 23:56 in message <20161013235606.007018eb at firost>:
> 
> [...]
> > As far as I know, the pgsql resource agent creates such a lock file on
> > promote and deletes it on graceful stop. If the PostgreSQL instance
> > couldn't be stopped correctly, the lock file stays and the RA refuses to
> > start it the next time.
> 
> As a note: We once had the case of a very old stale PID file, where a valid
> start was denied, because the PID existed, but belonged to a completely
> different process in the meantime (on a busy server). That's why stale PID
> files should be deleted; specifically they shouldn't survive a reboot ;-)

As far as I understand, this logic has changed now. The PostgreSQL PID file
(postmaster.pid) contains the PID **and** the shmid (shared memory segment ID)
created by the last postmaster started.

During a fresh start, if the postmaster.pid file exists, the postmaster checks
whether the PID AND the shmid still exist on the system and whether some
processes are still attached to that shared memory segment.

See CreateLockFile in src/backend/utils/init/miscinit.c:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/init/miscinit.c;h=22b046e006e50be49c0615d271ed8963c97192c2;hb=HEAD#l758
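To make the idea concrete, here is a rough Python sketch of the shared memory half of that check. It is not what PostgreSQL does internally (the real code is C in the file linked above); it only mimics the idea. It assumes the postmaster.pid layout as I remember it (PID on line 1, "shmem key, shmid" on line 7) and a Linux-style /proc/sysvipc/shm table. The sample contents and the function names are made up for illustration.

```python
# Made-up sample in the assumed postmaster.pid layout: PID, data directory,
# start time, port, socket directory, listen address, then "shmem key  shmid".
SAMPLE_LOCK = """12345
/var/lib/pgsql/data
1476400000
5432
/tmp
localhost
  5432001     98304
"""

# Made-up sample in the /proc/sysvipc/shm layout (Linux only).
SAMPLE_SHM = """key shmid perms size cpid lpid nattch uid gid cuid cgid atime dtime ctime rss swap
5432001 98304 600 56 12345 12345 5 26 26 26 26 0 0 0 0 0
"""


def parse_lock_file(text):
    """Return (pid, shmid); shmid is None if the shmem line is missing."""
    lines = text.splitlines()
    pid = int(lines[0].strip())
    shmid = None
    if len(lines) >= 7:
        fields = lines[6].split()
        if len(fields) == 2:
            shmid = int(fields[1])
    return pid, shmid


def attached_count(shm_table, shmid):
    """Return the attach count (nattch) of shmid, or None if it is not listed."""
    rows = shm_table.splitlines()
    if not rows:
        return None
    header = rows[0].split()
    id_col = header.index("shmid")
    nattch_col = header.index("nattch")
    for row in rows[1:]:
        cols = row.split()
        if len(cols) > max(id_col, nattch_col) and int(cols[id_col]) == shmid:
            return int(cols[nattch_col])
    return None


def segment_in_use(shm_table, shmid):
    """True if the segment still exists and at least one process is attached."""
    count = attached_count(shm_table, shmid)
    return count is not None and count > 0


pid, shmid = parse_lock_file(SAMPLE_LOCK)
print(pid, shmid, segment_in_use(SAMPLE_SHM, shmid))
```

On a real Linux box you would pass the contents of /proc/sysvipc/shm instead of SAMPLE_SHM. A segment that is gone, or that has no process attached, suggests the lock file is left over from a dead instance.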

> You can conclude from a missing PID that the process is not running with that
> PID, but you cannot conclude from an existing PID that it's still the same
> process ;-)

At least, the check I just described uses a kill -0 to verify that the existing
PID is owned by the same user.
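For illustration, here is a minimal Python sketch of such a kill -0 probe on a POSIX system. The function name and the three result labels are mine, not PostgreSQL's. Signal 0 delivers nothing: the kernel only performs the existence and permission checks.

```python
import os


def probe_pid(pid):
    """Classify a PID using signal 0, which only performs the error checks."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no process with this PID exists.
        return "absent"
    except PermissionError:
        # EPERM: the PID exists, but we may not signal it (other user).
        return "other-user"
    # No error: the PID exists and we are allowed to signal it.
    return "signalable"


print(probe_pid(os.getpid()))
```

Note that "signalable" still only says that some process we are allowed to signal holds that PID (same user, or we are root), not that it is the same process that wrote the PID file.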

Cheers :)