[ClusterLabs] Trouble starting up PAF cluster for first time

Sat Apr 7 13:08:20 UTC 2018

On Fri, 06 Apr 2018 16:46:08 -0600
Casey & Gina <caseyandgina at icloud.com> wrote:

> It looks like the main problem was that I needed to add
> pghost="/var/run/postgresql" to the postgresql-10-main resource.

Yes, that was in the quick start documentation as well:
https://clusterlabs.github.io/PAF/Quick_Start-Debian-9-crm.html

Feel free to suggest corrections or improvements to this doc! Just keep in
mind this has to be a "quick start", focusing on PAF, not PostgreSQL setup :)

> I'm not sure why I have to do that, but it makes things work.

The PAF resource agent needs to connect to your local PostgreSQL instance to
check its status in various situations. Parameters "pgport" and "pghost"
default to "5432" and "/tmp" (the same defaults as upstream PostgreSQL). The
"/tmp" value is the directory where PostgreSQL creates, on startup, the unix
socket that local clients connect through. With these defaults, the socket is
e.g. "/tmp/.s.PGSQL.5432".

However, Debian policy overrides this default socket directory with
"/var/run/postgresql" instead of "/tmp", so "pghost" has to be set to match.

Note that you could set "localhost" or any other local IP address for pghost, as
long as PAF can connect with password-less authentication. A unix socket makes
more sense for local connections though.
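
To make the pghost/pgport relationship concrete, here is a minimal Python
sketch (not PAF code; the function name "socket_path" is made up for this
example). It relies on the libpq convention that a host value starting with
"/" is a socket directory, and builds the socket file name from it:

```python
import os
from typing import Optional


def socket_path(pghost: str, pgport: int = 5432) -> Optional[str]:
    """Return the unix socket file a client would use, or None for a TCP host."""
    # libpq treats a host value starting with "/" as a socket directory;
    # anything else (e.g. "localhost") means a TCP connection.
    if not pghost.startswith("/"):
        return None
    return os.path.join(pghost, ".s.PGSQL.{}".format(pgport))


if __name__ == "__main__":
    # Compare the upstream default with the Debian location.
    for candidate in ("/tmp", "/var/run/postgresql"):
        path = socket_path(candidate)
        state = "exists" if os.path.exists(path) else "missing"
        print("{}: {}".format(path, state))
```

Running it on a node shows which directory actually holds the socket, and
therefore which value "pghost" should have.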

> For both this and my last E-mail to the list that was also a problem with the
> command being run to start the instance up, I'd like to understand how to
> diagnose what's happening better myself instead of resorting to guesswork.

I suppose this calls for some documentation improvement. Or maybe we could
change the PAF behavior during first startup to make it clearer.

> How can I tell exactly what the command is that Pacemaker ends up calling to
> start PostgreSQL?  I don't see it in corosync.log.  If I could see exactly
> what was being tried, I could try running it by hand and determine the
> problem myself a lot more effectively.

See "sub _pg_ctl_start" in the source code, as pointed out by Steven in a
previous email.

However, your problem doesn't come from the start operation here. Right after
the start occurs, PAF connects to PostgreSQL to check that it started as
expected and to report the real status to Pacemaker. Because it couldn't connect
to your instance using the wrong pghost, PAF was reporting an error to Pacemaker.
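
If you want to reproduce that kind of post-start check by hand, the following
Python sketch may help. It is only an approximation and not what PAF runs
internally: it uses PostgreSQL's "pg_isready" tool as a stand-in, and the
helper names ("isready_command", "check_instance") are made up for this
example.

```python
import shutil
import subprocess


def isready_command(pghost, pgport=5432):
    """Build a pg_isready command line for the given host (or socket dir) and port."""
    return ["pg_isready", "-h", pghost, "-p", str(pgport)]


def check_instance(pghost, pgport=5432):
    """Run pg_isready and return its exit code, or None if the tool is not in PATH.

    pg_isready exit codes: 0 = accepting connections, 1 = rejecting connections,
    2 = no response, 3 = no attempt made (e.g. invalid parameters).
    """
    cmd = isready_command(pghost, pgport)
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode


if __name__ == "__main__":
    # With the instance running, the wrong socket directory should give a
    # non-zero code while the right one gives 0.
    for pghost in ("/tmp", "/var/run/postgresql"):
        print(pghost, "->", check_instance(pghost))
```

Run it as the same system user the resource agent uses, so that socket
permissions and authentication match what the agent sees.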