[ClusterLabs] Fwd: Postgres pacemaker cluster failure

Wed Jul 10 05:42:20 EDT 2019

On Tue, 9 Jul 2019 19:57:06 +0300
Andrei Borzenkov <arvidjaar at gmail.com> wrote:

> 09.07.2019 13:08, Danka Ivanović wrote:
> > Hi, I didn't manage to start the master with postgres, even after increasing
> > the start timeout. I checked executable paths and start options.

We would need many more logs from this failure...

> > When the cluster is running with a manually started master and a slave
> > started by pacemaker, everything works OK.

Logs from this scenario might be interesting as well to check and compare.

> > Today we had a failover again.
> > I cannot find the reason in the logs, can you help me with debugging? Thanks.

logs logs logs please.

> > Jul 09 09:16:32 [2679] postgres1       lrmd:    debug: child_kill_helper:	Kill pid 12735's group
> > Jul 09 09:16:34 [2679] postgres1       lrmd:  warning: child_timeout_callback: PGSQL_monitor_15000 process (PID 12735) timed out
> 
> You probably want to enable debug output in the resource agent. As far as I
> can tell, this requires HA_debug=1 in the environment of the resource agent,
> but for the life of me I cannot find where it is possible to set it.
> 
> Setting it directly in the resource agent is probably the simplest way to
> debug this.

I usually set this in "/etc/sysconfig/pacemaker". Never tried to add it
to pgsqlms, interesting.
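
A minimal sketch of that setting, assuming the file is read as plain
KEY=value environment lines and that the variable is passed down to the
resource agent (the path is the one named above; other distributions may
keep this file elsewhere):

```shell
# /etc/sysconfig/pacemaker
# Assumption: resource agents such as pgsqlms inherit this variable and
# emit their debug messages when it is set to 1.
HA_debug=1
```

The change should only be picked up once Pacemaker has been restarted on
that node.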

> P.S. crm_resource is called by the resource agent (pgsqlms). It shows the
> result of the original resource probing, which makes it confusing. At least
> it explains where these log entries come from.

Not sure I understand what you mean :/