[ClusterLabs] Fwd: Postgres pacemaker cluster failure
Jehan-Guillaume de Rorthais
jgdr at dalibo.com
Wed Jul 10 10:57:18 EDT 2019
On Wed, 10 Jul 2019 16:34:17 +0200
Danka Ivanovic <danka.ivanovic at sbgenomics.com> wrote:
> Hi, Thank you all for responding so quickly. Part of corosync.log file is
> attached. Cluster failure occurred at 09:16 AM yesterday.
> Debug mode is turned on in corosync configuration, but I didn't turn it on
> in pacemaker config. I will test that.
Sadly, there's really nothing interesting in there. It even looks as if pgsqlms might not have been called at all before the action timed out...
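To make the timed-out actions easier to spot in a large corosync.log, something like the following could be used. It is only a minimal sketch: it assumes each log entry sits on a single line and follows the format of the lrmd warning quoted further down in this thread. The names `TIMEOUT_RE` and `find_timeouts` are made up for illustration and are not part of Pacemaker or PAF.

```python
import re

# Assumed format, taken from the lrmd warning quoted in this thread:
#   "... lrmd: warning: child_timeout_callback: PGSQL_monitor_15000 process (PID 12735) timed out"
TIMEOUT_RE = re.compile(
    r"child_timeout_callback:\s+(?P<op>\S+)\s+process\s+\(PID\s+(?P<pid>\d+)\)\s+timed out"
)


def find_timeouts(lines):
    """Return a list of (operation, pid) for every timed-out operation found."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group("op"), int(match.group("pid"))))
    return hits


# The two entries quoted in this thread, rejoined onto one line each.
sample = [
    "Jul 09 09:16:32 [2679] postgres1 lrmd: debug: child_kill_helper: Kill pid 12735's group",
    "Jul 09 09:16:34 [2679] postgres1 lrmd: warning: child_timeout_callback: "
    "PGSQL_monitor_15000 process (PID 12735) timed out",
]
print(find_timeouts(sample))
```

On a real log, the same function can be fed an open file object instead of the `sample` list. It only lists which operations timed out; it says nothing about why.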
> Postgres log is also attached.
Nothing really relevant there either.
> Several times cluster failed because of ldap time out, even if I tried to
> disable ldap searching for local postgres user,
This is really annoying. IIRC, this was already happening last time. Fix this
first if you haven't yet.
...
> From syslog it looks like postgres systemd process was
> stopped,
Again, systemd shouldn't take any part in your cluster with regard to PostgreSQL.
If Pacemaker manages PostgreSQL, systemd should have nothing to do with it.
If you really need to start/stop it by hand (I strongly discourage you from
doing so), do it using pg_ctl. And make sure to unmanage the Pacemaker resource
first.
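As an illustration of that order of operations, here is a minimal dry-run sketch that only builds and prints the commands, without running anything. The resource id "PGSQL" is guessed from the "PGSQL_monitor_15000" log entry quoted in this thread, the data directory is a placeholder, and the helper `manual_action_plan` is hypothetical. Depending on your setup, the is-managed attribute may have to be set on the master/slave resource rather than on the primitive, and pg_ctl would normally be run as the postgres user.

```python
import shlex


def manual_action_plan(resource, pgdata, action="start"):
    """Build (but do not run) the commands for a manual pg_ctl action.

    Step 1 tells Pacemaker to stop managing the resource.
    Step 2 is the manual pg_ctl call itself.
    """
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    unmanage = [
        "crm_resource", "--resource", resource, "--meta",
        "--set-parameter", "is-managed", "--parameter-value", "false",
    ]
    pg_ctl = ["pg_ctl", "-D", pgdata, action]
    return [unmanage, pg_ctl]


# Both values below are assumptions; adapt them to your cluster.
for cmd in manual_action_plan("PGSQL", "/path/to/pgdata"):
    print(shlex.join(cmd))
```

Printing the plan rather than executing it is deliberate: it lets you review the exact commands before touching a production cluster.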
> On Tue, 9 Jul 2019 19:57:06 +0300
> > Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> >
> > > 09.07.2019 13:08, Danka Ivanović wrote:
> > > > Hi I didn't manage to start master with postgres, even if I increased
> > > > start timeout. I checked executable paths and start options.
> >
> > We would require much more logs from this failure...
> >
> > > > When cluster is running with manually started master and slave started
> > > > over pacemaker, everything works ok.
> >
> > Logs from this scenario might be interesting as well to check and compare.
> >
> > > > Today we had failover again.
> > > > I cannot find reason from the logs, can you help me with debugging?
> > > > Thanks.
> >
> > logs logs logs please.
> >
> > > > Jul 09 09:16:32 [2679] postgres1 lrmd: debug: child_kill_helper: Kill pid 12735's group
> > > > Jul 09 09:16:34 [2679] postgres1 lrmd: warning: child_timeout_callback: PGSQL_monitor_15000 process (PID 12735) timed out
> > >
> > > You probably want to enable debug output in resource agent. As far as I
> > > can tell, this requires HA_debug=1 in environment of resource agent, but
> > > for the life of me I cannot find where it is possible to set it.
> > >
> > > Probably setting it directly in resource agent for debugging is the most
> > > simple way.
> >
> > I usually set this in "/etc/sysconfig/pacemaker". Never tried to add it
> > to pgsqlms, interesting.
> >
> > > P.S. crm_resource is called by resource agent (pgsqlms). And it shows
> > > result of original resource probing which makes it confusing. At least
> > > it explains where these logs entries come from.
> >
> > Not sure I understand what you mean :/
> >
--
Jehan-Guillaume de Rorthais
Dalibo