[ClusterLabs] Autostart/Enabling of Pacemaker and corosync

Tue Apr 27 12:24:16 EDT 2021

On Mon, 26 Apr 2021 18:04:41 +0000 (UTC)
Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> I prefer that the stack is auto-enabled. Imagine that you have a DB that is
> replicated and the primary DB node is fenced. You would like that node to
> rejoin the cluster and, if possible, sync with the new primary instead of
> staying down.

In the case of PostgreSQL, the failed primary may not be able to rejoin the
new primary automatically. Worse, if it does enter replication, it might
silently become a corrupted standby, giving a false sense of safety until the
next failover occurs.

PAF doesn't handle auto-failback (e.g. pg_rewind) by design, to avoid code
complexity. We don't want to give the false impression of a perfectly
available, self-healing, fully automated PgSQL cluster. If something
went wrong with your DB, you had better check and fix it. You need both a
sysadmin and a DBA on board to take care of the availability and safety of your
cluster.

Note that auto-failback of secondary nodes is safe, as long as they are able
to catch up with production. Maybe we can imagine some safety
belts in PAF's code to allow Pacemaker auto-start on boot, but refuse
to start a PostgreSQL instance that is in bad shape.
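
To make the "safety belt" idea concrete, here is a rough sketch of the kind of
check I have in mind. PAF does not implement this; the function names are
hypothetical, and it assumes both nodes were on the same timeline before the
failover and that the agent can somehow learn the old primary's last WAL
position and the LSN at which the new primary was promoted. The rule itself is
the usual one: if the old primary wrote WAL past the promotion point, it has
diverged and must not be started as a standby without a rewind or a rebuild.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an integer.

    An LSN is printed as two hexadecimal numbers: the high and the low
    32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def may_start_as_standby(local_end_lsn: str, promotion_lsn: str) -> bool:
    """Hypothetical safety belt for a former primary coming back after boot.

    local_end_lsn: last WAL position written by the old primary.
    promotion_lsn: WAL position at which the new primary forked its timeline.

    Assumption: both nodes were on the same timeline before the failover.
    Returns False when the old primary has WAL the new primary never saw,
    in which case a human (or pg_rewind) has to step in.
    """
    return parse_lsn(local_end_lsn) <= parse_lsn(promotion_lsn)


if __name__ == "__main__":
    # Old primary stopped before the promotion point: safe to follow.
    print(may_start_as_standby("0/3000000", "0/3000060"))
    # Old primary wrote WAL past the promotion point: diverged, refuse.
    print(may_start_as_standby("0/3000100", "0/3000060"))
```

A real check would also have to deal with timeline history, a missing
or unreachable primary, and so on, which is exactly the complexity we have
avoided so far.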

> One such example is the SAP HANA DB. Imagine that the current primary
> node loses storage and fails to commit all transactions to disk. Without
> replication you will endure data loss for the last 1-2 minutes (depending on
> your monitoring interval).

PAF is a shared-nothing approach: it requires replication between nodes.