[ClusterLabs] Re: [EXT] mayhem when exiting maintenance mode

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Feb 4 10:04:23 EST 2020


>>> Derek Viljoen <derekv at infinite.io> wrote on 04.02.2020 at 15:39 in message
<14320_1580828028_5E39857C_14320_654_1_CAJVskpU2M4UgBmwUv17tUJ5mWcb-4G=wi8R1pWGM+67YFCHzw at mail.gmail.com>:
> We have a three-node postgres cluster running on Ubuntu 14.04, currently at
> Postgres 9.5 with Corosync 2.4.2 and Pacemaker 1.1.18.
> 
> I'm trying to automate upgrading the database to 11.4.  (Our product is a
> network appliance, so the upgrade needs to be automated for our customers.)
> 
> I first put the cluster into maintenance mode, perform the upgrade, update
> the resource paths in the crm config to point to the new db instance, and
> restore the db from the old version (required by Postgres for major
> version upgrades).  At the end of all these steps everything looks good.
> 
> But when I turn off maintenance mode all of my db nodes suddenly go down
> and all three appear to be in slave mode, with no master.  If I wait a few
> minutes it appears that node 2 takes over as master, but it has an empty
> database, because apparently it wasn't able to replicate the restored db
> from the original master yet.  Can anyone tell me what is causing this?

Hi!

It's hard to tell without seeing the logs, but basically, at the end of maintenance mode the processes should be in the same state they were in when maintenance mode was started.
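
One thing worth verifying before leaving maintenance mode is that Pacemaker's view of the resources still matches reality. A minimal sketch, assuming crmsh and the standard Pacemaker command-line tools:

    # Show the cluster's current idea of all resources and their roles:
    crm_mon -1r

    # Ask Pacemaker to re-probe all resources on all nodes, so it acts
    # on freshly detected state rather than stale information:
    crm_resource --reprobe

    # Only once the reported state matches what you expect:
    crm configure property maintenance-mode=false
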
Did you consider putting 2 of the 3 nodes into standby and then starting maintenance mode? That way resources would be running on just one node. Then update all nodes, deactivate maintenance mode, and bring the other nodes online again. I haven't tried that myself, but maybe it's worth a try.
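
In crmsh terms that sequence might look roughly like this (the node names node1/node2/node3 and the exact upgrade steps are placeholders, not taken from the original setup):

    # Keep resources on node1 only:
    crm node standby node2
    crm node standby node3

    # Tell Pacemaker to stop managing resources during the upgrade:
    crm configure property maintenance-mode=true

    # ... upgrade Postgres on all nodes and adjust the resource paths
    # in the configuration, e.g. interactively via:
    crm configure edit

    # Re-enable management, then bring the standby nodes back:
    crm configure property maintenance-mode=false
    crm node online node2
    crm node online node3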

Regards,
Ulrich


> 
> Derek Viljoen
> derekv at infinite.io