[ClusterLabs] Pacemaker Shutdown
Harvey.Shepherd at Aviatnet.com
Wed Jul 22 17:43:26 EDT 2020
Thanks for your response, Reid. What you say makes sense, and under normal circumstances, if a resource failed, I'd want all of its dependents to be stopped cleanly before restarting the failed resource. However, if Pacemaker is shutting down on a node (e.g. due to a restart request), then I just want to fail over as fast as possible, so an unclean kill is fine. At the moment the shutdown process is taking 2 minutes. I was just wondering if there was a way to do this.
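(One hedged sketch of what I've been looking at, assuming a pcs-managed cluster: the graceful-shutdown time is largely bounded by per-resource stop timeouts and the cluster-wide shutdown-escalation property, so lowering those might shorten it. The resource name below is hypothetical.)

```shell
# Sketch only, assuming pcs tooling; "my_daemon" is a hypothetical resource name.
# Lower the stop timeout so a hung stop action is given up sooner:
pcs resource update my_daemon op stop timeout=15s

# shutdown-escalation bounds how long the controller waits during a graceful
# shutdown before forcing its way down (the default is 20min):
pcs property set shutdown-escalation=1min
```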
From: Users <users-bounces at clusterlabs.org> on behalf of Reid Wahl <nwahl at redhat.com>
Sent: 23 July 2020 08:05
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown
On Tue, Jul 21, 2020 at 11:42 PM Harvey Shepherd <Harvey.Shepherd at aviatnet.com<mailto:Harvey.Shepherd at aviatnet.com>> wrote:
I'm running Pacemaker 2.0.3 on a two-node cluster, controlling 40+ resources which are a mixture of clones and other resources that are colocated with the master instance of certain clones. I've noticed that if I terminate pacemaker on the node that is hosting the master instances of the clones, Pacemaker focuses on stopping resources on that node BEFORE failing over to the other node, leading to a longer outage than necessary. Is there a way to change this behaviour?
As you likely know, a given active/passive resource will have to stop on one node before it can start on another, and the same goes for a promoted clone instance having to demote on one node before it can promote on another. There are exceptions for clone instances and for promotable clones with promoted-max > 1 ("allow more than one master instance"). A resource that's configured to run on one node at a time should not try to run on two nodes during failover.
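(The promoted-max exception above can be sketched as follows, assuming pcs and Pacemaker 2.x syntax; "my_stateful" is a hypothetical resource name.)

```shell
# Hypothetical promotable clone. With promoted-max=2 on a two-node cluster,
# both nodes may hold the promoted role at once, so a failover does not need
# a demote on one node before a promote on the other.
pcs resource create my_stateful ocf:pacemaker:Stateful \
    promotable promoted-max=2
```

Whether that fits depends entirely on whether the application behind the resource agent can tolerate two simultaneous masters.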
With that in mind, what exactly are you wanting to happen? Is the problem that all resources are stopping on node 1 before any of them start on node 2? Or that you want Pacemaker shutdown to kill the processes on node 1 instead of cleanly shutting them down? Or something different?
These are the actions and logs I saw during the test:
Ack. This seems like it's just telling us that Pacemaker is going through a graceful shutdown. The info more relevant to the resource stop/start order would be in /var/log/pacemaker/pacemaker.log (or less detailed in /var/log/messages) on the DC.
# /etc/init.d/pacemaker stop
Signaling Pacemaker Cluster Manager to terminate
Waiting for cluster services to unload..............................................................sending signal 9 to procs
2020 Jul 22 06:16:50.581 Chassis2 daemon.notice CTR8740 pacemaker. Signaling Pacemaker Cluster Manager to terminate
2020 Jul 22 06:16:50.599 Chassis2 daemon.notice CTR8740 pacemaker. Waiting for cluster services to unload
2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: new_event_notification (6140-6141-9): Broken pipe (32)
2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: Notification of client stonithd/665bde82-cb28-40f7-9132-8321dc2f1992 failed
2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: new_event_notification (6140-6143-8): Broken pipe (32)
2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: Notification of client attrd/a26ca273-3422-4ebe-8cb7-95849b8ff130 failed
2020 Jul 22 06:18:03.320 Chassis1 daemon.warning CTR8740 pacemaker-schedulerd.6240 warning: Blind faith: not fencing unseen nodes
2020 Jul 22 06:18:58.941 Chassis2 user.crit CTR8740 supervisor. pacemaker is inactive (3).
Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA