[ClusterLabs] Pacemaker Shutdown

Harvey Shepherd Harvey.Shepherd at Aviatnet.com
Wed Jul 22 19:44:23 EDT 2020


Fencing could work. Thanks again, Reid.

________________________________
From: Users <users-bounces at clusterlabs.org> on behalf of Reid Wahl <nwahl at redhat.com>
Sent: 23 July 2020 10:10
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown

Thanks for the clarification. As far as I'm aware, there's no way to do this at the Pacemaker level during a Pacemaker shutdown. It would require uncleanly killing all resources, which doesn't make sense at the Pacemaker level.

Pacemaker only knows how to stop a resource by running the resource agent's stop operation. Even if Pacemaker wanted to kill a resource uncleanly for speed, how to do so would depend on the resource type. For example, an IPaddr2 resource doesn't represent a running process that can be killed; `ip addr del` would be necessary.
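As a rough illustration only (the address, prefix and interface below are made up), tearing down an IPaddr2 resource by hand means removing the address rather than killing a process:

# ip addr del 192.0.2.10/24 dev eth0

A resource that wraps a daemon could in principle be killed with a signal instead, which is why there is no single "fast kill" that works for every resource type.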

If we went the route of killing the Pacemaker daemon entirely, rather than relying on it to stop resources, then that wouldn't guarantee the node has stopped using the actual resources before the failover node tries to take over. For example, for a Filesystem, the FS could still be mounted after Pacemaker is killed.
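As a sketch (the mount point here is hypothetical), you would still have to check for and remove the mount yourself before node 2 could safely take over:

# findmnt /mnt/shared
# umount /mnt/shared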

The only ways to know with certainty that node 1 has stopped using cluster resources so that node 2 can safely take them over are:

  1.  gracefully stop them, or
  2.  fence/reboot node 1

With that being said, if you don't mind node 1 being fenced to initiate a faster failover, then you could fence it from node 2.
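For example, assuming a working fence device is configured (the node name below is hypothetical), node 2 could request a reboot of node 1 with:

# stonith_admin --reboot node1

(or `pcs stonith fence node1` if you use pcs). Fencing guarantees node 1 is no longer using the resources, so node 2 can recover them without waiting for graceful stops.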

Others on the list may think of something I haven't considered here.

On Wed, Jul 22, 2020 at 2:43 PM Harvey Shepherd <Harvey.Shepherd at aviatnet.com<mailto:Harvey.Shepherd at aviatnet.com>> wrote:
Thanks for your response, Reid. What you say makes sense, and under normal circumstances, if a resource failed, I'd want all of its dependents to be stopped cleanly before restarting the failed resource. However, if Pacemaker is shutting down on a node (e.g. due to a restart request), then I just want to fail over as fast as possible, so an unclean kill is fine. At the moment the shutdown process takes about 2 minutes. I was just wondering if there was a way to do this.

Regards,
Harvey

________________________________
From: Users <users-bounces at clusterlabs.org<mailto:users-bounces at clusterlabs.org>> on behalf of Reid Wahl <nwahl at redhat.com<mailto:nwahl at redhat.com>>
Sent: 23 July 2020 08:05
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org<mailto:users at clusterlabs.org>>
Subject: EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown


On Tue, Jul 21, 2020 at 11:42 PM Harvey Shepherd <Harvey.Shepherd at aviatnet.com<mailto:Harvey.Shepherd at aviatnet.com>> wrote:
Hi All,

I'm running Pacemaker 2.0.3 on a two-node cluster, controlling 40+ resources, which are a mixture of clones and other resources colocated with the master instance of certain clones. I've noticed that if I terminate Pacemaker on the node hosting the master instances of the clones, Pacemaker focuses on stopping resources on that node BEFORE failing over to the other node, leading to a longer outage than necessary. Is there a way to change this behaviour?

Hi, Harvey.

As you likely know, a given active/passive resource will have to stop on one node before it can start on another node, and the same goes for a promoted clone instance having to demote on one node before it can promote on another. There are exceptions for clone instances and for promotable clones with promoted-max > 1 ("allow more than one master instance"). A resource that's configured to run on one node at a time should not try to run on two nodes during failover.
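For reference, promoted-max is a clone meta-attribute. A hedged example of raising it with crm_resource (the clone name here is made up):

# crm_resource --resource my-promotable-clone --meta --set-parameter promoted-max --parameter-value 2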

With that in mind, what exactly are you wanting to happen? Is the problem that all resources are stopping on node 1 before any of them start on node 2? Or that you want Pacemaker shutdown to kill the processes on node 1 instead of cleanly shutting them down? Or something different?

These are the actions and logs I saw during the test:

Ack. This seems like it's just telling us that Pacemaker is going through a graceful shutdown. The info more relevant to the resource stop/start order would be in /var/log/pacemaker/pacemaker.log (or, in less detail, in /var/log/messages) on the DC.
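For example, to locate the DC and pull the scheduler's decisions out of the log (the log path can vary by distribution and build):

# crm_mon -1 | grep "Current DC"
# grep pacemaker-schedulerd /var/log/pacemaker/pacemaker.log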

# /etc/init.d/pacemaker stop
Signaling Pacemaker Cluster Manager to terminate

Waiting for cluster services to unload..............................................................sending signal 9 to procs


2020 Jul 22 06:16:50.581 Chassis2 daemon.notice CTR8740 pacemaker. Signaling Pacemaker Cluster Manager to terminate
2020 Jul 22 06:16:50.599 Chassis2 daemon.notice CTR8740 pacemaker. Waiting for cluster services to unload
2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: new_event_notification (6140-6141-9): Broken pipe (32)
2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: Notification of client stonithd/665bde82-cb28-40f7-9132-8321dc2f1992 failed
2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: new_event_notification (6140-6143-8): Broken pipe (32)
2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: Notification of client attrd/a26ca273-3422-4ebe-8cb7-95849b8ff130 failed
2020 Jul 22 06:18:03.320 Chassis1 daemon.warning CTR8740 pacemaker-schedulerd.6240  warning: Blind faith: not fencing unseen nodes
2020 Jul 22 06:18:58.941 Chassis2 user.crit CTR8740 supervisor. pacemaker is inactive (3).

Regards,
Harvey


--
Regards,

Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA


More information about the Users mailing list