[ClusterLabs] Pacemaker Shutdown

Reid Wahl nwahl at redhat.com
Wed Jul 22 18:10:27 EDT 2020


Thanks for the clarification. As far as I'm aware, there's no way to do
this at the Pacemaker level during a Pacemaker shutdown. It would require
uncleanly killing all resources, which doesn't make sense at the Pacemaker
level.

Pacemaker only knows how to stop a resource by running the resource agent's
stop operation. Even if Pacemaker wanted to kill a resource uncleanly for
speed, the way to do so for each resource would depend on the type of
resource. For example, an IPaddr2 resource doesn't represent a running
process that can be killed; `ip addr del` would be necessary.
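To make that point concrete, here is a rough shell sketch of what a hand-written "fast kill" would have to look like: the command depends entirely on the agent type, which is the per-resource knowledge Pacemaker leaves to each agent's stop action. The helper name `teardown_cmd`, the address, the device and the mount point are all hypothetical, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the manual teardown command for a resource,
# illustrating that there is no single generic "kill" for every agent type.
# It only echoes the command; nothing is executed.
teardown_cmd() {
    agent="$1"
    case "$agent" in
        IPaddr2)
            # A virtual IP is not a process; it has to be removed from the interface.
            # Arguments: <address/prefix> <device>
            echo "ip addr del $2 dev $3"
            ;;
        Filesystem)
            # A mount has no PID either; it has to be unmounted.
            # Argument: <mount point>
            echo "umount $2"
            ;;
        *)
            echo "no known teardown for agent type: $agent" >&2
            return 1
            ;;
    esac
}

teardown_cmd IPaddr2 192.0.2.10/24 eth0   # prints: ip addr del 192.0.2.10/24 dev eth0
teardown_cmd Filesystem /mnt/shared       # prints: umount /mnt/shared
```

Every new agent type would need its own branch, which is why a generic unclean stop does not fit at the Pacemaker level.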

If we went the route of killing the Pacemaker daemon entirely, rather than
relying on it to stop resources, then that wouldn't guarantee the node has
stopped using the actual resources before the failover node tries to take
over. For example, for a Filesystem, the FS could still be mounted after
Pacemaker is killed.

The only ways to know with certainty that node 1 has stopped using cluster
resources so that node 2 can safely take them over are:

   1. gracefully stop them, or
   2. fence/reboot node 1

With that being said, if you don't mind node 1 being fenced to initiate a
faster failover, then you could fence it from node 2.
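For reference, a manual fence of node 1 issued from node 2 can be done with Pacemaker's `stonith_admin --reboot <node>` (or `pcs stonith fence <node>` where pcs is installed). The sketch below wraps that in a `fence_peer` function with a `DRY_RUN` switch; both names are hypothetical, and `node1` stands in for the real node name. Keep in mind this is an unclean reboot of node 1.

```shell
#!/bin/sh
# Hypothetical wrapper: fence (reboot) a peer node from the surviving node.
# Set DRY_RUN=1 to print the command instead of running it.
fence_peer() {
    peer="$1"
    if [ -z "$peer" ]; then
        echo "usage: fence_peer <node-name>" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "stonith_admin --reboot $peer"
    else
        # Ask the fencing daemon to reboot the peer via its configured device.
        stonith_admin --reboot "$peer"
    fi
}

# Example (run on node 2); drop DRY_RUN=1 to actually fence node1.
DRY_RUN=1 fence_peer node1
```

The dry-run switch is only there so the exact command can be checked before anything is rebooted.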

Others on the list may think of something I haven't considered here.
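To see where the 2 minutes of shutdown actually go, the pacemaker.log on the DC (mentioned further down in the thread) can be filtered for resource actions. This is a minimal sketch, assuming the default log path and the Pacemaker 2.0 controller's "Initiating <action> operation ..." notice format; check your own log, since the wording may differ between versions. `show_actions` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: list resource actions initiated by the controller,
# in log order. Assumes messages of the form
#   "... Initiating stop operation <rsc>_stop_0 ... on <node>"
# as written by Pacemaker 2.0; adjust the pattern if your version differs.
show_actions() {
    log="${1:-/var/log/pacemaker/pacemaker.log}"
    grep -E 'Initiating (stop|demote|start|promote) operation' "$log"
}

# Example, run on the DC after a test shutdown:
#   show_actions
#   show_actions /var/log/pacemaker/pacemaker.log | grep ' on node2'
```

Comparing the timestamp of the last stop/demote on node 1 with the first start/promote on node 2 shows how much of the outage is spent waiting for stops to finish.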

On Wed, Jul 22, 2020 at 2:43 PM Harvey Shepherd <
Harvey.Shepherd at aviatnet.com> wrote:

> Thanks for your response, Reid. What you say makes sense, and under normal
> circumstances if a resource failed, I'd want all of its dependents to be
> stopped cleanly before restarting the failed resource. However if pacemaker
> is shutting down on a node (e.g. due to a restart request), then I just
> want to failover as fast as possible, so an unclean kill is fine. At the
> moment the shutdown process is taking 2 minutes. I was just wondering if there
> was a way to do this.
>
> Regards,
> Harvey
>
> ------------------------------
> *From:* Users <users-bounces at clusterlabs.org> on behalf of Reid Wahl <
> nwahl at redhat.com>
> *Sent:* 23 July 2020 08:05
> *To:* Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org>
> *Subject:* EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown
>
>
> On Tue, Jul 21, 2020 at 11:42 PM Harvey Shepherd <
> Harvey.Shepherd at aviatnet.com> wrote:
>
> Hi All,
>
> I'm running Pacemaker 2.0.3 on a two-node cluster, controlling 40+
> resources which are a mixture of clones and other resources that are
> colocated with the master instance of certain clones. I've noticed that if
> I terminate pacemaker on the node that is hosting the master instances of
> the clones, Pacemaker focuses on stopping resources on that node BEFORE
> failing over to the other node, leading to a longer outage than necessary.
> Is there a way to change this behaviour?
>
>
> Hi, Harvey.
>
> As you likely know, a given active/passive resource will have to
> stop on one node before it can start on another node, and the same goes for
> a promoted clone instance having to demote on one node before it can
> promote on another. There are exceptions for clone instances and for
> promotable clones with promoted-max > 1 ("allow more than one master
> instance"). A resource that's configured to run on one node at a time
> should not try to run on two nodes during failover.
>
> With that in mind, what exactly are you wanting to happen? Is the problem
> that all resources are stopping on node 1 before *any* of them start on
> node 2? Or that you want Pacemaker shutdown to kill the processes on node 1
> instead of cleanly shutting them down? Or something different?
>
> These are the actions and logs I saw during the test:
>
>
> Ack. This seems like it's just telling us that Pacemaker is going through
> a graceful shutdown. The info more relevant to the resource stop/start
> order would be in /var/log/pacemaker/pacemaker.log (or less detailed in
> /var/log/messages) on the DC.
>
> # /etc/init.d/pacemaker stop
> Signaling Pacemaker Cluster Manager to terminate
>
> Waiting for cluster services to unload..............................................................sending signal 9 to procs
>
>
> 2020 Jul 22 06:16:50.581 Chassis2 daemon.notice CTR8740 pacemaker. Signaling Pacemaker Cluster Manager to terminate
> 2020 Jul 22 06:16:50.599 Chassis2 daemon.notice CTR8740 pacemaker. Waiting for cluster services to unload
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: new_event_notification (6140-6141-9): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: Notification of client stonithd/665bde82-cb28-40f7-9132-8321dc2f1992 failed
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: new_event_notification (6140-6143-8): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: Notification of client attrd/a26ca273-3422-4ebe-8cb7-95849b8ff130 failed
> 2020 Jul 22 06:18:03.320 Chassis1 daemon.warning CTR8740 pacemaker-schedulerd.6240  warning: Blind faith: not fencing unseen nodes
> 2020 Jul 22 06:18:58.941 Chassis2 user.crit CTR8740 supervisor. pacemaker is inactive (3).
>
> Regards,
> Harvey
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
>
> --
> Regards,
>
> Reid Wahl, RHCA
> Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA
>


-- 
Regards,

Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA