[ClusterLabs] Antw: [EXT] Re: Pacemaker Shutdown
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Thu Jul 23 02:39:13 EDT 2020
>>> Harvey Shepherd <Harvey.Shepherd at Aviatnet.com> wrote on 22.07.2020 at 23:43
in message
<CY4PR2201MB1142A9006826EE69A7FBEF458B790 at CY4PR2201MB1142.namprd22.prod.outlook.com>:
> Thanks for your response, Reid. What you say makes sense, and under normal
> circumstances, if a resource failed, I'd want all of its dependents to be
> stopped cleanly before restarting the failed resource. However, if Pacemaker
> is shutting down on a node (e.g. due to a restart request), then I just want
> to fail over as fast as possible, so an unclean kill is fine. At the moment
> the shutdown process is taking 2 minutes. I was just wondering if there was a
> way to do this.
Hi!
I think you are mixing two concepts: a shutdown request always attempts to stop
things cleanly, while a node failure (which will be followed by a fencing
operation) cannot end in a clean shutdown, because the node is already
considered dead.
Also remember that even STONITH (fencing) takes some time, and it is generally
better to try a stop with a timeout, which will then fence the node if the
timeout expires.
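For example, assuming pcs is in use, something along these lines would tell
Pacemaker to fence the node when a stop does not finish within its timeout
(the resource name "app_rsc" and the 30s value are placeholders):

    # Fence the node if a stop of "app_rsc" does not complete within 30 seconds
    pcs resource update app_rsc op stop timeout=30s on-fail=fence

(When fencing is enabled, "fence" is already Pacemaker's default on-fail value
for a failed stop; the point here is mainly the timeout.)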
And of course: HA software is not there to make stop operations faster ;-)
Regards,
Ulrich
>
> Regards,
> Harvey
>
> ________________________________
> From: Users <users-bounces at clusterlabs.org> on behalf of Reid Wahl
> <nwahl at redhat.com>
> Sent: 23 July 2020 08:05
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>
> Subject: EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown
>
>
> On Tue, Jul 21, 2020 at 11:42 PM Harvey Shepherd
> <Harvey.Shepherd at aviatnet.com> wrote:
> Hi All,
>
> I'm running Pacemaker 2.0.3 on a two-node cluster, controlling 40+ resources
> which are a mixture of clones and other resources that are colocated with the
> master instance of certain clones. I've noticed that if I terminate Pacemaker
> on the node that is hosting the master instances of the clones, Pacemaker
> focuses on stopping resources on that node BEFORE failing over to the other
> node, leading to a longer outage than necessary. Is there a way to change
> this behaviour?
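(For reference, a colocation of that kind ties a resource to the promoted
instance of a clone; a minimal pcs sketch, with placeholder names and syntax
that may differ slightly between pcs versions:

    # Keep "app_rsc" on the node running the master instance of "my_clone"
    pcs constraint colocation add app_rsc with master my_clone INFINITY

In the CIB this corresponds to a rsc_colocation element with
with-rsc-role="Master".)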
>
> Hi, Harvey.
>
> As you likely know, a given active/passive resource will have to stop on one
> node before it can start on another node, and the same goes for a promoted
> clone instance having to demote on one node before it can promote on another.
> There are exceptions for clone instances and for promotable clones with
> promoted-max > 1 ("allow more than one master instance"). A resource that's
> configured to run on one node at a time should not try to run on two nodes
> during failover.
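(For illustration, promoted-max is a clone meta attribute; a minimal pcs sketch
with a placeholder clone name:

    # Allow two promoted ("master") instances of the promotable clone "my_clone"
    pcs resource meta my_clone promoted-max=2

Whether running more than one promoted instance is safe depends entirely on the
resource agent.)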
>
> With that in mind, what exactly are you wanting to happen? Is the problem
> that all resources are stopping on node 1 before any of them start on node 2?
> Or that you want Pacemaker shutdown to kill the processes on node 1 instead
> of cleanly shutting them down? Or something different?
>
> These are the actions and logs I saw during the test:
>
> Ack. This seems like it's just telling us that Pacemaker is going through a
> graceful shutdown. The info more relevant to the resource stop/start order
> would be in /var/log/pacemaker/pacemaker.log (or less detailed in
> /var/log/messages) on the DC.
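(To trace the stop/start ordering on the DC, something like this can help; the
exact wording of the messages varies between Pacemaker versions:

    # List the resource actions the controller initiated, in order
    grep -E "Initiating (stop|demote|promote|start)" /var/log/pacemaker/pacemaker.log

The scheduler's "Calculated transition" messages around the same time point to
the saved transition inputs, which show the planned action order.)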
>
> # /etc/init.d/pacemaker stop
> Signaling Pacemaker Cluster Manager to terminate
>
> Waiting for cluster services to unload..............................................................sending signal 9 to procs
>
>
> 2020 Jul 22 06:16:50.581 Chassis2 daemon.notice CTR8740 pacemaker. Signaling Pacemaker Cluster Manager to terminate
> 2020 Jul 22 06:16:50.599 Chassis2 daemon.notice CTR8740 pacemaker. Waiting for cluster services to unload
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: new_event_notification (6140-6141-9): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: Notification of client stonithd/665bde82-cb28-40f7-9132-8321dc2f1992 failed
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: new_event_notification (6140-6143-8): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: Notification of client attrd/a26ca273-3422-4ebe-8cb7-95849b8ff130 failed
> 2020 Jul 22 06:18:03.320 Chassis1 daemon.warning CTR8740 pacemaker-schedulerd.6240 warning: Blind faith: not fencing unseen nodes
> 2020 Jul 22 06:18:58.941 Chassis2 user.crit CTR8740 supervisor. pacemaker is inactive (3).
>
> Regards,
> Harvey
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
> --
> Regards,
>
> Reid Wahl, RHCA
> Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA