[ClusterLabs] Antw: [EXT] Re: Pacemaker Shutdown

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Jul 23 02:39:13 EDT 2020


>>> Harvey Shepherd <Harvey.Shepherd at Aviatnet.com> wrote on 22.07.2020 at 23:43
in message
<CY4PR2201MB1142A9006826EE69A7FBEF458B790 at CY4PR2201MB1142.namprd22.prod.outlook.com>:
> Thanks for your response Reid. What you say makes sense, and under normal
> circumstances if a resource failed, I'd want all of its dependents to be
> stopped cleanly before restarting the failed resource. However, if Pacemaker
> is shutting down on a node (e.g. due to a restart request), then I just want
> to fail over as fast as possible, so an unclean kill is fine. At the moment
> the shutdown process is taking 2 minutes. I was just wondering if there was a
> way to do this.

Hi!

I think you are mixing up two concepts: a shutdown request is an attempt to
stop things cleanly in every case, while a node failure (which will be followed
by a fencing operation) definitely cannot result in a clean shutdown, as the
node is considered to be dead already.
Also remember that even STONITH (fencing) will take some time, and it is
probably better in general to try a stop with a timeout (which will then fence
if the timeout expires).
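
As a minimal sketch of that idea (assuming a resource named "my_db" and the
pcs shell; both are placeholders, so adjust to your own tooling and names),
give the stop operation a short timeout and escalate to fencing when it
expires:

  # "my_db" is a hypothetical resource; 30s is only an example value
  pcs resource update my_db op stop interval=0s timeout=30s on-fail=fence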

And of course: HA software is not there to make any stop operation faster ;-)

Regards,
Ulrich

> 
> Regards,
> Harvey
> 
> ________________________________
> From: Users <users-bounces at clusterlabs.org> on behalf of Reid Wahl
> <nwahl at redhat.com>
> Sent: 23 July 2020 08:05
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>
> Subject: EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown
> 
> 
> On Tue, Jul 21, 2020 at 11:42 PM Harvey Shepherd 
> <Harvey.Shepherd at aviatnet.com<mailto:Harvey.Shepherd at aviatnet.com>> wrote:
> Hi All,
> 
> I'm running Pacemaker 2.0.3 on a two-node cluster, controlling 40+ resources
> which are a mixture of clones and other resources that are colocated with the
> master instance of certain clones. I've noticed that if I terminate pacemaker
> on the node that is hosting the master instances of the clones, Pacemaker
> focuses on stopping resources on that node BEFORE failing over to the other
> node, leading to a longer outage than necessary. Is there a way to change
> this behaviour?
> 
> Hi, Harvey.
> 
> As you likely know, a given active/passive resource will have to stop on one
> node before it can start on another node, and the same goes for a promoted
> clone instance having to demote on one node before it can promote on another.
> There are exceptions for clone instances and for promotable clones with
> promoted-max > 1 ("allow more than one master instance"). A resource that's
> configured to run on one node at a time should not try to run on two nodes
> during failover.
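> 
> For illustration, a minimal sketch of the promoted-max exception (assuming a
> promotable clone named "my-clone" and the pcs shell; both names are
> placeholders, not anything from your configuration):
> 
>   # "my-clone" stands in for an existing promotable clone
>   pcs resource meta my-clone promoted-max=2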
> 
> With that in mind, what exactly are you wanting to happen? Is the problem
> that all resources are stopping on node 1 before any of them start on node 2?
> Or that you want Pacemaker shutdown to kill the processes on node 1 instead
> of cleanly shutting them down? Or something different?
> 
> These are the actions and logs I saw during the test:
> 
> Ack. This seems like it's just telling us that Pacemaker is going through a
> graceful shutdown. The info more relevant to the resource stop/start order
> would be in /var/log/pacemaker/pacemaker.log (or less detailed in
> /var/log/messages) on the DC.
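> 
> As a quick, untested sketch for finding the DC and pulling the scheduler
> entries from that log (the grep pattern is only a starting point):
> 
>   crm_mon -1 | grep "Current DC"      # which node is currently the DC
>   grep pacemaker-schedulerd /var/log/pacemaker/pacemaker.log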
> 
> # /etc/init.d/pacemaker stop
> Signaling Pacemaker Cluster Manager to terminate
> 
> Waiting for cluster services to 
> unload..............................................................sending
> signal 9 to procs
> 
> 
> 2020 Jul 22 06:16:50.581 Chassis2 daemon.notice CTR8740 pacemaker. Signaling Pacemaker Cluster Manager to terminate
> 2020 Jul 22 06:16:50.599 Chassis2 daemon.notice CTR8740 pacemaker. Waiting for cluster services to unload
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: new_event_notification (6140-6141-9): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: Notification of client stonithd/665bde82-cb28-40f7-9132-8321dc2f1992 failed
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: new_event_notification (6140-6143-8): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140  warning: Notification of client attrd/a26ca273-3422-4ebe-8cb7-95849b8ff130 failed
> 2020 Jul 22 06:18:03.320 Chassis1 daemon.warning CTR8740 pacemaker-schedulerd.6240  warning: Blind faith: not fencing unseen nodes
> 2020 Jul 22 06:18:58.941 Chassis2 user.crit CTR8740 supervisor. pacemaker is inactive (3).
> 
> Regards,
> Harvey
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 
> 
> 
> --
> Regards,
> 
> Reid Wahl, RHCA
> Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA




