<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Fencing could work. Thanks again Reid.<br>
</div>
<div>
<div id="appendonsend"></div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Users &lt;users-bounces@clusterlabs.org&gt; on behalf of Reid Wahl &lt;nwahl@redhat.com&gt;<br>
<b>Sent:</b> 23 July 2020 10:10<br>
<b>To:</b> Cluster Labs - All topics related to open-source clustering welcomed &lt;users@clusterlabs.org&gt;<br>
<b>Subject:</b> EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>Thanks for the clarification. As far as I'm aware, there's no way to do this at the Pacemaker level during a Pacemaker shutdown. It would require uncleanly killing all resources, which doesn't make sense at the Pacemaker level.<br>
</div>
<div><br>
</div>
<div>Pacemaker only knows how to stop a resource by running the resource agent's stop operation. Even if Pacemaker wanted to kill a resource uncleanly for speed, the way to do so for each resource would depend on the type of resource. For example, an IPaddr2
resource doesn't represent a running process that can be killed; `ip addr del` would be necessary.<br>
</div>
<div><br>
</div>
<div>If we went the route of killing the Pacemaker daemon entirely, rather than relying on it to stop resources, then that wouldn't guarantee the node has stopped using the actual resources before the failover node tries to take over. For example, for a Filesystem,
the FS could still be mounted after Pacemaker is killed.</div>
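<div><br>
</div>
<div>To illustrate the Filesystem case, here is a rough sketch (assuming a Linux node) of how one might check whether a mount point is still present after the Pacemaker daemons are gone. The function name `is_mounted` and the path `/srv/data` are hypothetical, not taken from this thread.</div>
<pre>
```shell
# Return 0 if MOUNTPOINT is listed as a mount target in the mounts table
# (default: /proc/mounts), and 1 otherwise.
# The optional second argument exists only so the check can be tried
# against a saved copy of the table.
is_mounted() {
    mountpoint="$1"
    table="${2:-/proc/mounts}"
    # Field 2 of each line in /proc/mounts is the mount target.
    awk -v mp="$mountpoint" '$2 == mp { found = 1 } END { exit !found }' "$table"
}
```
</pre>
<div>If `is_mounted /srv/data` still succeeds on node 1 after Pacemaker has been killed, node 2 cannot safely mount the same device.</div>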
<div><br>
</div>
<div>The only ways to know with certainty that node 1 has stopped using cluster resources so that node 2 can safely take them over are:</div>
<div>
<ol>
<li>gracefully stop them, or<br>
</li><li>fence/reboot node 1</li></ol>
<div>With that being said, if you don't mind node 1 being fenced to initiate a faster failover, then you could fence it from node 2.<br>
</div>
</div>
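<div><br>
</div>
<div>A minimal sketch of option 2, to be run on node 2, assuming fencing is already configured and working in the cluster. `stonith_admin --reboot` is Pacemaker's low-level fencing command; the wrapper name `fence_peer` and the `DRY_RUN` variable are hypothetical and only there so the command can be inspected before it is actually run.</div>
<pre>
```shell
# Print (DRY_RUN=1, the default) or run the command that asks the
# cluster to fence, i.e. reboot, the named peer node.
fence_peer() {
    peer="$1"
    if [ -z "$peer" ]; then
        echo "usage: fence_peer NODE"
        return 2
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show what would be executed, change nothing.
        echo "stonith_admin --reboot $peer"
    else
        stonith_admin --reboot "$peer"
    fi
}
```
</pre>
<div>For example, `fence_peer Chassis2` would only print the command, while `DRY_RUN=0 fence_peer Chassis2` would request the reboot. This assumes the cluster node name matches the hostname shown in your logs.</div>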
<div><br>
</div>
<div>Others on the list may think of something I haven't considered here.<br>
</div>
</div>
<br>
<div class="x_gmail_quote">
<div dir="ltr" class="x_gmail_attr">On Wed, Jul 22, 2020 at 2:43 PM Harvey Shepherd &lt;<a href="mailto:Harvey.Shepherd@aviatnet.com">Harvey.Shepherd@aviatnet.com</a>&gt; wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Thanks for your response, Reid. What you say makes sense, and under normal circumstances, if a resource failed, I'd want all of its dependents to be stopped cleanly before restarting the failed resource. However, if Pacemaker is shutting down on a node (e.g. due
to a restart request), then I just want to fail over as fast as possible, so an unclean kill is fine. At the moment the shutdown process is taking 2 minutes. I was just wondering if there is a way to do this.</div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Regards,</div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Harvey<br>
</div>
<div>
<div id="x_gmail-m_-3199263651614350385appendonsend"></div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr style="display:inline-block; width:98%">
<div id="x_gmail-m_-3199263651614350385divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Users &lt;<a href="mailto:users-bounces@clusterlabs.org" target="_blank">users-bounces@clusterlabs.org</a>&gt; on
behalf of Reid Wahl &lt;<a href="mailto:nwahl@redhat.com" target="_blank">nwahl@redhat.com</a>&gt;<br>
<b>Sent:</b> 23 July 2020 08:05<br>
<b>To:</b> Cluster Labs - All topics related to open-source clustering welcomed &lt;<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>&gt;<br>
<b>Subject:</b> EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown</font>
<div> </div>
</div>
<div>
<div dir="ltr"><br>
<div>
<div dir="ltr">On Tue, Jul 21, 2020 at 11:42 PM Harvey Shepherd &lt;<a href="mailto:Harvey.Shepherd@aviatnet.com" target="_blank">Harvey.Shepherd@aviatnet.com</a>&gt; wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Hi All,</div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
I'm running Pacemaker 2.0.3 on a two-node cluster, controlling 40+ resources which are a mixture of clones and other resources that are colocated with the master instance of certain clones. I've noticed that if I terminate pacemaker on the node that is hosting
the master instances of the clones, Pacemaker focuses on stopping resources on that node BEFORE failing over to the other node, leading to a longer outage than necessary. Is there a way to change this behaviour?</div>
</div>
</blockquote>
<div><br>
</div>
<div>Hi, Harvey.</div>
<div><br>
</div>
<div>As you likely know, a given active/passive resource has to stop on one node before it can start on another, and likewise a promoted clone instance has to demote on one node before it can promote on another. There are exceptions
for clone instances and for promotable clones with promoted-max &gt; 1 ("allow more than one master instance"). A resource that's configured to run on one node at a time should not try to run on two nodes during failover.<br>
</div>
<div><br>
</div>
<div>With that in mind, what exactly do you want to happen? Is the problem that all resources are stopping on node 1 before
<i><b>any</b></i> of them start on node 2? Or that you want Pacemaker shutdown to kill the processes on node 1 instead of cleanly shutting them down? Or something different?<br>
</div>
<div><br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
These are the actions and logs I saw during the test:</div>
</div>
</blockquote>
<div><br>
</div>
<div>Ack. This seems like it's just telling us that Pacemaker is going through a graceful shutdown. The info more relevant to the resource stop/start order would be in /var/log/pacemaker/pacemaker.log (or less detailed in /var/log/messages) on the DC.<br>
</div>
<div><br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<span># /etc/init.d/pacemaker stop<br>
</span>
<div>Signaling Pacemaker Cluster Manager to terminate<br>
</div>
<div><br>
</div>
<div>Waiting for cluster services to unload..............................................................sending signal 9 to procs<br>
</div>
<span></span><br>
</div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<span>2020 Jul 22 06:16:50.581 Chassis2 daemon.notice CTR8740 pacemaker. Signaling Pacemaker Cluster Manager to terminate<br>
</span>
<div>2020 Jul 22 06:16:50.599 Chassis2 daemon.notice CTR8740 pacemaker. Waiting for cluster services to unload<br>
</div>
<div>2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: new_event_notification (6140-6141-9): Broken pipe (32)<br>
</div>
<div>2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: Notification of client stonithd/665bde82-cb28-40f7-9132-8321dc2f1992 failed<br>
</div>
<div>2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: new_event_notification (6140-6143-8): Broken pipe (32)<br>
</div>
<div>2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140 warning: Notification of client attrd/a26ca273-3422-4ebe-8cb7-95849b8ff130 failed<br>
</div>
<div>2020 Jul 22 06:18:03.320 Chassis1 daemon.warning CTR8740 pacemaker-schedulerd.6240 warning: Blind faith: not fencing unseen nodes<br>
</div>
<div>2020 Jul 22 06:18:58.941 Chassis2 user.crit CTR8740 supervisor. pacemaker is inactive (3).<br>
</div>
<span></span></div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Regards,</div>
<div style="font-family:Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Harvey<br>
</div>
</div>
_______________________________________________<br>
Manage your subscription:<br>
<a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
<br>
ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer" target="_blank">
https://www.clusterlabs.org/</a><br>
</blockquote>
</div>
<br clear="all">
<br>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div>Regards,<br>
<br>
</div>
Reid Wahl, RHCA<br>
</div>
<div>Software Maintenance Engineer, Red Hat<br>
</div>
CEE - Platform Support Delivery - ClusterHA</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</body>
</html>