<div dir="ltr">No problem! That's what we're here for. I'm glad it's sorted out :)<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Aug 28, 2020 at 12:27 AM Citron Vert <<a href="mailto:citron_vert@hotmail.com">citron_vert@hotmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi,

You are right, the problem does seem to come from some services that are started at boot.

My installation script disables the start-at-boot option for every service we use, which is why I hadn't focused on this possibility.

But after a quick investigation, it turns out a colleague had the good idea of writing a "security" script that monitors certain services and starts them.
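For reference, the script does roughly the following (a simplified sketch from memory; the service names are placeholders):

    #!/bin/bash
    # Simplified sketch of the "security" script: it restarts any monitored
    # service that is not running, which conflicts with Pacemaker's own
    # start/stop decisions on the passive node.
    for svc in service1 service2; do
        if ! systemctl is-active --quiet "$svc"; then
            systemctl start "$svc"
        fi
    done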
Sorry to have contacted you over this little mistake, and thank you for the help; it was effective.

Quentin

On 27/08/2020 at 09:56, Reid Wahl wrote:
<blockquote type="cite">
<div dir="ltr">
<div>Hi, Quentin. Thanks for the logs!</div>
<div><br>
</div>
<div>I see you highlighted the fact that SERVICE1 was in
"Stopping" state on both node 1 and node 2 when node 1 was
rejoining the cluster. I also noted the following later in the
logs, as well as some similar messages earlier:<br>
</div>
<div><br>
</div>
<div>
<pre>Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE1
Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE1
Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE4 active on NODE2
Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE2
...
Aug 27 08:47:02 [1330] NODE2 pengine: info: common_print: 1 : NODE1
Aug 27 08:47:02 [1330] NODE2 pengine: info: common_print: 2 : NODE2
...
Aug 27 08:47:02 [1330] NODE2 pengine: error: native_create_actions: Resource SERVICE1 is active on 2 nodes (attempting recovery)
Aug 27 08:47:02 [1330] NODE2 pengine: notice: native_create_actions: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information

Can you make sure that all the cluster-managed systemd services are disabled from starting at boot (i.e., `systemctl is-enabled service1`, and the same for all the others) on both nodes? If they are enabled, disable them.
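For example, something along these lines on each node (the service names are placeholders; substitute the units your cluster actually manages):

    # Placeholder service names; substitute the systemd units Pacemaker manages.
    for svc in service1 service2 service3 service4; do
        printf '%s: ' "$svc"
        systemctl is-enabled "$svc"   # should print "disabled"
        systemctl disable "$svc"      # safe to run even if already disabled
    done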
On Thu, Aug 27, 2020 at 12:46 AM Citron Vert <citron_vert@hotmail.com> wrote:
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi,</p>
<p>Sorry for using this email adress, my name is Quentin.
Thank you for your reply.</p>
<p>I have already tried the stickiness solution (with the
deprecated value). I tried the one you gave me, and it
does not change anything. <br>
</p>
<p>Resources don't seem to move from node to node (i don't
see the changes with crm_mon command).</p>
<p><br>
</p>
<p>In the logs i found this line <i>"error:
native_create_actions: Resource SERVICE1 is active
on 2 nodes</i>"</p>
<p>Which led me to contact you to understand and learn a
little more about this cluster. And why there are running
resources on the passive node.<br>
</p>
<p><br>
</p>
<p>You will find attached the logs during the reboot of the
passive node and my cluster configuration.<br>
</p>
<p>I think I'm missing out on something in the configuration
/ logs that I don't understand..</p>
<p><br>
</p>
<p>Thank you in advance for your help,</p>
<p>Quentin<br>
</p>
<p><br>
</p>
On 26/08/2020 at 20:16, Reid Wahl wrote:
<blockquote type="cite">
<div dir="ltr">
<div>Hi, Citron.</div>
<div><br>
</div>
<div>Based on your description, it sounds like some
resources **might** be moving from node 1 to node 2,
failing on node 2, and then moving back to node 1. If
that's what's happening (and even if it's not), then
it's probably smart to set some resource stickiness as
a resource default. The below command sets a resource
stickiness score of 1.<br>
</div>
<div><br>
</div>
<div> # pcs resource defaults resource-stickiness=1<br>
</div>
Also note that the "default-resource-stickiness" cluster property is deprecated and should not be used.
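If that property is currently set, something like the following should move it over to a resource default instead (a sketch; double-check the exact syntax against your pcs version):

    # Remove the deprecated cluster property if it is set
    pcs property unset default-resource-stickiness

    # Set stickiness as a resource default instead
    pcs resource defaults resource-stickiness=1

    # Verify the result
    pcs property show
    pcs resource defaults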
Finally, an explicit default resource stickiness score of 0 can interfere with the placement of cloned resource instances. If you don't want any stickiness, then it's better to leave stickiness unset. That way, primitives will have a stickiness of 0, but clone instances will have a stickiness of 1.

If adding stickiness does not resolve the issue, can you share your cluster configuration and some logs that show the issue happening? Off the top of my head I'm not sure why resources would start and stop on node 2 without moving away from node 1, unless they're clone instances that are starting and then failing a monitor operation on node 2.
On Wed, Aug 26, 2020 at 8:42 AM Citron Vert <citron_vert@hotmail.com> wrote:
Hello,
I am contacting you because I have a problem with my cluster and I cannot find (nor understand) any information that can help me.

I have a 2-node cluster (pacemaker, corosync, pcs) installed on CentOS 7 with a set of configuration.
Everything seems to work fine, but here is what happens:

  - Node1 and Node2 are running well, with Node1 as primary
  - I reboot Node2, which is passive (no changes on Node1)
  - Node2 comes back into the cluster as passive
  - The corosync logs show resources getting started and then stopped on Node2
  - The "crm_mon" command shows some resources on Node1 getting restarted

I don't understand how this is supposed to work.
If a node comes back and becomes passive (since Node1 is running as primary), there should be no reason for the resources to be started and then stopped on the new passive node, should there?

One of my resources becomes unstable because it gets started and then stopped too quickly on Node2, which seems to make it restart on Node1 without a failover.

I have tried several things and solutions proposed by different sites and forums, but without success.

Is there a way to make sure that a node which joins the cluster as passive does not start its own resources?

Thanks in advance.

Here is some information, just in case:
<div style="color:rgb(212,212,212);background-color:rgb(30,30,30);font-family:Consolas,"Courier New",monospace;font-weight:normal;font-size:14px;line-height:19px;white-space:pre-wrap"><div><span style="color:rgb(212,212,212)">$ rpm -qa | grep -E </span><span style="color:rgb(206,145,120)">"corosync|pacemaker|pcs"</span></div><div><span style="color:rgb(212,212,212)"> corosync-2.4.5-4.el7.x86_64</span></div><div><span style="color:rgb(212,212,212)"> pacemaker-cli-1.1.21-4.el7.x86_64</span></div><div><span style="color:rgb(212,212,212)"> pacemaker-1.1.21-4.el7.x86_64</span></div><div><span style="color:rgb(212,212,212)"> pcs-0.9.168-4.el7.centos.x86_64</span></div><div><span style="color:rgb(212,212,212)"> corosynclib-2.4.5-4.el7.x86_64</span></div><div><span style="color:rgb(212,212,212)"> pacemaker-libs-1.1.21-4.el7.x86_64</span></div><div><span style="color:rgb(212,212,212)"> pacemaker-cluster-libs-1.1.21-4.el7.x86_64</span></div></div>
<p><br>
</p>
<div style="color:rgb(212,212,212);background-color:rgb(30,30,30);font-family:Consolas,"Courier New",monospace;font-weight:normal;font-size:14px;line-height:19px;white-space:pre-wrap"><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-stonith-enabled"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"stonith-enabled"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"false"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-no-quorum-policy"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"no-quorum-policy"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"ignore"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-dc-deadtime"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"dc-deadtime"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"120s"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-have-watchdog"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"have-watchdog"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"false"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-dc-version"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"dc-version"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"1.1.21-4.el7-f14e36fd43"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-cluster-infrastructure"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"cluster-infrastructure"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"corosync"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-cluster-name"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"cluster-name"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"CLUSTER"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-last-lrm-refresh"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"last-lrm-refresh"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"1598446314"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span 
style="color:rgb(206,145,120)">"cib-bootstrap-options-default-resource-stickiness"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"default-resource-stickiness"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"0"</span><span style="color:rgb(212,212,212)">/></span></div></div>
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
<br clear="all">
<br>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div>Regards,<br>
<br>
</div>
Reid Wahl, RHCA<br>
</div>
<div>Software Maintenance Engineer,
Red Hat<br>
</div>
CEE - Platform Support Delivery -
ClusterHA</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div>Regards,<br><br></div>Reid Wahl, RHCA<br></div><div>Software Maintenance Engineer, Red Hat<br></div>CEE - Platform Support Delivery - ClusterHA</div></div></div></div></div></div></div></div></div></div></div></div></div></div>