<div dir="ltr">No problem! That's what we're here for. I'm glad it's sorted out :)<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Aug 28, 2020 at 12:27 AM Citron Vert <<a href="mailto:citron_vert@hotmail.com">citron_vert@hotmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi,

You are right, the problem does seem to come from some services that are started at boot.

My installation script disables the start-at-boot option for every service we use, which is why I hadn't focused on this possibility.

But after a quick investigation, it turns out a colleague had the good idea of writing a "security" script that monitors certain services and starts them.
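For reference, the script does roughly the following (a simplified sketch from memory; the service names are placeholders):

    #!/bin/bash
    # Simplified sketch of the "security" script: it restarts any monitored
    # service that is not running, which conflicts with Pacemaker's own
    # start/stop decisions on the passive node.
    for svc in service1 service2; do
        if ! systemctl is-active --quiet "$svc"; then
            systemctl start "$svc"
        fi
    done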
Sorry to have contacted you over this little mistake, and thank you for the help; it was effective.

Quentin

On 27/08/2020 at 09:56, Reid Wahl wrote:
<blockquote type="cite">
<div dir="ltr">
<div>Hi, Quentin. Thanks for the logs!</div>
<div><br>
</div>
<div>I see you highlighted the fact that SERVICE1 was in
"Stopping" state on both node 1 and node 2 when node 1 was
rejoining the cluster. I also noted the following later in the
logs, as well as some similar messages earlier:<br>
</div>
<div><br>
</div>
<div>
<pre>Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE1
Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE1
Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE4 active on NODE2
Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE2
...
Aug 27 08:47:02 [1330] NODE2 pengine: info: common_print: 1 : NODE1
Aug 27 08:47:02 [1330] NODE2 pengine: info: common_print: 2 : NODE2
...
Aug 27 08:47:02 [1330] NODE2 pengine: error: native_create_actions: Resource SERVICE1 is active on 2 nodes (attempting recovery)
Aug 27 08:47:02 [1330] NODE2 pengine: notice: native_create_actions: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information

Can you make sure that all the cluster-managed systemd services are disabled from starting at boot (i.e., `systemctl is-enabled service1`, and the same for all the others) on both nodes? If they are enabled, disable them.
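For example, something along these lines on each node (the service names are placeholders; substitute the units your cluster actually manages):

    # Placeholder service names; substitute the systemd units Pacemaker manages.
    for svc in service1 service2 service3 service4; do
        printf '%s: ' "$svc"
        systemctl is-enabled "$svc"   # should print "disabled"
        systemctl disable "$svc"      # safe to run even if already disabled
    done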
On Thu, Aug 27, 2020 at 12:46 AM Citron Vert <citron_vert@hotmail.com> wrote:
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi,</p>
<p>Sorry for using this email adress, my name is Quentin.
Thank you for your reply.</p>
<p>I have already tried the stickiness solution (with the
deprecated value). I tried the one you gave me, and it
does not change anything. <br>
</p>
<p>Resources don't seem to move from node to node (i don't
see the changes with crm_mon command).</p>
<p><br>
</p>
<p>In the logs i found this line <i>"error:
native_create_actions: Resource SERVICE1 is active
on 2 nodes</i>"</p>
<p>Which led me to contact you to understand and learn a
little more about this cluster. And why there are running
resources on the passive node.<br>
</p>
<p><br>
</p>
<p>You will find attached the logs during the reboot of the
passive node and my cluster configuration.<br>
</p>
<p>I think I'm missing out on something in the configuration
/ logs that I don't understand..</p>
<p><br>
</p>
<p>Thank you in advance for your help,</p>
<p>Quentin<br>
</p>
<p><br>
</p>
On 26/08/2020 at 20:16, Reid Wahl wrote:
<blockquote type="cite">
<div dir="ltr">
<div>Hi, Citron.</div>
<div><br>
</div>
<div>Based on your description, it sounds like some
resources **might** be moving from node 1 to node 2,
failing on node 2, and then moving back to node 1. If
that's what's happening (and even if it's not), then
it's probably smart to set some resource stickiness as
a resource default. The below command sets a resource
stickiness score of 1.<br>
</div>
<div><br>
</div>
<div> # pcs resource defaults resource-stickiness=1<br>
</div>
Also note that the "default-resource-stickiness" cluster property is deprecated and should not be used.
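If that property is currently set, something like the following should move it over to a resource default instead (a sketch; double-check the exact syntax against your pcs version):

    # Remove the deprecated cluster property if it is set
    pcs property unset default-resource-stickiness

    # Set stickiness as a resource default instead
    pcs resource defaults resource-stickiness=1

    # Verify the result
    pcs property show
    pcs resource defaults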
Finally, an explicit default resource stickiness score of 0 can interfere with the placement of cloned resource instances. If you don't want any stickiness, then it's better to leave stickiness unset. That way, primitives will have a stickiness of 0, but clone instances will have a stickiness of 1.

If adding stickiness does not resolve the issue, can you share your cluster configuration and some logs that show the issue happening? Off the top of my head I'm not sure why resources would start and stop on node 2 without moving away from node 1, unless they're clone instances that are starting and then failing a monitor operation on node 2.
On Wed, Aug 26, 2020 at 8:42 AM Citron Vert <citron_vert@hotmail.com> wrote:
Hello,
I am contacting you because I have a problem with my cluster and I cannot find (nor understand) any information that can help me.

I have a 2-node cluster (pacemaker, corosync, pcs) installed on CentOS 7 with a set of configuration.
Everything seems to work fine, but here is what happens:

  - Node1 and Node2 are running well, with Node1 as primary
  - I reboot Node2, which is passive (no changes on Node1)
  - Node2 comes back into the cluster as passive
  - The corosync logs show resources getting started and then stopped on Node2
  - The "crm_mon" command shows some resources on Node1 getting restarted

I don't understand how this is supposed to work.
If a node comes back and becomes passive (since Node1 is running as primary), there should be no reason for the resources to be started and then stopped on the new passive node, should there?

One of my resources becomes unstable because it gets started and then stopped too quickly on Node2, which seems to make it restart on Node1 without a failover.

I have tried several things and solutions proposed by different sites and forums, but without success.

Is there a way to make sure that a node which joins the cluster as passive does not start its own resources?

Thanks in advance.

Here is some information, just in case:
<div style="color:rgb(212,212,212);background-color:rgb(30,30,30);font-family:Consolas,"Courier New",monospace;font-weight:normal;font-size:14px;line-height:19px;white-space:pre-wrap"><div><span style="color:rgb(212,212,212)">$ rpm -qa | grep -E </span><span style="color:rgb(206,145,120)">"corosync|pacemaker|pcs"</span></div><div><span style="color:rgb(212,212,212)"> corosync-2.4.5-4.el7.x86_64</span></div><div><span style="color:rgb(212,212,212)"> pacemaker-cli-1.1.21-4.el7.x86_64</span></div><div><span style="color:rgb(212,212,212)"> pacemaker-1.1.21-4.el7.x86_64</span></div><div><span style="color:rgb(212,212,212)"> pcs-0.9.168-4.el7.centos.x86_64</span></div><div><span style="color:rgb(212,212,212)"> corosynclib-2.4.5-4.el7.x86_64</span></div><div><span style="color:rgb(212,212,212)"> pacemaker-libs-1.1.21-4.el7.x86_64</span></div><div><span style="color:rgb(212,212,212)"> pacemaker-cluster-libs-1.1.21-4.el7.x86_64</span></div></div>
<p><br>
</p>
<div style="color:rgb(212,212,212);background-color:rgb(30,30,30);font-family:Consolas,"Courier New",monospace;font-weight:normal;font-size:14px;line-height:19px;white-space:pre-wrap"><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-stonith-enabled"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"stonith-enabled"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"false"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-no-quorum-policy"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"no-quorum-policy"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"ignore"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-dc-deadtime"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"dc-deadtime"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"120s"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-have-watchdog"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"have-watchdog"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"false"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-dc-version"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"dc-version"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"1.1.21-4.el7-f14e36fd43"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-cluster-infrastructure"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"cluster-infrastructure"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"corosync"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-cluster-name"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"cluster-name"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"CLUSTER"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span style="color:rgb(206,145,120)">"cib-bootstrap-options-last-lrm-refresh"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"last-lrm-refresh"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"1598446314"</span><span style="color:rgb(212,212,212)">/></span></div><div><span style="color:rgb(212,212,212)"> <nvpair id=</span><span 
style="color:rgb(206,145,120)">"cib-bootstrap-options-default-resource-stickiness"</span><span style="color:rgb(212,212,212)"> name=</span><span style="color:rgb(206,145,120)">"default-resource-stickiness"</span><span style="color:rgb(212,212,212)"> value=</span><span style="color:rgb(206,145,120)">"0"</span><span style="color:rgb(212,212,212)">/></span></div></div>
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
<br clear="all">
<br>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div>Regards,<br>
<br>
</div>
Reid Wahl, RHCA<br>
</div>
<div>Software Maintenance Engineer,
Red Hat<br>
</div>
CEE - Platform Support Delivery -
ClusterHA</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div>Regards,<br><br></div>Reid Wahl, RHCA<br></div><div>Software Maintenance Engineer, Red Hat<br></div>CEE - Platform Support Delivery - ClusterHA</div></div></div></div></div></div></div></div></div></div></div></div></div></div>