[ClusterLabs] Resources restart when a node joins in

Reid Wahl nwahl at redhat.com
Thu Aug 27 03:56:29 EDT 2020


Hi, Quentin. Thanks for the logs!

I see you highlighted the fact that SERVICE1 was in "Stopping" state on
both node 1 and node 2 when node 1 was rejoining the cluster. I also noted
the following later in the logs, as well as some similar messages earlier:

Aug 27 08:47:02 [1330] NODE2    pengine:     info: determine_op_status:       Operation monitor found resource SERVICE1 active on NODE1
Aug 27 08:47:02 [1330] NODE2    pengine:     info: determine_op_status:       Operation monitor found resource SERVICE1 active on NODE1
Aug 27 08:47:02 [1330] NODE2    pengine:     info: determine_op_status:       Operation monitor found resource SERVICE4 active on NODE2
Aug 27 08:47:02 [1330] NODE2    pengine:     info: determine_op_status:       Operation monitor found resource SERVICE1 active on NODE2
...
Aug 27 08:47:02 [1330] NODE2    pengine:     info: common_print:        1 : NODE1
Aug 27 08:47:02 [1330] NODE2    pengine:     info: common_print:        2 : NODE2
...
Aug 27 08:47:02 [1330] NODE2    pengine:    error: native_create_actions:     Resource SERVICE1 is active on 2 nodes (attempting recovery)
Aug 27 08:47:02 [1330] NODE2    pengine:   notice: native_create_actions:     See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information

Can you make sure that all the cluster-managed systemd services are
disabled from starting at boot on both nodes (e.g., check with
`systemctl is-enabled service1`, and likewise for the others)? If any of
them are enabled, disable them.
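
For example (just a sketch; service1 through service4 are placeholder unit
names for whatever your cluster actually manages), you could run the
following on both nodes:

    # systemctl is-enabled service1 service2 service3 service4
    # systemctl disable service1 service2 service3 service4

Note that `systemctl disable` only removes the units from the boot
sequence; it does not stop anything that is currently running, so the
instances that Pacemaker started are left alone.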


On Thu, Aug 27, 2020 at 12:46 AM Citron Vert <citron_vert at hotmail.com>
wrote:

> Hi,
>
> Sorry for using this email address; my name is Quentin. Thank you for your
> reply.
>
> I have already tried the stickiness solution (with the deprecated value).
> I tried the one you gave me, and it does not change anything.
>
> Resources don't seem to move from node to node (I don't see any changes
> with the crm_mon command).
>
>
> In the logs I found this line: *"error: native_create_actions:
> Resource SERVICE1 is active on 2 nodes"*
>
> That is what led me to contact you, to understand and learn a little more
> about this cluster, and why there are resources running on the passive node.
>
>
> You will find attached the logs during the reboot of the passive node and
> my cluster configuration.
>
> I think I'm missing something in the configuration / logs that I don't
> understand.
>
>
> Thank you in advance for your help,
>
> Quentin
>
>
> On 26/08/2020 at 20:16, Reid Wahl wrote:
>
> Hi, Citron.
>
> Based on your description, it sounds like some resources **might** be
> moving from node 1 to node 2, failing on node 2, and then moving back to
> node 1. If that's what's happening (and even if it's not), then it's
> probably smart to set some resource stickiness as a resource default. The
> below command sets a resource stickiness score of 1.
>
>     # pcs resource defaults resource-stickiness=1
>
> Also note that the "default-resource-stickiness" cluster property is
> deprecated and should not be used.
>
> Finally, an explicit default resource stickiness score of 0 can interfere
> with the placement of cloned resource instances. If you don't want any
> stickiness, then it's better to leave stickiness unset. That way,
> primitives will have a stickiness of 0, but clone instances will have a
> stickiness of 1.
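>
> For example, a minimal sketch (assuming the deprecated
> default-resource-stickiness property is still set in your CIB, and that
> your pcs version supports `pcs property unset`) would be to remove the old
> property and set the new resource default:
>
>     # pcs property unset default-resource-stickiness
>     # pcs resource defaults resource-stickiness=1
>
> Running `pcs resource defaults` with no arguments should then show the
> configured default.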
>
> If adding stickiness does not resolve the issue, can you share your
> cluster configuration and some logs that show the issue happening? Off the
> top of my head I'm not sure why resources would start and stop on node 2
> without moving away from node1, unless they're clone instances that are
> starting and then failing a monitor operation on node 2.
>
> On Wed, Aug 26, 2020 at 8:42 AM Citron Vert <citron_vert at hotmail.com>
> wrote:
>
>> Hello,
>> I am contacting you because I have a problem with my cluster and I cannot
>> find (nor understand) any information that can help me.
>>
>> I have a 2-node cluster (pacemaker, corosync, pcs) installed on CentOS 7
>> with a set of configurations.
>> Everything seems to work fine, but here is what happens:
>>
>>    - Node1 and Node2 are running well, with Node1 as primary
>>    - I reboot Node2, which is passive (no changes on Node1)
>>    - Node2 comes back into the cluster as passive
>>    - The corosync logs show resources getting started and then stopped on Node2
>>    - The "crm_mon" command shows some resources on Node1 getting restarted
>>
>> I don't understand how this is supposed to work.
>> If a node comes back and becomes passive (since Node1 is running as
>> primary), is there any reason for the resources to be started and then
>> stopped on the new passive node?
>>
>> One of my resources becomes unstable because it gets started and then
>> stopped too quickly on Node2, which seems to make it restart on Node1 without
>> a failover.
>>
>> I tried several things and solutions proposed by different sites and
>> forums, but without success.
>>
>>
>> Is there a way so that a node that joins the cluster as passive does not
>> start its own resources?
>>
>>
>> Thanks in advance.
>>
>>
>> Here is some information, just in case:
>> $ rpm -qa | grep -E "corosync|pacemaker|pcs"
>> corosync-2.4.5-4.el7.x86_64
>> pacemaker-cli-1.1.21-4.el7.x86_64
>> pacemaker-1.1.21-4.el7.x86_64
>> pcs-0.9.168-4.el7.centos.x86_64
>> corosynclib-2.4.5-4.el7.x86_64
>> pacemaker-libs-1.1.21-4.el7.x86_64
>> pacemaker-cluster-libs-1.1.21-4.el7.x86_64
>>
>>
>>         <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>>         <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
>>         <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="120s"/>
>>         <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
>>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.21-4.el7-f14e36fd43"/>
>>         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>>         <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="CLUSTER"/>
>>         <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1598446314"/>
>>         <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>>
>>
>>
>>
>
>
> --
> Regards,
>
> Reid Wahl, RHCA
> Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA
>
>

-- 
Regards,

Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA