[ClusterLabs] Resources restart when a node joins in
Citron Vert
citron_vert at hotmail.com
Fri Aug 28 03:26:41 EDT 2020
Hi,
You are right, the problems seem to come from services that are
started at boot.
My installation script disables start-at-boot for every service we
use, which is why I had not considered this possibility.
But after a quick investigation, I found that a colleague had the good
idea to write a "security" script that monitors certain services and
starts them.
Sorry to have bothered you over such a small mistake.
Thank you for your help; it was effective.
Quentin
On 27/08/2020 at 09:56, Reid Wahl wrote:
> Hi, Quentin. Thanks for the logs!
>
> I see you highlighted the fact that SERVICE1 was in "Stopping" state
> on both node 1 and node 2 when node 1 was rejoining the cluster. I
> also noted the following later in the logs, as well as some similar
> messages earlier:
>
> Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE1
> Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE1
> Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE4 active on NODE2
> Aug 27 08:47:02 [1330] NODE2 pengine: info: determine_op_status: Operation monitor found resource SERVICE1 active on NODE2
> ...
> Aug 27 08:47:02 [1330] NODE2 pengine: info: common_print: 1 : NODE1
> Aug 27 08:47:02 [1330] NODE2 pengine: info: common_print: 2 : NODE2
> ...
> Aug 27 08:47:02 [1330] NODE2 pengine: error: native_create_actions: Resource SERVICE1 is active on 2 nodes (attempting recovery)
> Aug 27 08:47:02 [1330] NODE2 pengine: notice: native_create_actions: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
>
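A minimal sketch of how the "active on 2 nodes" errors quoted above could be pulled out of a larger log. The function name find_multi_active is hypothetical, and the log path in the usage note is an assumption (this thread does not state it); the pattern is taken from the native_create_actions error line above.

```shell
# Print the name of every resource that the policy engine reported as
# active on more than one node. Reads Pacemaker log text on standard input.
# (find_multi_active is a hypothetical helper name, not a Pacemaker tool.)
find_multi_active() {
    # Keep only the resource name from lines such as:
    #   ... native_create_actions: Resource SERVICE1 is active on 2 nodes ...
    sed -n 's/.*native_create_actions: Resource \([^ ]*\) is active on [0-9][0-9]* nodes.*/\1/p' |
        sort -u   # report each resource once
}
```

It could be used as `find_multi_active < /var/log/cluster/corosync.log`, assuming that is where the cluster log lives on the node.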
> Can you make sure that all the cluster-managed systemd services are
> disabled from starting at boot (i.e., `systemctl is-enabled service1`,
> and the same for all the others) on both nodes? If they are enabled,
> disable them.
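
The check described above can be sketched as a small loop. This is only an illustration: list_boot_enabled is a hypothetical helper name, and service1 and service2 stand in for whatever units the cluster actually manages. It only reports; it changes nothing on the system.

```shell
# Print each of the given systemd units that is enabled to start at boot.
# Unit names are passed as arguments.
# (list_boot_enabled is a hypothetical helper name.)
list_boot_enabled() {
    for unit in "$@"; do
        # "systemctl is-enabled" prints the enablement state of the unit and
        # exits non-zero for states such as "disabled", hence the "|| true".
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        if [ "$state" = "enabled" ]; then
            printf '%s\n' "$unit"
        fi
    done
}
```

Run as `list_boot_enabled service1 service2` on each node; any unit it prints can then be disabled with `systemctl disable <unit>`. Note that this only covers boot enablement, not other scripts or tools that might start a service behind the cluster's back.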
>
> On Thu, Aug 27, 2020 at 12:46 AM Citron Vert <citron_vert at hotmail.com>
> wrote:
>
> Hi,
>
> Sorry for using this email address; my name is Quentin. Thank you
> for your reply.
>
> I have already tried the stickiness solution (with the deprecated
> value). I tried the one you gave me, and it does not change anything.
>
> Resources don't seem to move from node to node (I don't see any
> changes with the crm_mon command).
>
>
> In the logs I found this line: "error: native_create_actions:
> Resource SERVICE1 is active on 2 nodes"
>
> That led me to contact you, to learn a little more about this
> cluster and to understand why resources are running on the
> passive node.
>
>
> You will find attached the logs during the reboot of the passive
> node and my cluster configuration.
>
> I think I'm missing something in the configuration or the logs
> that I don't understand.
>
>
> Thank you in advance for your help,
>
> Quentin
>
>
> On 26/08/2020 at 20:16, Reid Wahl wrote:
>> Hi, Citron.
>>
>> Based on your description, it sounds like some resources
>> **might** be moving from node 1 to node 2, failing on node 2, and
>> then moving back to node 1. If that's what's happening (and even
>> if it's not), then it's probably smart to set some resource
>> stickiness as a resource default. The below command sets a
>> resource stickiness score of 1.
>>
>> # pcs resource defaults resource-stickiness=1
>>
>> Also note that the "default-resource-stickiness" cluster property
>> is deprecated and should not be used.
>>
>> Finally, an explicit default resource stickiness score of 0 can
>> interfere with the placement of cloned resource instances. If you
>> don't want any stickiness, then it's better to leave stickiness
>> unset. That way, primitives will have a stickiness of 0, but
>> clone instances will have a stickiness of 1.
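
One way to check whether the deprecated property mentioned above is still set is to search the CIB XML for it. This is only a sketch: the helper name has_deprecated_stickiness is hypothetical, and it assumes the CIB has been dumped as XML (for example with `pcs cluster cib`).

```shell
# Succeed (exit status 0) if the CIB XML on standard input still sets the
# deprecated default-resource-stickiness cluster property.
# (has_deprecated_stickiness is a hypothetical helper name.)
has_deprecated_stickiness() {
    # The property appears as an nvpair whose name attribute is
    # exactly "default-resource-stickiness".
    grep -q 'name="default-resource-stickiness"'
}
```

For instance, `pcs cluster cib | has_deprecated_stickiness && echo "deprecated property is set"`. If it is set, `pcs property unset default-resource-stickiness` should remove it, leaving stickiness to the resource defaults.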
>>
>> If adding stickiness does not resolve the issue, can you share
>> your cluster configuration and some logs that show the issue
>> happening? Off the top of my head I'm not sure why resources
>> would start and stop on node 2 without moving away from node1,
>> unless they're clone instances that are starting and then failing
>> a monitor operation on node 2.
>>
>> On Wed, Aug 26, 2020 at 8:42 AM Citron Vert
>> <citron_vert at hotmail.com> wrote:
>>
>> Hello,
>> I am contacting you because I have a problem with my cluster
>> and I cannot find (nor understand) any information that can
>> help me.
>>
>> I have a two-node cluster (pacemaker, corosync, pcs) installed
>> on CentOS 7 with a set of configuration options.
>> Everything seems to work fine, but here is what happens:
>>
>> * Node1 and Node2 are running well with Node1 as primary
>> * I reboot Node2, which is passive (no changes on Node1)
>> * Node2 comes back in the cluster as passive
>> * corosync logs show resources getting started then
>> stopped on Node2
>> * "crm_mon" command shows some resources on Node1 getting
>> restarted
>>
>> I don't understand how it should work.
>> If a node comes back and becomes passive (since Node1 is
>> running as primary), surely there is no reason for the resources
>> to be started then stopped on the new passive node?
>>
>> One of my resources becomes unstable because it gets started
>> and then stopped too quickly on Node2, which seems to make it
>> restart on Node1 without a failover.
>>
>> I tried several things and solutions proposed on different
>> sites and forums, but without success.
>>
>>
>> Is there a way to ensure that a node joining the cluster as
>> passive does not start its own resources?
>>
>>
>> thanks in advance
>>
>>
>> Here is some information, just in case:
>>
>> $ rpm -qa | grep -E "corosync|pacemaker|pcs"
>> corosync-2.4.5-4.el7.x86_64
>> pacemaker-cli-1.1.21-4.el7.x86_64
>> pacemaker-1.1.21-4.el7.x86_64
>> pcs-0.9.168-4.el7.centos.x86_64
>> corosynclib-2.4.5-4.el7.x86_64
>> pacemaker-libs-1.1.21-4.el7.x86_64
>> pacemaker-cluster-libs-1.1.21-4.el7.x86_64
>>
>>
>> <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>> <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
>> <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="120s"/>
>> <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
>> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.21-4.el7-f14e36fd43"/>
>> <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>> <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="CLUSTER"/>
>> <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1598446314"/>
>> <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Regards,
>>
>> Reid Wahl, RHCA
>> Software Maintenance Engineer, Red Hat
>> CEE - Platform Support Delivery - ClusterHA
>
>
>