[ClusterLabs] Resources restart when a node joins in

Reid Wahl nwahl at redhat.com
Wed Aug 26 14:16:53 EDT 2020


Hi, Citron.

Based on your description, it sounds like some resources **might** be
moving from node 1 to node 2, failing on node 2, and then moving back to
node 1. If that's what's happening (and even if it's not), then it's
probably smart to set some resource stickiness as a resource default. The
command below sets a default resource-stickiness score of 1:

    # pcs resource defaults resource-stickiness=1

Also note that the "default-resource-stickiness" cluster property is
deprecated and should not be used.
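
If that property is still set in your CIB (your snippet below shows it with a
value of 0), something like the following should remove it. I'm going from
memory on the pcs 0.9 syntax here, so please double-check against the pcs(8)
man page on your version:

    # pcs property unset default-resource-stickiness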

Finally, an explicit default resource stickiness score of 0 can interfere
with the placement of cloned resource instances. If you don't want any
stickiness, then it's better to leave stickiness unset. That way,
primitives will have a stickiness of 0, but clone instances will have a
stickiness of 1.
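
Similarly, if an explicit resource-stickiness default of 0 has been configured
with "pcs resource defaults", my recollection is that pcs 0.9 removes a default
when you set its value to empty; running "pcs resource defaults" with no
arguments afterward should confirm it is gone:

    # pcs resource defaults resource-stickiness=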

If adding stickiness does not resolve the issue, can you share your cluster
configuration and some logs that show the issue happening? Off the top of
my head I'm not sure why resources would start and stop on node 2 without
moving away from node 1, unless they're clone instances that are starting
and then failing a monitor operation on node 2.
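
For the configuration, the output of "pcs config" is usually enough. For the
logs, a crm_report archive covering the reboot of node 2 would be ideal. The
time range and output path below are only examples, so adjust them to when the
problem actually occurred:

    # pcs config
    # crm_report -f "2020-08-26 15:00" -t "2020-08-26 16:00" /tmp/node2-rejoin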

On Wed, Aug 26, 2020 at 8:42 AM Citron Vert <citron_vert at hotmail.com> wrote:

> Hello,
> I am contacting you because I have a problem with my cluster, and I cannot
> find (or understand) any information that can help me.
>
> I have a 2-node cluster (pacemaker, corosync, pcs) installed on CentOS 7
> with a set of configurations.
> Everything seems to work fine, but here is what happens:
>
>    - Node1 and Node2 are running well with Node1 as primary
>    - I reboot Node2, which is passive (no changes on Node1)
>    - Node2 comes back in the cluster as passive
>    - The corosync logs show resources getting started and then stopped on Node2
>    - The "crm_mon" command shows some resources on Node1 getting restarted
>
> I don't understand how this is supposed to work.
> If a node comes back and becomes passive (since Node1 is running as
> primary), is there any reason for the resources to be started and then
> stopped on the new passive node?
>
> One of my resources becomes unstable because it gets started and then
> stopped too quickly on Node2, which seems to make it restart on Node1 without
> a failover.
>
> I have tried several things and solutions proposed by different sites and
> forums, but without success.
>
>
> Is there a way to make a node that joins the cluster as passive not start
> its own resources?
>
>
> Thanks in advance.
>
>
> Here are some information just in case :
> $ rpm -qa | grep -E "corosync|pacemaker|pcs"
> corosync-2.4.5-4.el7.x86_64
> pacemaker-cli-1.1.21-4.el7.x86_64
> pacemaker-1.1.21-4.el7.x86_64
> pcs-0.9.168-4.el7.centos.x86_64
> corosynclib-2.4.5-4.el7.x86_64
> pacemaker-libs-1.1.21-4.el7.x86_64
> pacemaker-cluster-libs-1.1.21-4.el7.x86_64
>
>
>         <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>         <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
>         <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="120s"/>
>         <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.21-4.el7-f14e36fd43"/>
>         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>         <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="CLUSTER"/>
>         <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1598446314"/>
>         <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>
>
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>


-- 
Regards,

Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

