[ClusterLabs] Resources restart when a node joins in

Strahil Nikolov hunter86_bg at yahoo.com
Thu Aug 27 10:47:11 EDT 2020


Hi Quentin,

in order to get help it will be easier if you provide both corosync and pacemaker configuration.


Best Regards,
Strahil Nikolov

On Thursday, 27 August 2020 at 17:10:01 GMT+3, Citron Vert <citron_vert at hotmail.com> wrote:

Hi,

Sorry for using this email address; my name is Quentin. Thank you for your reply.

I have already tried the stickiness solution (with the deprecated value). I tried the one you gave me, and it does not change anything.
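One thing worth checking: the attached configuration still carries the deprecated "default-resource-stickiness=0" cluster property alongside any newer setting. A minimal sketch of swapping it for the supported resource-default form, assuming the pcs 0.9 syntax shipped with CentOS 7:

```shell
# Drop the deprecated cluster property; an explicit stickiness of 0 here
# can also interfere with the placement of clone instances
pcs property unset default-resource-stickiness

# Set stickiness as a resource default instead (pcs 0.9 syntax)
pcs resource defaults resource-stickiness=1
```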


Resources don't seem to move from node to node (I don't see any changes with the crm_mon command).




In the logs I found this line: "error: native_create_actions:     Resource SERVICE1 is active on 2 nodes"

This led me to contact you, to understand and learn a little more about this cluster, and why there are resources running on the passive node.
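For the "active on 2 nodes" error, it may help to ask the cluster where it believes the resource is running, and to clear its stale operation history so the next probe re-detects the real state. A sketch using SERVICE1 from the log line above (standard pacemaker/pcs tools; the exact output depends on the cluster):

```shell
# Show the node(s) on which the cluster believes SERVICE1 is active
crm_resource --resource SERVICE1 --locate

# Clear SERVICE1's operation history on all nodes; the cluster will
# re-probe the resource and recompute its placement
pcs resource cleanup SERVICE1
```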





You will find attached the logs during the reboot of the passive node and my cluster configuration.


I think I'm missing something in the configuration/logs that I don't understand.




Thank you in advance for your help,

Quentin





On 26/08/2020 at 20:16, Reid Wahl wrote:

> Hi, Citron.
>
> Based on your description, it sounds like some resources **might** be moving from node 1 to node 2, failing on node 2, and then moving back to node 1. If that's what's happening (and even if it's not), then it's probably smart to set some resource stickiness as a resource default. The below command sets a resource stickiness score of 1.
>
>     # pcs resource defaults resource-stickiness=1
>
> Also note that the "default-resource-stickiness" cluster property is deprecated and should not be used.
>
> Finally, an explicit default resource stickiness score of 0 can interfere with the placement of cloned resource instances. If you don't want any stickiness, then it's better to leave stickiness unset. That way, primitives will have a stickiness of 0, but clone instances will have a stickiness of 1.
>
> If adding stickiness does not resolve the issue, can you share your cluster configuration and some logs that show the issue happening? Off the top of my head I'm not sure why resources would start and stop on node 2 without moving away from node 1, unless they're clone instances that are starting and then failing a monitor operation on node 2.
>
> On Wed, Aug 26, 2020 at 8:42 AM Citron Vert <citron_vert at hotmail.com> wrote:
>
>>
>> Hello,
>> I am contacting you because I have a problem with my cluster and I cannot find (nor understand) any information that can help me.
>> 
>> I have a 2 nodes cluster (pacemaker, corosync, pcs) installed on CentOS 7 with a set of configuration.
>> Everything seems to work fine, but here is what happens:
>> 
>>     * Node1 and Node2 are running well, with Node1 as primary
>>     * I reboot Node2, which is passive (no changes on Node1)
>>     * Node2 comes back into the cluster as passive
>>     * corosync logs show resources getting started then stopped on Node2
>>     * "crm_mon" shows some resources on Node1 getting restarted
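A note on those corosync-log entries: when a node rejoins, pacemaker first probes every configured resource on it with a one-shot monitor operation, which can look in the logs like resources starting there. One way to distinguish probes from real starts, sketched for the SERVICE1 resource mentioned in this thread (the log path is the CentOS 7 default):

```shell
# Probe operations are logged with keys like <resource>_monitor_0;
# real starts appear as <resource>_start_0
grep -E "SERVICE1_(monitor_0|start_0)" /var/log/cluster/corosync.log
```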
>> 
>> 
>> I don't understand how this should work.
>> If a node comes back and becomes passive (since Node1 is running as primary), there is no reason for resources to be started and then stopped on the new passive node, is there?
>> 
>> 
>> One of my resources becomes unstable because it gets started and then stopped too quickly on Node2, which seems to make it restart on Node1 without a failover.
>> 
>> I tried several things and solutions proposed by different sites and forums, but without success.
>>
>> Is there a way to make a node that joins the cluster as passive not start its own resources?
>>
>> Thanks in advance,
>>
>> Here is some information, just in case:
>> 
>> $ rpm -qa | grep -E "corosync|pacemaker|pcs"
>>  corosync-2.4.5-4.el7.x86_64
>>  pacemaker-cli-1.1.21-4.el7.x86_64
>>  pacemaker-1.1.21-4.el7.x86_64
>>  pcs-0.9.168-4.el7.centos.x86_64
>>  corosynclib-2.4.5-4.el7.x86_64
>>  pacemaker-libs-1.1.21-4.el7.x86_64
>>  pacemaker-cluster-libs-1.1.21-4.el7.x86_64
>>
>>         <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>>         <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
>>         <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="120s"/>
>>         <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
>>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.21-4.el7-f14e36fd43"/>
>>         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>>         <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="CLUSTER"/>
>>         <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1598446314"/>
>>         <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>> 
>> ClusterLabs home: https://www.clusterlabs.org/
>> 
> 
> 
> 
> -- 
>
> Regards,
> 
> 
> Reid Wahl, RHCA
> 
> 
> Software Maintenance Engineer, Red Hat
> 
> CEE - Platform Support Delivery - ClusterHA


