[ClusterLabs] Resources are stopped and started when one node rejoins

Sat Aug 26 15:16:25 UTC 2017

While I am by no means a CRM/Pacemaker expert, I only see the resource primitives and the order constraints. Wouldn’t you need location and/or colocation as well as stickiness settings to prevent this from happening? What I think it might be doing is seeing the new node, then trying to move the resources (but not finding it a suitable target) and then moving them back where they came from, but fast enough for you to only see it as a restart.
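
To make that suggestion concrete, here is a minimal sketch of what stickiness plus colocation could look like for the configuration quoted below. It is not a confirmed fix for the restarts. The stickiness value of 100 and the choice to colocate each MountN-clone with its iSCSIN-clone are my assumptions, and the commands use the pcs 0.9.x syntax. The script only prints the commands unless APPLY=1 is set, so they can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch: prints the pcs commands instead of running them.
# Set APPLY=1 in the environment to actually execute them on a cluster node.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

main() {
    # Assumed value: a positive default stickiness makes running
    # instances prefer to stay where they are.
    run pcs resource defaults resource-stickiness=100

    # Assumed pairing: keep each mount on a node where the matching
    # iSCSI session is active (resource names taken from the quoted config).
    for i in 1 2 3; do
        run pcs constraint colocation add "Mount${i}-clone" with "iSCSI${i}-clone" INFINITY
    done
}

main
```

Once applied, "pcs constraint --full" lists the resulting constraints and "crm_simulate -sL" shows the placement scores the cluster computes. Newer pcs releases expect "pcs resource defaults update resource-stickiness=100" instead of the form above.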

If you run crm_resource -P, it should also restart all resources, but put them in the preferred spot. If they end up in the same place, you probably didn’t put any weighting in the config or have stickiness set to INF.

Kind regards,

John Keates

> On 26 Aug 2017, at 14:23, Octavian Ciobanu <coctavian1979 at gmail.com> wrote:
> 
> Hello all,
> 
> While playing with cluster configuration I noticed a strange behavior. If I stop/standby cluster services on one node and reboot it, then when it rejoins the cluster all the resources that were started and working on the active nodes get stopped and restarted.
> 
> My testing configuration is based on 4 nodes. One node is a storage node that makes 3 iSCSI targets available for the other nodes to use (it is not configured to join the cluster), and the other three nodes are configured in a cluster using the following commands.
> 
> pcs resource create DLM ocf:pacemaker:controld op monitor interval="60" on-fail="fence" clone meta clone-max="3" clone-node-max="1" interleave="true" ordered="true"
> pcs resource create iSCSI1 ocf:heartbeat:iscsi portal="10.0.0.1:3260" target="iqn.2017-08.example.com:tgt1" op start interval="0" timeout="20" op stop interval="0" timeout="20" op monitor interval="120" timeout="30" clone meta clone-max="3" clone-node-max="1"
> pcs resource create iSCSI2 ocf:heartbeat:iscsi portal="10.0.0.1:3260" target="iqn.2017-08.example.com:tgt2" op start interval="0" timeout="20" op stop interval="0" timeout="20" op monitor interval="120" timeout="30" clone meta clone-max="3" clone-node-max="1"
> pcs resource create iSCSI3 ocf:heartbeat:iscsi portal="10.0.0.1:3260" target="iqn.2017-08.example.com:tgt3" op start interval="0" timeout="20" op stop interval="0" timeout="20" op monitor interval="120" timeout="30" clone meta clone-max="3" clone-node-max="1"
> pcs resource create Mount1 ocf:heartbeat:Filesystem device="/dev/disk/by-label/MyCluster:Data1" directory="/mnt/data1" fstype="gfs2" options="noatime,nodiratime,rw" op monitor interval="90" on-fail="fence" clone meta clone-max="3" clone-node-max="1" interleave="true"
> pcs resource create Mount2 ocf:heartbeat:Filesystem device="/dev/disk/by-label/MyCluster:Data2" directory="/mnt/data2" fstype="gfs2" options="noatime,nodiratime,rw" op monitor interval="90" on-fail="fence" clone meta clone-max="3" clone-node-max="1" interleave="true"
> pcs resource create Mount3 ocf:heartbeat:Filesystem device="/dev/disk/by-label/MyCluster:Data3" directory="/mnt/data3" fstype="gfs2" options="noatime,nodiratime,rw" op monitor interval="90" on-fail="fence" clone meta clone-max="3" clone-node-max="1" interleave="true"
> pcs constraint order DLM-clone then iSCSI1-clone
> pcs constraint order DLM-clone then iSCSI2-clone
> pcs constraint order DLM-clone then iSCSI3-clone
> pcs constraint order iSCSI1-clone then Mount1-clone
> pcs constraint order iSCSI2-clone then Mount2-clone
> pcs constraint order iSCSI3-clone then Mount3-clone
> 
> If I issue the command "pcs cluster standby node1" or "pcs cluster stop" on node 1 and then reboot the node, then when the node comes back online (and is taken out of standby, if it was put in standby mode) all the "MountX" resources get stopped on node 3 and 4 and started again.
> 
> Can anyone help me figure out where and what is the mistake in my configuration as I would like to keep the started resources on active nodes (avoid stop and start of resources)?
> 
> Thank you in advance
> Octavian Ciobanu
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
