[ClusterLabs] ClusterIP location constraint reappears after reboot

Ken Gaillot kgaillot at redhat.com
Thu Feb 18 19:37:31 UTC 2016


On 02/18/2016 01:07 PM, Jeremy Matthews wrote:
> Hi,
> 
> We're having an issue with our cluster where, after a reboot of our system, a location constraint reappears for the ClusterIP. This causes a problem because we have a daemon that checks the cluster state and waits until the ClusterIP is started before it kicks off our application. We didn't have this issue when using an earlier version of pacemaker. Here is the constraint as shown by pcs:
> 
> [root at g5se-f3efce cib]# pcs constraint
> Location Constraints:
>   Resource: ClusterIP
>     Disabled on: g5se-f3efce (role: Started)
> Ordering Constraints:
> Colocation Constraints:
> 
> ...and here is our cluster status with the ClusterIP being Stopped:
> 
> [root at g5se-f3efce cib]# pcs status
> Cluster name: cl-g5se-f3efce
> Last updated: Thu Feb 18 11:36:01 2016
> Last change: Thu Feb 18 10:48:33 2016 via crm_resource on g5se-f3efce
> Stack: cman
> Current DC: g5se-f3efce - partition with quorum
> Version: 1.1.11-97629de
> 1 Nodes configured
> 4 Resources configured
> 
> 
> Online: [ g5se-f3efce ]
> 
> Full list of resources:
> 
> sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
> meta-data      (ocf::pacemaker:GBmon): Started g5se-f3efce
> netmon (ocf::heartbeat:ethmonitor):    Started g5se-f3efce
> ClusterIP      (ocf::heartbeat:IPaddr2):       Stopped
> 
> 
> The cluster really just has one node at this time.
> 
> I retrieve the constraint ID, remove the constraint, verify that ClusterIP is started, and then reboot:
> 
> [root at g5se-f3efce cib]# pcs constraint ref ClusterIP
> Resource: ClusterIP
>   cli-ban-ClusterIP-on-g5se-f3efce
> [root at g5se-f3efce cib]# pcs constraint remove cli-ban-ClusterIP-on-g5se-f3efce
> 
> [root at g5se-f3efce cib]# pcs status
> Cluster name: cl-g5se-f3efce
> Last updated: Thu Feb 18 11:45:09 2016
> Last change: Thu Feb 18 11:44:53 2016 via crm_resource on g5se-f3efce
> Stack: cman
> Current DC: g5se-f3efce - partition with quorum
> Version: 1.1.11-97629de
> 1 Nodes configured
> 4 Resources configured
> 
> 
> Online: [ g5se-f3efce ]
> 
> Full list of resources:
> 
> sw-ready-g5se-f3efce   (ocf::pacemaker:GBmon): Started g5se-f3efce
> meta-data      (ocf::pacemaker:GBmon): Started g5se-f3efce
> netmon (ocf::heartbeat:ethmonitor):    Started g5se-f3efce
> ClusterIP      (ocf::heartbeat:IPaddr2):       Started g5se-f3efce
> 
> 
> [root at g5se-f3efce cib]# reboot
> 
> ....after the reboot I log in, and the constraint is back and ClusterIP has not started.
> 
> 
> I have noticed in /var/lib/pacemaker/cib that the cib-x.raw files get created when there are changes to the cib (cib.xml). After a reboot, I see the constraint being added in a diff between .raw files:
> 
> [root at g5se-f3efce cib]# diff cib-7.raw cib-8.raw
> 1c1
> < <cib epoch="239" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:44:53 2016" update-origin="g5se-f3efce" update-client="crm_resource" crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
> ---
>> <cib epoch="240" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Thu Feb 18 11:46:49 2016" update-origin="g5se-f3efce" update-client="crm_resource" crm_feature_set="3.0.9" have-quorum="1" dc-uuid="g5se-f3efce">
> 50c50,52
> <     <constraints/>
> ---
>>     <constraints>
>>       <rsc_location id="cli-ban-ClusterIP-on-g5se-f3efce" rsc="ClusterIP" role="Started" node="g5se-f3efce" score="-INFINITY"/>
>>     </constraints>
> 
> 
> I have also looked in /var/log/cluster/corosync.log and seen logs where it seems the cib is getting updated. I'm not sure whether the constraint is being put back in at shutdown or at startup. I just don't understand why it's being put back in. I don't think our daemon code or other scripts are doing this, but it is something I could verify.

I would look at any scripts running around that time first. Constraints
whose IDs start with "cli-" were created by one of the CLI tools, so
something must be calling one of them. The most likely candidates are pcs
resource move/ban and crm_resource -M/--move/-B/--ban.
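For reference, here is a minimal sketch of how such a "cli-" ban constraint
typically gets created and how to clear it again (resource and node names are
taken from your output; please verify the exact options against
crm_resource --help and pcs resource help on your versions):

    # Moving/banning a resource without giving a destination node bans it
    # on the node where it is currently active, which creates a constraint
    # named cli-ban-<resource>-on-<node>:
    crm_resource --resource ClusterIP --move
    # or equivalently:
    pcs resource move ClusterIP

    # Remove any cli- constraints created by move/ban for that resource:
    crm_resource --resource ClusterIP --clear

To see which client injected the constraint and when, you could also grep the
log around the time of the reboot, for example:

    grep -E 'cli-ban-ClusterIP|crm_resource' /var/log/cluster/corosync.log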

> ********************************
> 
> From "yum info pacemaker", my current version is:
> 
> Name        : pacemaker
> Arch        : x86_64
> Version     : 1.1.12
> Release     : 8.el6_7.2
> 
> My earlier version was:
> 
> Name        : pacemaker
> Arch        : x86_64
> Version     : 1.1.10
> Release     : 1.el6_4.4
> 
> I'm still using an earlier version of pcs, because the new one seems to have issues with python:
> 
> Name        : pcs
> Arch        : noarch
> Version     : 0.9.90
> Release     : 1.0.1.el6.centos
> 
> *******************************
> 
> If anyone has ideas on the cause or thoughts on this, anything would be greatly appreciated.
> 
> Thanks!
> 
> 
> 
> Jeremy Matthews




