[Pacemaker] Colocating with unmanaged resource

Fri Dec 19 14:21:49 EST 2014

Hi,

Simple scenario: several floating IPs (FIPs) should live on the "front"
nodes only if Nginx is working there. There are several reasons against
having Nginx controlled by Pacemaker.

So I decided to colocate the FIPs with an unmanaged Nginx clone. This
worked fine in 1.1.6, with some exceptions.

Later, on another cluster, I decided to switch to 1.1.10 and corosync 2
because of the performance improvements. Now I am also testing 1.1.12.

It seems I can't reliably colocate the FIPs with unmanaged Nginx on
1.1.10 and 1.1.12.

Here is how the different versions of Pacemaker behave:

1.1.6, 1.1.10, 1.1.12:

- if Nginx is started on a node after the initial probe for the Nginx
clone, Pacemaker never sees it running until a cleanup or some other
probe trigger
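To make this first point concrete, here is a toy model of what I assume is going on. The initial probe (or a cleanup) is a one-shot check of the real state on every node. The recurring monitor only runs on nodes where the instance is already believed to be STARTED or FAIL. I have not checked this against the Pacemaker sources; it is only consistent with what I see. `State` and `ToyCluster` are hypothetical names, not Pacemaker code.

```python
from enum import Enum


class State(Enum):
    STOPPED = "STOPPED"
    STARTED = "STARTED"
    FAIL = "FAIL"


class ToyCluster:
    """Hypothetical model of how the cluster learns an unmanaged resource's state."""

    def __init__(self, nodes):
        self.running = {n: False for n in nodes}        # is nginx really running?
        self.known = {n: State.STOPPED for n in nodes}  # what the cluster believes

    def probe(self):
        # One-shot probe (initial start-up, cleanup, ...): learn the real state everywhere.
        for node, running in self.running.items():
            self.known[node] = State.STARTED if running else State.STOPPED

    def monitor_tick(self):
        # Recurring monitor: assumed to run only where the instance is believed active.
        for node, state in self.known.items():
            if state is State.STOPPED:
                continue  # no monitor scheduled here, so a manual start goes unnoticed
            self.known[node] = State.STARTED if self.running[node] else State.FAIL


cluster = ToyCluster(["pcmk10-1", "pcmk10-2"])
cluster.running["pcmk10-1"] = True
cluster.probe()                         # initial probe: nginx only on pcmk10-1
cluster.running["pcmk10-2"] = True      # nginx started by hand after the probe
cluster.monitor_tick()
print(cluster.known["pcmk10-2"].value)  # STOPPED: the late start is not seen
cluster.probe()                         # e.g. after a cleanup
print(cluster.known["pcmk10-2"].value)  # STARTED
```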

1.1.6:

- stopping nginx on a node makes the clone instance FAIL on that node,
and the FIP moves away from that node. This is as expected
- starting nginx removes the FAIL state and the FIP moves back. This is
as expected

1.1.10:

- stopping nginx on a node:
  - usually makes the clone instance FAIL on that node, but the
    FIP stays running on that node regardless of the INF colocation
  - sometimes makes the clone instance FAIL on that node, and
    immediately afterwards the clone instance returns to STARTED state;
    the FIP stays running on that node
  - sometimes makes the clone instance STOPPED on that node, and the
    FIP moves away from that node. This is as expected
- starting nginx:
  - if the instance was FAIL: removes the FAIL state; the FIP remains running
  - if the instance was STARTED:
    - usually nothing happens; the FIP remains running
    - sometimes makes the clone instance FAIL on that node, but the
      FIP stays running on that node regardless of the INF colocation
  - if the instance was STOPPED: moves the FIP back. This is as expected

1.1.12:

- stopping nginx on a node always makes the clone instance FAIL on
that node, but the FIP stays running on that node regardless of the INF
colocation
- starting nginx removes the FAIL state; the FIP remains running
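The "stopping nginx" cases above can be checked against the rule I expect the INF colocation to enforce. Under that rule, a FIP may stay on a node only while the cl_Nginx instance there is STARTED. This is a minimal sketch. The names `fip_may_stay`, `OBSERVED` and `violations` are hypothetical, and the data is just the observations listed above.

```python
# My expectation for "colocation ... inf: FIP_n cl_Nginx": a FIP may stay
# on a node only while the cl_Nginx instance there is STARTED.
def fip_may_stay(clone_state: str) -> bool:
    return clone_state == "STARTED"


# Observed outcomes after stopping nginx on the node holding the FIP:
# (version, clone state reported afterwards, did the FIP stay?)
OBSERVED = [
    ("1.1.6", "FAIL", False),
    ("1.1.10", "FAIL", True),
    ("1.1.10", "STARTED", True),  # FAIL, then immediately back to STARTED
    ("1.1.10", "STOPPED", False),
    ("1.1.12", "FAIL", True),
]


def violations(observed):
    """Return the cases where the FIP stayed although the clone state forbids it."""
    return [(v, s) for v, s, stayed in observed if stayed and not fip_may_stay(s)]


for version, state in violations(OBSERVED):
    print(f"{version}: instance {state}, FIP stayed")
```

The 1.1.10 "FAIL, then STARTED" case is not flagged by this check. There the FIP follows the reported state, but that state is itself wrong, because nginx is not running.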

Please comment on this. Also, some questions:

- are unmanaged resources designed to be used under normal conditions,
with other resources colocated with them? How should they be set up
properly?
- is there some kind of "recurring probe" to "see" unmanaged resources
that have started after the initial probe?

Let me know if more logs are needed. Right now I can't collect logs for
all cases; some are attached.

Config for 1.1.10 (similar configs for 1.1.6 and 1.1.12):

node $id="..." pcmk10-1 \
        attributes onhv="1" front="true"
node $id="..." pcmk10-2 \
        attributes onhv="2" front="true"
node $id="..." pcmk10-3 \
        attributes onhv="3" front="true"

primitive FIP_1 ocf:heartbeat:IPaddr2 \
        op monitor interval="2s" \
        params ip="10.1.1.1" cidr_netmask="16" \
        meta migration-threshold="2" failure-timeout="60s"
primitive FIP_2 ocf:heartbeat:IPaddr2 \
        op monitor interval="2s" \
        params ip="10.1.2.1" cidr_netmask="16" \
        meta migration-threshold="2" failure-timeout="60s"
primitive FIP_3 ocf:heartbeat:IPaddr2 \
        op monitor interval="2s" \
        params ip="10.1.3.1" cidr_netmask="16" \
        meta migration-threshold="2" failure-timeout="60s"

primitive Nginx lsb:nginx \
        op start interval="0" enabled="false" \
        op stop interval="0" enabled="false" \
        op monitor interval="2s"

clone cl_Nginx Nginx \
        meta globally-unique="false" notify="false" is-managed="false"

location loc-cl_Nginx cl_Nginx \
        rule $id="loc-cl_Nginx-r1" 500: front eq true

location loc-FIP_1 FIP_1 \
        rule $id="loc-FIP_1-r1" 500: onhv eq 1 and front eq true \
        rule $id="loc-FIP_1-r2" 200: defined onhv and onhv ne 1 and front eq true
location loc-FIP_2 FIP_2 \
        rule $id="loc-FIP_2-r1" 500: onhv eq 2 and front eq true \
        rule $id="loc-FIP_2-r2" 200: defined onhv and onhv ne 2 and front eq true
location loc-FIP_3 FIP_3 \
        rule $id="loc-FIP_3-r1" 500: onhv eq 3 and front eq true \
        rule $id="loc-FIP_3-r2" 200: defined onhv and onhv ne 3 and front eq true

colocation coloc-FIP_1-cl_Nginx inf: FIP_1 cl_Nginx
colocation coloc-FIP_2-cl_Nginx inf: FIP_2 cl_Nginx
colocation coloc-FIP_3-cl_Nginx inf: FIP_3 cl_Nginx

property $id="cib-bootstrap-options" \
        dc-version="1.1.10-42f2063" \
        cluster-infrastructure="corosync" \
        symmetric-cluster="false" \
        stonith-enabled="false" \
        no-quorum-policy="stop" \
        cluster-recheck-interval="10s" \
        maintenance-mode="false" \
        last-lrm-refresh="1418998945"
rsc_defaults $id="rsc-options" \
        resource-stickiness="30"
op_defaults $id="op_defaults-options" \
        record-pending="false"
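For reference, here is a simplified sketch of the placement I expect from this config for one FIP. It models four things:

- the loc-FIP_n rules: 500 on the node whose onhv matches, 200 on other front nodes
- symmetric-cluster="false": a node with no matching rule is not allowed
- the INF colocation with cl_Nginx
- resource-stickiness="30"

This is my own approximation, not Pacemaker's allocation algorithm. migration-threshold, failure-timeout and Pacemaker's real tie-breaking are not modelled, and the function names are hypothetical.

```python
INF = float("inf")

# Node attributes copied from the config above.
NODES = {
    "pcmk10-1": {"onhv": "1", "front": "true"},
    "pcmk10-2": {"onhv": "2", "front": "true"},
    "pcmk10-3": {"onhv": "3", "front": "true"},
}
STICKINESS = 30  # rsc_defaults resource-stickiness="30"


def location_score(onhv: str, attrs: dict) -> float:
    """Approximate loc-FIP_<onhv>: 500 on the matching node, 200 on other front nodes."""
    if attrs.get("front") != "true":
        return -INF  # no rule matches; with symmetric-cluster=false: not allowed
    if attrs.get("onhv") == onhv:
        return 500   # rule r1
    if "onhv" in attrs:
        return 200   # rule r2
    return -INF


def place_fip(onhv, nginx_started, current=None):
    """Return the node FIP_<onhv> should run on, or None if no node is allowed.

    nginx_started: nodes where the cl_Nginx instance is believed to be STARTED.
    current: node the FIP is running on now (it gets the stickiness bonus).
    """
    best, best_score = None, -INF
    for node in sorted(NODES):
        score = location_score(onhv, NODES[node])
        if node not in nginx_started:
            score = -INF  # colocation ... inf: FIP_n cl_Nginx
        elif node == current:
            score += STICKINESS
        if score > best_score:  # ties: first node in name order (a simplification)
            best, best_score = node, score
    return best


ALL = {"pcmk10-1", "pcmk10-2", "pcmk10-3"}
print(place_fip("1", ALL))                             # pcmk10-1
print(place_fip("1", ALL - {"pcmk10-1"}, "pcmk10-3"))  # pcmk10-3: 200 + 30 stickiness
print(place_fip("1", ALL, "pcmk10-3"))                 # pcmk10-1: 500 beats 200 + 30
print(place_fip("1", set()))                           # None: FIP_1 is stopped
```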

Attached logs:

- 1.1.10_fail-started.log (text/x-log, 21100 bytes):
  <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20141219/5b809feb/attachment-0004.bin>
- 1.1.10_stopped-started.log (text/x-log, 14950 bytes):
  <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20141219/5b809feb/attachment-0005.bin>