[Pacemaker] color_instance: Pre-allocation failed

Andrew Beekhof andrew at beekhof.net
Mon Jan 7 22:36:33 UTC 2013


On Sat, Dec 29, 2012 at 1:21 AM, Stefan Midjich <swehack at gmail.com> wrote:
> Every 15-18 minutes one of my resources gets stopped on one node and then is
> restarted shortly after.
>
> In the DC log I can see the following error lines.
>
> Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh: Pairing resOCFS:1 with groupOcfs2Mgmt:0
> Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: Assigning app02 to resOCFS:1
> Dec 28 15:04:09 app01 pengine: [8618]: ERROR: color_instance: Pre-allocation failed: got app02 instead of app01

Hmm, that's not good.
Somewhere after the logs below there should be a reference to a file
ending in .bz2; can you send that to me, please?
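
(Shortly after those lines the log should contain an entry like
"PEngine Input stored in: /var/lib/pengine/pe-input-NN.bz2"; that is
the file I mean. For anyone wanting to replay one offline, a quick
sketch, assuming the default 1.1.x path (adjust if your distribution
relocates it) and a hypothetical file number:

    # list the most recently written policy-engine input files
    ls -t /var/lib/pengine/pe-input-*.bz2 | head -5

    # re-run the scheduler on that input: -S simulates the transition,
    # -s prints the allocation scores, all without touching the cluster
    crm_simulate -S -s -x /var/lib/pengine/pe-input-123.bz2
)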


> Dec 28 15:04:09 app01 pengine: [8618]: info: native_deallocate: Deallocating resOCFS:1 from app02
> Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh: Pairing resOCFS:0 with groupOcfs2Mgmt:0
> Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: Assigning app02 to resOCFS:0
> Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh: Pairing resOCFS:1 with groupOcfs2Mgmt:1
> Dec 28 15:04:09 app01 pengine: [8618]: debug: clone_rsc_colocation_rh: Pairing resOCFS:1 with groupOcfs2Mgmt:1
> Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: All nodes for resource resOCFS:1 are unavailable, unclean or shutting down (app01: 1, -1000000)
> Dec 28 15:04:09 app01 pengine: [8618]: debug: native_assign_node: Could not allocate a node for resOCFS:1
> Dec 28 15:04:09 app01 pengine: [8618]: info: native_color: Resource resOCFS:1 cannot run anywhere
>
> This plays out before every stop of the OCFS resource.
>
> Here is the CIB:
>
> primitive VirtualIP0 ocf:heartbeat:IPaddr2 \
>         params ip="10.121.12.30" \
>         op monitor interval="10s" \
>         meta target-role="Started"
> primitive resDLM ocf:pacemaker:controld
> primitive resDrbdShared0 ocf:linbit:drbd \
>         params drbd_resource="shared0" \
>         operations $id="resDrbd-operations" \
>         op monitor interval="20" role="Master" timeout="20" notify="true" \
>         op monitor interval="30" role="Slave" timeout="20" notify="true"
> primitive resJboss lsb:jboss4 \
>         op monitor interval="120s" timeout="150s" \
>         op start interval="0" timeout="150s" \
>         op stop interval="0" timeout="150s"
> primitive resO2CB ocf:pacemaker:o2cb
> primitive resOCFS ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/shared0" directory="/data" \
>         fstype="ocfs2" \
>         op monitor interval="120s" timeout="40" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60"
> group groupOcfs2Mgmt resDLM resO2CB
> ms msDrbdShared0 resDrbdShared0 \
>         meta resource-stickiness="100" notify="true" interleave="true" \
>         master-max="2" target-role="Started"
> clone cloneJboss resJboss \
>         meta interleave="true" ordered="true" is-managed="false" \
>         target-role="Started"
> clone cloneOCFS resOCFS \
>         meta interleave="true" ordered="true" target-role="Started" \
>         is-managed="true"
> clone cloneOcfs2Mgmt groupOcfs2Mgmt \
>         meta interleave="true" target-role="Started"
> location locVirtualIP0 VirtualIP0 9001: app01
> colocation colDRBD inf: cloneOcfs2Mgmt msDrbdShared0:Master
> colocation colOcfs2 inf: cloneOCFS cloneOcfs2Mgmt
> order ordDRBD inf: msDrbdShared0:promote cloneOcfs2Mgmt:start
> order ordOcfs2 inf: cloneOcfs2Mgmt:start cloneOCFS:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1356702541"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="0"
> op_defaults $id="op-options" \
>         timeout="20s"
>
> I first suspected incorrect network name resolution, but /etc/hosts is
> correct and contains no duplicate names.
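
For what it's worth, a quick way to cross-check the names the cluster
membership layer is using against what each node calls itself (standard
commands; substitute your own node names):

    # node names known to the cluster
    crm_node -l

    # on each node: its own hostname, and what name resolution returns
    uname -n
    getent hosts app01 app02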
>
> --
> Hälsningar / Greetings
>
> Stefan Midjich
> [De omnibus dubitandum]



