[ClusterLabs] Inconsistent clone $OCF_RESOURCE_INSTANCE value depending on symmetric-cluster property.

Mon Apr 29 11:24:19 EDT 2019

On Sat, 2019-04-27 at 10:27 +0300, Andrei Borzenkov wrote:
> Documentation says that for clone resources, OCF_RESOURCE_INSTANCE
> contains the primitive name qualified by the instance number, like
> primitive:1.

That is pacemaker's practice (inherited from heartbeat).

The OCF standard itself says the variable contains "the name of the
resource instance," where the name "must be unique to the particular
resource type" and "is any name chosen by the administrator to identify
the resource instance."

The OCF standard was originally written without clones in mind, so it's
a grey area.
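An agent that must cope with both forms can split the value itself. The sketch below is not part of Pacemaker or the OCF standard. The helper name split_instance is hypothetical, and it assumes the qualifier is always a trailing ":<digits>".

```python
import os
from typing import Optional, Tuple


def split_instance(value: str) -> Tuple[str, Optional[int]]:
    """Split e.g. "p_Stateful:1" into ("p_Stateful", 1).

    Returns (value, None) when there is no ":<digits>" suffix, which is
    what an agent sees when Pacemaker passes the bare primitive name.
    """
    base, sep, suffix = value.rpartition(":")
    # isdecimal() only accepts characters that int() can parse.
    if sep and suffix.isdecimal():
        return base, int(suffix)
    return value, None


if __name__ == "__main__":
    print(split_instance(os.environ.get("OCF_RESOURCE_INSTANCE", "")))
```

rpartition is used so that only the last colon counts. A name without a numeric suffix, such as Stateful_Test_1, is returned unchanged.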

> I was rather surprised that pacemaker may actually omit the
> qualification, at least in the following case:
> 
> 1. *start* pacemaker with symmetric-cluster=false
> 2. do not add constraints allowing the primitive in the clone
>    definition to run anywhere
> 3. try to start the clone
> 
> Resource agents get a plain "primitive" for OCF_RESOURCE_INSTANCE
> instead of "primitive:1".

I'm confused -- how is the resource agent being called if there are no
constraints enabling it? Maybe for probes?
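One way to test the probe theory is to make the test agent log each call and flag probes. The following is a minimal sketch. It assumes a probe is a "monitor" action with OCF_RESKEY_CRM_meta_interval=0, which is, as far as I know, the same test the ocf_is_probe helper in resource-agents' ocf-shellfuncs performs. The names is_probe and describe_call are hypothetical, and the log format only mimics the output quoted below. The sketch just logs: a real agent would still have to implement each action and return OCF exit codes.

```python
import os
import sys
from typing import List, Mapping


def is_probe(action: str, env: Mapping[str, str]) -> bool:
    # Assumption: a probe is a one-shot "monitor" with interval 0.
    return action == "monitor" and env.get("OCF_RESKEY_CRM_meta_interval") == "0"


def describe_call(action: str, env: Mapping[str, str]) -> List[str]:
    # Build log lines in the same shape as the quoted output.
    header = f"==== {action}"
    if is_probe(action, env):
        header += " (probe)"
    instance = env.get("OCF_RESOURCE_INSTANCE", "<unset>")
    return [header, f"OCF_RESOURCE_INSTANCE={instance}"]


if __name__ == "__main__":
    # Agents receive the action name as their first argument; log to
    # stderr so stdout stays clean for meta-data XML.
    action = sys.argv[1] if len(sys.argv) > 1 else "meta-data"
    print("\n".join(describe_call(action, os.environ)), file=sys.stderr)
```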

> Moreover, if I now set symmetric-cluster=true, pacemaker *continues*
> to provide OCF_RESOURCE_INSTANCE without qualification!
> 
> If I *start* pacemaker with symmetric-cluster=true (the default),
> pacemaker provides a qualified OCF_RESOURCE_INSTANCE and *continues*
> to do so even after I set symmetric-cluster=false, until the next
> pacemaker restart.
> 
> node 1: ha1 \
> 	attributes master-m_Stateful=10
> node 2: ha2
> primitive A Dummy \
> 	op start interval=0 \
> 	op_params interval=0
> primitive B Dummy \
> 	op start interval=0 \
> 	op_params interval=0 \
> 	meta target-role=Stopped
> primitive fence_disk stonith:fence_scsi \
> 	params devices="/dev/sdb"
> primitive p_Stateful ocf:_local:Stateful_Test_1 \
> 	op start interval=0
> ms m_Stateful p_Stateful \
> 	meta target-role=Stopped
> location A-ha1 A 50: ha1
> location A-ha2 A 30: ha2
> location B-ha2 B 3: ha2
> colocation B-with-A -inf: B A
> property cib-bootstrap-options: \
> 	dc-version="2.0.1+20190304.9e909a5bd-1.1-2.0.1+20190304.9e909a5bd" \
> 	cluster-infrastructure=corosync \
> 	stonith-enabled=true \
> 	last-lrm-refresh=1551115646 \
> 	have-watchdog=false \
> 	symmetric-cluster=false
> 
> And after trying to start m_Stateful
> 
> OCF_RESOURCE_INSTANCE=Stateful_Test_1
> OCF_RESOURCE_INSTANCE=p_Stateful
> 
> 
> Now delete symmetric-cluster
> 
> 
> node 1: ha1 \
> 	attributes master-m_Stateful=10
> node 2: ha2
> primitive A Dummy \
> 	op start interval=0 \
> 	op_params interval=0
> primitive B Dummy \
> 	op start interval=0 \
> 	op_params interval=0 \
> 	meta target-role=Stopped
> primitive fence_disk stonith:fence_scsi \
> 	params devices="/dev/sdb"
> primitive p_Stateful ocf:_local:Stateful_Test_1 \
> 	op start interval=0
> ms m_Stateful p_Stateful \
> 	meta target-role=Started
> location A-ha1 A 50: ha1
> location A-ha2 A 30: ha2
> location B-ha2 B 3: ha2
> colocation B-with-A -inf: B A
> property cib-bootstrap-options: \
> 	dc-version="2.0.1+20190304.9e909a5bd-1.1-2.0.1+20190304.9e909a5bd" \
> 	cluster-infrastructure=corosync \
> 	stonith-enabled=true \
> 	last-lrm-refresh=1551115646 \
> 	have-watchdog=false
> 
> And try to start m_Stateful again
> 
> ==== meta-data
> OCF_RESOURCE_INSTANCE=Stateful_Test_1
> ==== start
> OCF_RESOURCE_INSTANCE=p_Stateful
> ==== promote
> OCF_RESOURCE_INSTANCE=p_Stateful
> 
> 
> In case I am missing something obvious: is this intentional? If not,
> should I open a bug report?

I don't think it's intentional. However, the instance number *should*
be irrelevant to the resource agent for anonymous clones. I would
consider it a bug if it's missing for a unique clone, but not if it
only happens for anonymous clones.
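An agent that keeps per-instance state (a state file, say) can be made insensitive to this inconsistency. It ignores the ":N" suffix unless the clone is globally unique. This is a sketch that assumes Pacemaker exports the clone's globally-unique meta-attribute as OCF_RESKEY_CRM_meta_globally_unique. The helpers state_key and is_unique_clone are hypothetical, and the set of accepted boolean spellings is an assumption.

```python
from typing import Mapping

# Assumed boolean spellings, loosely following Pacemaker's own parsing.
_TRUE_VALUES = {"true", "yes", "y", "on", "1"}


def is_unique_clone(env: Mapping[str, str]) -> bool:
    # Assumption: globally-unique is exported under this name; unset
    # means anonymous, since globally-unique defaults to false.
    value = env.get("OCF_RESKEY_CRM_meta_globally_unique", "")
    return value.strip().lower() in _TRUE_VALUES


def state_key(env: Mapping[str, str]) -> str:
    instance = env.get("OCF_RESOURCE_INSTANCE", "")
    if is_unique_clone(env):
        # Unique clone: "p:0" and "p:1" are different instances; keep the suffix.
        return instance
    # Anonymous clone or plain primitive: all copies are interchangeable,
    # so strip a trailing ":<digits>" and get the same key either way.
    base, sep, suffix = instance.rpartition(":")
    return base if sep and suffix.isdecimal() else instance


if __name__ == "__main__":
    import os
    print(state_key(os.environ))
```

With this approach, an anonymous clone gets the same key whether or not Pacemaker qualifies the name. A unique clone still relies on the qualifier being present, which is the case that would be a real bug.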
-- 
Ken Gaillot <kgaillot at redhat.com>