[ClusterLabs] kind=Optional order constraint not working at startup

Ken Gaillot kgaillot at redhat.com
Wed Sep 21 13:08:02 EDT 2016


On 09/21/2016 09:51 AM, Auer, Jens wrote:
> Hi,
> 
>> shared_fs has to wait for the DRBD promotion, but the other resources
>> have no such limitation, so they are free to start before shared_fs.
> Isn't there an implicit limitation by the ordering constraint? I have drbd_promote < shared_fs < snmpAgent-clone,
> and I would expect this to be a transitive relationship.

Yes, but shared_fs < snmpAgent-clone is optional, so snmpAgent-clone is
free to start without it.

At step 1, the cluster can start drbd. At step 2, it can promote drbd.
At step 3, it can start shared_fs. Since the constraint is optional,
snmpAgent-clone can start at step 1, because nothing is preventing it
from doing so.

>> The problem is "... only impacts the startup procedure". Pacemaker
>> doesn't distinguish start-up from any other state of the cluster. Nodes
>> (and entire partitions of nodes) can come and go at any time, and any or
>> all resources can be stopped and started again at any time, so
>> "start-up" is not really as meaningful as it sounds.
>> Maybe try an optional constraint of the other resources on the DRBD
>> promotion. That would make it more likely that all the resources end up
>> starting in the same transition.
> 
> What is the meaning of "transition"? Is there any way I can force resource actions into transitions?

A transition is simply the cluster's response to the current cluster
state, as directed by the configuration. The easiest way to think of it
is as the "steps" described above.

If the configuration says a service should be running, but the service
is not currently running, then the cluster will schedule a start action
(if possible considering constraints, etc.). The set of all such actions
that can be scheduled together at one time is a "transition".

You can't really control transitions; you can only control the
configuration, and transitions result from configuration+state.
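
You can inspect what a pending transition contains, though. For example,
something like this (run against the live cluster) should print the
transition summary and the actions the scheduler would execute:

  # Read the live CIB and simulate the resulting transition (read-only)
  crm_simulate --live-check --simulate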

The only way to force actions to take place in a certain order is to use
mandatory constraints.
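
For example, if you were willing to accept the coupling, making the
shared_fs ordering mandatory would just mean dropping kind=Optional
(Mandatory is the default kind):

  # snmpAgent-clone waits for shared_fs; a shared_fs restart will
  # also restart snmpAgent-clone
  pcs constraint order start shared_fs then start snmpAgent-clone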

The problem here is that you want the constraint to be mandatory only at
"start-up". But there really is no such thing. Consider the case where
the cluster stays up, and for whatever maintenance purpose, you stop all
the resources, then start them again later. Is that the same as start-up
or not? What if you restart all but one resource?

I can imagine one possible (but convoluted) way to do something like
this, using node attributes and rules:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140521751827232

With a rule, you can specify a location constraint that applies, not to
a particular node, but to any node with a particular value of a
particular node attribute.
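
Your configuration already uses this form for the ping attribute:

  pcs constraint location mda-ip rule score=-INFINITY pingd lt 1 or not_defined pingd

The same mechanism works with any node attribute, including one you
maintain yourself.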

You would need a custom resource agent that sets a node attribute. Let's
say it takes three parameters: the node attribute name, the value to set
when starting (or nothing, to leave the attribute alone), and the value
to set when stopping (or nothing). (That might actually be a good idea
for a new ocf:pacemaker: agent.)
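
As a very rough sketch only (meta-data and validation are omitted, and
the parameter names name, start_value and stop_value are made up), such
an agent could look something like this, using attrd_updater to set a
transient attribute on the local node:

  #!/bin/sh
  # Sketch of an "attribute setter" agent -- illustration only
  attr="${OCF_RESKEY_name}"
  start_val="${OCF_RESKEY_start_value}"
  stop_val="${OCF_RESKEY_stop_value}"
  statefile="/var/run/attr-setter-${OCF_RESOURCE_INSTANCE}"

  case "$1" in
    start)
      # Set the attribute (transiently, on the local node) when starting
      [ -n "$start_val" ] && attrd_updater -n "$attr" -U "$start_val"
      touch "$statefile"
      exit 0  # OCF_SUCCESS
      ;;
    stop)
      # Optionally set it to something else when stopping
      [ -n "$stop_val" ] && attrd_updater -n "$attr" -U "$stop_val"
      rm -f "$statefile"
      exit 0
      ;;
    monitor)
      # "Running" here just means start has been called on this node
      if [ -e "$statefile" ]; then exit 0; else exit 7; fi  # 7 = not running
      ;;
    meta-data|validate-all)
      exit 0  # omitted in this sketch
      ;;
    *)
      exit 3  # OCF_ERR_UNIMPLEMENTED
      ;;
  esac

A transient attribute also has the nice property of being cleared when
the node leaves the cluster, so a rebooted node starts out unset again.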

You'd have an instance of this resource grouped with shared_fs that
would set the attribute to some magic value when started (say, "1").
You'd have another instance grouped with snmpAgent-clone that would set
it to a different value ("0") when stopped. Then, you'd have a location
constraint for snmpAgent-clone with a rule that says it is only allowed
on nodes with the attribute set to "1".

With that, snmpAgent-clone would be unable to start until shared_fs had
started at least once. shared_fs could stop without affecting
snmpAgent-clone. If snmpAgent-clone stopped, the attribute would be
reset, so it would require shared_fs again.

I haven't thought through all possible scenarios, but I think it would
give the behavior you want.
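
Purely as a sketch (the agent name ocf:local:attr-setter and the
attribute name shared_fs_started are made up, as above), the shared_fs
side and the location rule could look like:

  # Set shared_fs_started=1 on whichever node shared_fs starts on
  pcs resource create shared_fs_flag ocf:local:attr-setter \
      name=shared_fs_started start_value=1
  pcs resource group add shared_fs_group shared_fs shared_fs_flag

  # Only allow snmpAgent-clone where the attribute is currently "1"
  pcs constraint location snmpAgent-clone rule score=-INFINITY \
      shared_fs_started ne 1 or not_defined shared_fs_started

The instance that resets the attribute to "0" on stop would be grouped
with the snmpAgent primitive inside a cloned group (see below).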

> I tried to group them but this doesn't work with cloned resources, and an ordered set
> seems to use mandatory constraints and thus is not what I need.

You can clone a group -- so instead of making a group of clones, make a
group of the corresponding primitives, then clone the group. But
grouping enforces mandatory colocation and ordering.
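
With pcs that would look something like this (the group name is
arbitrary, and it assumes snmpAgent and supervisor are created as plain
primitives rather than with --clone):

  # Group the primitives, then clone the group as a whole
  pcs resource group add app_group snmpAgent supervisor
  pcs resource clone app_group

Within the group, supervisor is always colocated with, and ordered
after, snmpAgent on each node.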

> 
> I've added ordering constraints:
> MDA1PFP-S01 14:46:42 3432 127 ~ # pcs constraint show --full
> Location Constraints:
>   Resource: mda-ip
>     Enabled on: MDA1PFP-PCS01 (score:50) (id:location-mda-ip-MDA1PFP-PCS01-50)
>     Constraint: location-mda-ip
>       Rule: score=-INFINITY boolean-op=or  (id:location-mda-ip-rule)
>         Expression: pingd lt 1  (id:location-mda-ip-rule-expr)
>         Expression: not_defined pingd  (id:location-mda-ip-rule-expr-1)
> Ordering Constraints:
>   promote drbd1_sync then start shared_fs (kind:Mandatory) (id:order-drbd1_sync-shared_fs-mandatory)
>   start shared_fs then start snmpAgent-clone (kind:Optional) (id:order-shared_fs-snmpAgent-clone-Optional)
>   start shared_fs then start supervisor-clone (kind:Optional) (id:order-shared_fs-supervisor-clone-Optional)
>   start shared_fs then start clusterSwitchNotification (kind:Mandatory) (id:order-shared_fs-clusterSwitchNotification-mandatory)
>   start snmpAgent-clone then start supervisor-clone (kind:Optional) (id:order-snmpAgent-clone-supervisor-clone-Optional)
>   start supervisor-clone then start clusterSwitchNotification (kind:Optional) (id:order-supervisor-clone-clusterSwitchNotification-Optional)
>   promote drbd1_sync then start supervisor-clone (kind:Optional) (id:order-drbd1_sync-supervisor-clone-Optional)
>   promote drbd1_sync then start clusterSwitchNotification (kind:Optional) (id:order-drbd1_sync-clusterSwitchNotification-Optional)
>   promote drbd1_sync then start snmpAgent-clone (kind:Optional) (id:order-drbd1_sync-snmpAgent-clone-Optional)
> Colocation Constraints:
>   ACTIVE with mda-ip (score:INFINITY) (id:colocation-ACTIVE-mda-ip-INFINITY)
>   drbd1_sync with mda-ip (score:INFINITY) (rsc-role:Master) (with-rsc-role:Started) (id:colocation-drbd1_sync-mda-ip-INFINITY)
>   shared_fs with drbd1_sync (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-shared_fs-drbd1_sync-INFINITY)
>   clusterSwitchNotification with shared_fs (score:INFINITY) (id:colocation-clusterSwitchNotification-shared_fs-INFINITY)
> 
> but it still starts in the wrong order:
> Sep 21 14:45:59 MDA1PFP-S01 crmd[3635]:  notice: Operation snmpAgent_start_0: ok (node=MDA1PFP-PCS01, call=39, rc=0, cib-update=45, confirmed=true)
> Sep 21 14:45:59 MDA1PFP-S01 crmd[3635]:  notice: Operation drbd1_start_0: ok (node=MDA1PFP-PCS01, call=40, rc=0, cib-update=46, confirmed=true)
> Sep 21 14:46:01 MDA1PFP-S01 crmd[3635]:  notice: Operation ping_start_0: ok (node=MDA1PFP-PCS01, call=38, rc=0, cib-update=48, confirmed=true)
> Sep 21 14:46:01 MDA1PFP-S01 crmd[3635]:  notice: Operation supervisor_start_0: ok (node=MDA1PFP-PCS01, call=45, rc=0, cib-update=51, confirmed=true)
> Sep 21 14:46:06 MDA1PFP-S01 crmd[3635]:  notice: Operation ACTIVE_start_0: ok (node=MDA1PFP-PCS01, call=48, rc=0, cib-update=57, confirmed=true)
> Sep 21 14:46:06 MDA1PFP-S01 crmd[3635]:  notice: Operation mda-ip_start_0: ok (node=MDA1PFP-PCS01, call=47, rc=0, cib-update=59, confirmed=true)
> Sep 21 14:46:06 MDA1PFP-S01 crmd[3635]:  notice: Operation shared_fs_start_0: ok (node=MDA1PFP-PCS01, call=55, rc=0, cib-update=62, confirmed=true)
> Sep 21 14:46:06 MDA1PFP-S01 crmd[3635]:  notice: Operation clusterSwitchNotification_start_0: ok (node=MDA1PFP-PCS01, call=57, rc=0, cib-update=64, confirmed=true)
> 
> Best wishes,
>   Jens
> 
> --
> Jens Auer | CGI | Software-Engineer
> CGI (Germany) GmbH & Co. KG
> Rheinstraße 95 | 64295 Darmstadt | Germany
> T: +49 6151 36860 154
> jens.auer at cgi.com
> Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can be found at de.cgi.com/pflichtangaben.
> 
> 
> ________________________________________
> Von: Ken Gaillot [kgaillot at redhat.com]
> Gesendet: Mittwoch, 21. September 2016 16:30
> An: users at clusterlabs.org
> Betreff: Re: [ClusterLabs] kind=Optional order constraint not working at startup
> 
> On 09/21/2016 09:00 AM, Auer, Jens wrote:
>> Hi,
>>
>> could this be issue 5039 (http://bugs.clusterlabs.org/show_bug.cgi?id=5039)? It sounds similar.
> 
> Correct -- "Optional" means honor the constraint only if both resources
> are starting *in the same transition*.
> 
> shared_fs has to wait for the DRBD promotion, but the other resources
> have no such limitation, so they are free to start before shared_fs.
> 
> The problem is "... only impacts the startup procedure". Pacemaker
> doesn't distinguish start-up from any other state of the cluster. Nodes
> (and entire partitions of nodes) can come and go at any time, and any or
> all resources can be stopped and started again at any time, so
> "start-up" is not really as meaningful as it sounds.
> 
> Maybe try an optional constraint of the other resources on the DRBD
> promotion. That would make it more likely that all the resources end up
> starting in the same transition.
> 
>> Cheers,
>>   Jens
>>
>>
>>
>> ________________________________________
>> Von: Auer, Jens [jens.auer at cgi.com]
>> Gesendet: Mittwoch, 21. September 2016 15:10
>> An: users at clusterlabs.org
>> Betreff: [ClusterLabs] kind=Optional order constraint not working at startup
>>
>> Hi,
>>
>> in my cluster setup I have a couple of resources, some of which I need to start in a specific order. Basically I have two cloned resources that should start after mounting a DRBD filesystem on all nodes, plus one resource that starts after the clone sets. It is important that this only impacts the startup procedure. Once the system is running, stopping or starting one of the clone resources should not impact the other resource's state. From reading the manual, this should be what an order constraint with kind=Optional implements. However, when I start the cluster the filesystem is started after the other resources, ignoring the ordering constraint.
>>
>> My cluster configuration:
>> pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 MDA1PFP-PCS02,MDA1PFP-S02
>> pcs cluster start --all
>> sleep 5
>> crm_attribute --type nodes --node MDA1PFP-PCS01 --name ServerRole --update PRIME
>> crm_attribute --type nodes --node MDA1PFP-PCS02 --name ServerRole --update BACKUP
>> pcs property set stonith-enabled=false
>> pcs resource defaults resource-stickiness=100
>>
>> rm -f mda; pcs cluster cib mda
>> pcs -f mda property set no-quorum-policy=ignore
>>
>> pcs -f mda resource create mda-ip ocf:heartbeat:IPaddr2 ip=192.168.120.20 cidr_netmask=24 nic=bond0 op monitor interval=1s
>> pcs -f mda constraint location mda-ip prefers MDA1PFP-PCS01=50
>> pcs -f mda resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=pf-pep-dev-1  params timeout=1 attempts=3  op monitor interval=1 --clone
>> pcs -f mda constraint location mda-ip rule score=-INFINITY pingd lt 1 or not_defined pingd
>>
>> pcs -f mda resource create ACTIVE ocf:heartbeat:dummy
>> pcs -f mda constraint colocation add ACTIVE with mda-ip score=INFINITY
>>
>> pcs -f mda resource create drbd1 ocf:linbit:drbd drbd_resource=shared_fs op monitor interval=60s
>> pcs -f mda resource master drbd1_sync drbd1 master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>> pcs -f mda constraint colocation add master drbd1_sync with mda-ip score=INFINITY
>>
>> pcs -f mda resource create shared_fs Filesystem device="/dev/drbd1" directory=/shared_fs fstype="xfs"
>> pcs -f mda constraint order promote drbd1_sync then start shared_fs
>> pcs -f mda constraint colocation add shared_fs with master drbd1_sync score=INFINITY
>>
>> pcs -f mda resource create supervisor ocf:pfpep:supervisor params config="/shared_fs/pfpep.ini" --clone
>> pcs -f mda resource create snmpAgent ocf:pfpep:snmpAgent params config="/shared_fs/pfpep.ini" --clone
>> pcs -f mda resource create clusterSwitchNotification ocf:pfpep:clusterSwitch params config="/shared_fs/pfpep.ini"
>>
>> pcs -f mda constraint order start shared_fs then snmpAgent-clone  kind=Optional
>> pcs -f mda constraint order start shared_fs then supervisor-clone kind=Optional
>> pcs -f mda constraint order start snmpAgent-clone then supervisor-clone kind=Optional
>> pcs -f mda constraint order start supervisor-clone then clusterSwitchNotification kind=Optional
>> pcs -f mda constraint colocation add clusterSwitchNotification with shared_fs score=INFINITY
>>
>> pcs cluster cib-push mda
>>
>> The order of resource startup in the log file is:
>> Sep 21 13:01:21 MDA1PFP-S01 crmd[2760]:  notice: Operation snmpAgent_start_0: ok (node=MDA1PFP-PCS01, call=40, rc=0, cib-update=82, confirmed=true)
>> Sep 21 13:01:21 MDA1PFP-S01 crmd[2760]:  notice: Operation drbd1_start_0: ok (node=MDA1PFP-PCS01, call=39, rc=0, cib-update=83, confirmed=true)
>> Sep 21 13:01:23 MDA1PFP-S01 crmd[2760]:  notice: Operation ping_start_0: ok (node=MDA1PFP-PCS01, call=38, rc=0, cib-update=85, confirmed=true)
>> Sep 21 13:01:23 MDA1PFP-S01 crmd[2760]:  notice: Operation supervisor_start_0: ok (node=MDA1PFP-PCS01, call=45, rc=0, cib-update=88, confirmed=true)
>> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation ACTIVE_start_0: ok (node=MDA1PFP-PCS01, call=48, rc=0, cib-update=94, confirmed=true)
>> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation mda-ip_start_0: ok (node=MDA1PFP-PCS01, call=47, rc=0, cib-update=96, confirmed=true)
>> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation clusterSwitchNotification_start_0: ok (node=MDA1PFP-PCS01, call=50, rc=0, cib-update=98, confirmed=true)
>> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation shared_fs_start_0: ok (node=MDA1PFP-PCS01, call=57, rc=0, cib-update=101, confirmed=true)
>>
>> Why is the shared file system started after the other resources?
>>
>> Best wishes,
>>   Jens



