[ClusterLabs] kind=Optional order constraint not working at startup

Ken Gaillot kgaillot at redhat.com
Wed Sep 21 14:30:37 UTC 2016


On 09/21/2016 09:00 AM, Auer, Jens wrote:
> Hi,
> 
> could this be issue 5039 (http://bugs.clusterlabs.org/show_bug.cgi?id=5039)? It sounds similar.

Correct -- "Optional" means honor the constraint only if both resources
are starting *in the same transition*.
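For illustration (rscA and rscB here are hypothetical resource names, not part of this configuration):

    # Mandatory ordering (the default): rscB's start must always wait for rscA's start
    pcs constraint order start rscA then start rscB

    # Optional ordering: honored only when both resources start in the same transition
    pcs constraint order start rscA then start rscB kind=Optional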

shared_fs has to wait for the DRBD promotion, but the other resources
have no such limitation, so they are free to start before shared_fs.

The problem is "... only impacts the startup procedure". Pacemaker
doesn't distinguish start-up from any other state of the cluster. Nodes
(and entire partitions of nodes) can come and go at any time, and any or
all resources can be stopped and started again at any time, so
"start-up" is not really as meaningful as it sounds.

Maybe try optional ordering constraints of the other resources on the DRBD
promotion, as sketched below. That would make it more likely that all the
resources end up starting in the same transition.
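
Something like this might do it (a sketch reusing the resource names from the
configuration quoted below, untested):

    pcs constraint order promote drbd1_sync then start snmpAgent-clone kind=Optional
    pcs constraint order promote drbd1_sync then start supervisor-clone kind=Optional

That gives the clones a reason to wait for the promotion, so their starts are
more likely to be scheduled in the same transition as shared_fs's start.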

> Cheers,
>   Jens
> 
> --
> Jens Auer | CGI | Software-Engineer
> CGI (Germany) GmbH & Co. KG
> Rheinstraße 95 | 64295 Darmstadt | Germany
> T: +49 6151 36860 154
> jens.auer at cgi.com
> Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can be found at de.cgi.com/pflichtangaben.
> 
> 
> ________________________________________
> From: Auer, Jens [jens.auer at cgi.com]
> Sent: Wednesday, 21 September 2016 15:10
> To: users at clusterlabs.org
> Subject: [ClusterLabs] kind=Optional order constraint not working at startup
> 
> Hi,
> 
> in my cluster setup I have a couple of resources, some of which I need to start in a specific order. Basically, I have two cloned resources that should start after mounting a DRBD filesystem on all nodes, plus one resource that starts after the clone sets. It is important that this only impacts the startup procedure: once the system is running, stopping or starting one of the clone resources should not impact the other resources' state. From reading the manual, this should be what an ordering constraint with kind=Optional implements. However, when I start the cluster, the filesystem is started after the other resources, ignoring the ordering constraints.
> 
> My cluster configuration:
> pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 MDA1PFP-PCS02,MDA1PFP-S02
> pcs cluster start --all
> sleep 5
> crm_attribute --type nodes --node MDA1PFP-PCS01 --name ServerRole --update PRIME
> crm_attribute --type nodes --node MDA1PFP-PCS02 --name ServerRole --update BACKUP
> pcs property set stonith-enabled=false
> pcs resource defaults resource-stickiness=100
> 
> rm -f mda; pcs cluster cib mda
> pcs -f mda property set no-quorum-policy=ignore
> 
> pcs -f mda resource create mda-ip ocf:heartbeat:IPaddr2 ip=192.168.120.20 cidr_netmask=24 nic=bond0 op monitor interval=1s
> pcs -f mda constraint location mda-ip prefers MDA1PFP-PCS01=50
> pcs -f mda resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=pf-pep-dev-1  params timeout=1 attempts=3  op monitor interval=1 --clone
> pcs -f mda constraint location mda-ip rule score=-INFINITY pingd lt 1 or not_defined pingd
> 
> pcs -f mda resource create ACTIVE ocf:heartbeat:dummy
> pcs -f mda constraint colocation add ACTIVE with mda-ip score=INFINITY
> 
> pcs -f mda resource create drbd1 ocf:linbit:drbd drbd_resource=shared_fs op monitor interval=60s
> pcs -f mda resource master drbd1_sync drbd1 master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> pcs -f mda constraint colocation add master drbd1_sync with mda-ip score=INFINITY
> 
> pcs -f mda resource create shared_fs Filesystem device="/dev/drbd1" directory=/shared_fs fstype="xfs"
> pcs -f mda constraint order promote drbd1_sync then start shared_fs
> pcs -f mda constraint colocation add shared_fs with master drbd1_sync score=INFINITY
> 
> pcs -f mda resource create supervisor ocf:pfpep:supervisor params config="/shared_fs/pfpep.ini" --clone
> pcs -f mda resource create snmpAgent ocf:pfpep:snmpAgent params config="/shared_fs/pfpep.ini" --clone
> pcs -f mda resource create clusterSwitchNotification ocf:pfpep:clusterSwitch params config="/shared_fs/pfpep.ini"
> 
> pcs -f mda constraint order start shared_fs then snmpAgent-clone  kind=Optional
> pcs -f mda constraint order start shared_fs then supervisor-clone kind=Optional
> pcs -f mda constraint order start snmpAgent-clone then supervisor-clone kind=Optional
> pcs -f mda constraint order start supervisor-clone then clusterSwitchNotification kind=Optional
> pcs -f mda constraint colocation add clusterSwitchNotification with shared_fs score=INFINITY
> 
> pcs cluster cib-push mda
> 
> The order of resource startup in the log file is:
> Sep 21 13:01:21 MDA1PFP-S01 crmd[2760]:  notice: Operation snmpAgent_start_0: ok (node=MDA1PFP-PCS01, call=40, rc=0, cib-update=82, confirmed=true)
> Sep 21 13:01:21 MDA1PFP-S01 crmd[2760]:  notice: Operation drbd1_start_0: ok (node=MDA1PFP-PCS01, call=39, rc=0, cib-update=83, confirmed=true)
> Sep 21 13:01:23 MDA1PFP-S01 crmd[2760]:  notice: Operation ping_start_0: ok (node=MDA1PFP-PCS01, call=38, rc=0, cib-update=85, confirmed=true)
> Sep 21 13:01:23 MDA1PFP-S01 crmd[2760]:  notice: Operation supervisor_start_0: ok (node=MDA1PFP-PCS01, call=45, rc=0, cib-update=88, confirmed=true)
> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation ACTIVE_start_0: ok (node=MDA1PFP-PCS01, call=48, rc=0, cib-update=94, confirmed=true)
> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation mda-ip_start_0: ok (node=MDA1PFP-PCS01, call=47, rc=0, cib-update=96, confirmed=true)
> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation clusterSwitchNotification_start_0: ok (node=MDA1PFP-PCS01, call=50, rc=0, cib-update=98, confirmed=true)
> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation shared_fs_start_0: ok (node=MDA1PFP-PCS01, call=57, rc=0, cib-update=101, confirmed=true)
> 
> Why is the shared file system started after the other resources?
> 
> Best wishes,
>   Jens



