[ClusterLabs] kind=Optional order constraint not working at startup

Wed Sep 21 14:51:30 UTC 2016

Hi,

> shared_fs has to wait for the DRBD promotion, but the other resources
> have no such limitation, so they are free to start before shared_fs.
Isn't there an implicit limitation imposed by the ordering constraints? I have drbd_promote < shared_fs < snmpAgent-clone,
and I would expect this relationship to be transitive.

> The problem is "... only impacts the startup procedure". Pacemaker
> doesn't distinguish start-up from any other state of the cluster. Nodes
> (and entire partitions of nodes) can come and go at any time, and any or
> all resources can be stopped and started again at any time, so
> "start-up" is not really as meaningful as it sounds.
> Maybe try an optional constraint of the other resources on the DRBD
> promotion. That would make it more likely that all the resources end up
> starting in the same transition.

What is the meaning of "transition"? Is there any way I can force resource actions into the same transition?
I tried to group them, but this doesn't work with cloned resources, and an ordered set
seems to use mandatory constraints and thus is not what I need.

I've added ordering constraints:
MDA1PFP-S01 14:46:42 3432 127 ~ # pcs constraint show --full
Location Constraints:
  Resource: mda-ip
    Enabled on: MDA1PFP-PCS01 (score:50) (id:location-mda-ip-MDA1PFP-PCS01-50)
    Constraint: location-mda-ip
      Rule: score=-INFINITY boolean-op=or  (id:location-mda-ip-rule)
        Expression: pingd lt 1  (id:location-mda-ip-rule-expr)
        Expression: not_defined pingd  (id:location-mda-ip-rule-expr-1)
Ordering Constraints:
  promote drbd1_sync then start shared_fs (kind:Mandatory) (id:order-drbd1_sync-shared_fs-mandatory)
  start shared_fs then start snmpAgent-clone (kind:Optional) (id:order-shared_fs-snmpAgent-clone-Optional)
  start shared_fs then start supervisor-clone (kind:Optional) (id:order-shared_fs-supervisor-clone-Optional)
  start shared_fs then start clusterSwitchNotification (kind:Mandatory) (id:order-shared_fs-clusterSwitchNotification-mandatory)
  start snmpAgent-clone then start supervisor-clone (kind:Optional) (id:order-snmpAgent-clone-supervisor-clone-Optional)
  start supervisor-clone then start clusterSwitchNotification (kind:Optional) (id:order-supervisor-clone-clusterSwitchNotification-Optional)
  promote drbd1_sync then start supervisor-clone (kind:Optional) (id:order-drbd1_sync-supervisor-clone-Optional)
  promote drbd1_sync then start clusterSwitchNotification (kind:Optional) (id:order-drbd1_sync-clusterSwitchNotification-Optional)
  promote drbd1_sync then start snmpAgent-clone (kind:Optional) (id:order-drbd1_sync-snmpAgent-clone-Optional)
Colocation Constraints:
  ACTIVE with mda-ip (score:INFINITY) (id:colocation-ACTIVE-mda-ip-INFINITY)
  drbd1_sync with mda-ip (score:INFINITY) (rsc-role:Master) (with-rsc-role:Started) (id:colocation-drbd1_sync-mda-ip-INFINITY)
  shared_fs with drbd1_sync (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-shared_fs-drbd1_sync-INFINITY)
  clusterSwitchNotification with shared_fs (score:INFINITY) (id:colocation-clusterSwitchNotification-shared_fs-INFINITY)

but it still starts in the wrong order:
Sep 21 14:45:59 MDA1PFP-S01 crmd[3635]:  notice: Operation snmpAgent_start_0: ok (node=MDA1PFP-PCS01, call=39, rc=0, cib-update=45, confirmed=true)
Sep 21 14:45:59 MDA1PFP-S01 crmd[3635]:  notice: Operation drbd1_start_0: ok (node=MDA1PFP-PCS01, call=40, rc=0, cib-update=46, confirmed=true)
Sep 21 14:46:01 MDA1PFP-S01 crmd[3635]:  notice: Operation ping_start_0: ok (node=MDA1PFP-PCS01, call=38, rc=0, cib-update=48, confirmed=true)
Sep 21 14:46:01 MDA1PFP-S01 crmd[3635]:  notice: Operation supervisor_start_0: ok (node=MDA1PFP-PCS01, call=45, rc=0, cib-update=51, confirmed=true)
Sep 21 14:46:06 MDA1PFP-S01 crmd[3635]:  notice: Operation ACTIVE_start_0: ok (node=MDA1PFP-PCS01, call=48, rc=0, cib-update=57, confirmed=true)
Sep 21 14:46:06 MDA1PFP-S01 crmd[3635]:  notice: Operation mda-ip_start_0: ok (node=MDA1PFP-PCS01, call=47, rc=0, cib-update=59, confirmed=true)
Sep 21 14:46:06 MDA1PFP-S01 crmd[3635]:  notice: Operation shared_fs_start_0: ok (node=MDA1PFP-PCS01, call=55, rc=0, cib-update=62, confirmed=true)
Sep 21 14:46:06 MDA1PFP-S01 crmd[3635]:  notice: Operation clusterSwitchNotification_start_0: ok (node=MDA1PFP-PCS01, call=57, rc=0, cib-update=64, confirmed=true)
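
To make the start order easier to compare between runs, the log can be reduced to the resource names. The following is only a minimal sketch: it assumes the crmd log format shown above, and the function names start_order and started_before are hypothetical helpers, not part of Pacemaker or pcs.

```shell
#!/bin/sh
# Print the resource names of successful start operations, in log order.
# Assumes crmd lines of the form:
#   "... notice: Operation <rsc>_start_0: ok (node=...)"
# Reads the files given as arguments, or stdin if there are none.
start_order() {
    sed -n 's/.*Operation \(.*\)_start_0: ok .*/\1/p' "$@"
}

# Succeed (exit 0) only if resource $1 is reported as started before
# resource $2 in the log read from stdin.
started_before() {
    start_order | awk -v a="$1" -v b="$2" '
        $0 == a && !ia { ia = NR }   # first line on which a appears
        $0 == b && !ib { ib = NR }   # first line on which b appears
        END { exit !(ia && ib && ia < ib) }'
}
```

For the log above, `started_before shared_fs clusterSwitchNotification` succeeds, while `started_before shared_fs snmpAgent` fails because snmpAgent_start_0 is reported first.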

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.auer at cgi.com

________________________________________
From: Ken Gaillot [kgaillot at redhat.com]
Sent: Wednesday, 21 September 2016 16:30
To: users at clusterlabs.org
Subject: Re: [ClusterLabs] kind=Optional order constraint not working at startup

On 09/21/2016 09:00 AM, Auer, Jens wrote:
> Hi,
>
> could this be issue 5039 (http://bugs.clusterlabs.org/show_bug.cgi?id=5039)? It sounds similar.

Correct -- "Optional" means honor the constraint only if both resources
are starting *in the same transition*.

shared_fs has to wait for the DRBD promotion, but the other resources
have no such limitation, so they are free to start before shared_fs.

The problem is "... only impacts the startup procedure". Pacemaker
doesn't distinguish start-up from any other state of the cluster. Nodes
(and entire partitions of nodes) can come and go at any time, and any or
all resources can be stopped and started again at any time, so
"start-up" is not really as meaningful as it sounds.

Maybe try an optional constraint of the other resources on the DRBD
promotion. That would make it more likely that all the resources end up
starting in the same transition.
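
As a rough sketch of that suggestion, assuming the resource names and the `mda` CIB file from the configuration quoted below: the loop only prints the pcs commands (a dry run) so they can be reviewed before anything is changed. The helper name optional_after_promote is made up for illustration.

```shell
#!/bin/sh
# Dry run: print one optional ordering constraint per dependent resource,
# each ordered after the promotion of the DRBD master/slave resource.
# drbd1_sync and the "mda" CIB file come from the configuration in this thread.
optional_after_promote() {
    for rsc in "$@"; do
        echo "pcs -f mda constraint order promote drbd1_sync then start $rsc kind=Optional"
    done
}

optional_after_promote snmpAgent-clone supervisor-clone clusterSwitchNotification
```

Piping the output to `sh` would run the printed commands. Whether this is enough to pull all starts into one transition is not guaranteed; it only makes that outcome more likely.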

> Cheers,
>   Jens
>
> ________________________________________
> From: Auer, Jens [jens.auer at cgi.com]
> Sent: Wednesday, 21 September 2016 15:10
> To: users at clusterlabs.org
> Subject: [ClusterLabs] kind=Optional order constraint not working at startup
>
> Hi,
>
> in my cluster setup I have a couple of resources, some of which need to start in a specific order. Basically, I have two cloned resources that should start after mounting a DRBD filesystem on all nodes, plus one resource that starts after the clone sets. It is important that this only impacts the startup procedure. Once the system is running, stopping or starting one of the clone resources should not impact the other resource's state. From reading the manual, this should be what an ordering constraint with kind=Optional implements. However, when I start the cluster, the filesystem is started after the other resources, ignoring the ordering constraint.
>
> My cluster configuration:
> pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 MDA1PFP-PCS02,MDA1PFP-S02
> pcs cluster start --all
> sleep 5
> crm_attribute --type nodes --node MDA1PFP-PCS01 --name ServerRole --update PRIME
> crm_attribute --type nodes --node MDA1PFP-PCS02 --name ServerRole --update BACKUP
> pcs property set stonith-enabled=false
> pcs resource defaults resource-stickiness=100
>
> rm -f mda; pcs cluster cib mda
> pcs -f mda property set no-quorum-policy=ignore
>
> pcs -f mda resource create mda-ip ocf:heartbeat:IPaddr2 ip=192.168.120.20 cidr_netmask=24 nic=bond0 op monitor interval=1s
> pcs -f mda constraint location mda-ip prefers MDA1PFP-PCS01=50
> pcs -f mda resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=pf-pep-dev-1  params timeout=1 attempts=3  op monitor interval=1 --clone
> pcs -f mda constraint location mda-ip rule score=-INFINITY pingd lt 1 or not_defined pingd
>
> pcs -f mda resource create ACTIVE ocf:heartbeat:dummy
> pcs -f mda constraint colocation add ACTIVE with mda-ip score=INFINITY
>
> pcs -f mda resource create drbd1 ocf:linbit:drbd drbd_resource=shared_fs op monitor interval=60s
> pcs -f mda resource master drbd1_sync drbd1 master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> pcs -f mda constraint colocation add master drbd1_sync with mda-ip score=INFINITY
>
> pcs -f mda resource create shared_fs Filesystem device="/dev/drbd1" directory=/shared_fs fstype="xfs"
> pcs -f mda constraint order promote drbd1_sync then start shared_fs
> pcs -f mda constraint colocation add shared_fs with master drbd1_sync score=INFINITY
>
> pcs -f mda resource create supervisor ocf:pfpep:supervisor params config="/shared_fs/pfpep.ini" --clone
> pcs -f mda resource create snmpAgent ocf:pfpep:snmpAgent params config="/shared_fs/pfpep.ini" --clone
> pcs -f mda resource create clusterSwitchNotification ocf:pfpep:clusterSwitch params config="/shared_fs/pfpep.ini"
>
> pcs -f mda constraint order start shared_fs then snmpAgent-clone  kind=Optional
> pcs -f mda constraint order start shared_fs then supervisor-clone kind=Optional
> pcs -f mda constraint order start snmpAgent-clone then supervisor-clone kind=Optional
> pcs -f mda constraint order start supervisor-clone then clusterSwitchNotification kind=Optional
> pcs -f mda constraint colocation add clusterSwitchNotification with shared_fs score=INFINITY
>
> pcs cluster cib-push mda
>
> The order of resource startup in the log file is:
> Sep 21 13:01:21 MDA1PFP-S01 crmd[2760]:  notice: Operation snmpAgent_start_0: ok (node=MDA1PFP-PCS01, call=40, rc=0, cib-update=82, confirmed=true)
> Sep 21 13:01:21 MDA1PFP-S01 crmd[2760]:  notice: Operation drbd1_start_0: ok (node=MDA1PFP-PCS01, call=39, rc=0, cib-update=83, confirmed=true)
> Sep 21 13:01:23 MDA1PFP-S01 crmd[2760]:  notice: Operation ping_start_0: ok (node=MDA1PFP-PCS01, call=38, rc=0, cib-update=85, confirmed=true)
> Sep 21 13:01:23 MDA1PFP-S01 crmd[2760]:  notice: Operation supervisor_start_0: ok (node=MDA1PFP-PCS01, call=45, rc=0, cib-update=88, confirmed=true)
> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation ACTIVE_start_0: ok (node=MDA1PFP-PCS01, call=48, rc=0, cib-update=94, confirmed=true)
> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation mda-ip_start_0: ok (node=MDA1PFP-PCS01, call=47, rc=0, cib-update=96, confirmed=true)
> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation clusterSwitchNotification_start_0: ok (node=MDA1PFP-PCS01, call=50, rc=0, cib-update=98, confirmed=true)
> Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation shared_fs_start_0: ok (node=MDA1PFP-PCS01, call=57, rc=0, cib-update=101, confirmed=true)
>
> Why is the shared file system started after the other resources?
>
> Best wishes,
>   Jens
