[ClusterLabs] MS Promotion Not Working

Brian D. Lees brian at fides.me.uk
Sun Jun 14 08:26:16 EDT 2015


Success!  I have managed to get this to work, and I know 'how' but not 'why'.  The promotion scores seem to be -1 by default.  I set a location preference of 1 for the Master role on each node, and now it works: when one node is rebooted, the master moves to the other successfully.  So I understand how it works (the location preference raises the promotion scores from -1 to 0), but I don't understand why the default should be -1.  If you configure a master/slave resource, surely you want it to be master somewhere?  Anyway, here is the configuration and status output.  Takehiro asked about my distro: it is openSUSE 13.2.  Thanks for all your help.
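
For anyone hitting the same thing, the location preferences were added with the crm shell roughly like this (same names as in the configuration below):

crm configure location drbd_primary_1 ms_drbd_1 role=Master 1: ACL001
crm configure location drbd_primary_2 ms_drbd_1 role=Master 1: ACL002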

node 1084751972: ACL001 \
        attributes standby=off
node 1084752072: ACL002 \
        attributes standby=off
primitive res_drbd_1 ocf:linbit:drbd \
        params drbd_resource=acl_shared \
        operations $id=res_drbd_1-operations \
        op start interval=0 timeout=240 \
        op promote interval=0 timeout=90 \
        op demote interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor interval=10 timeout=20 role=Master start-delay=0 \
        op monitor interval=11 timeout=20 role=Slave start-delay=0 \
        op notify interval=0 timeout=90
ms ms_drbd_1 res_drbd_1 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=master
location drbd_primary_1 ms_drbd_1 role=Master 1: ACL001
location drbd_primary_2 ms_drbd_1 role=Master 1: ACL002
property cib-bootstrap-options: \
        symmetric-cluster=true \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
        cluster-infrastructure=corosync \
        cluster-name=aclcluster
rsc_defaults rsc-options: \
        resource-stickiness=0
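
If I understand the scoring correctly, a role=Master location score is simply added to the master score the resource agent sets, so with the constraints above each node ends up with:

promotion score = master score + location score = -1 + 1 = 0

which is no longer negative, so Pacemaker will promote one of the instances.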

Current cluster status:
Online: [ ACL001 ACL002 ]

 Master/Slave Set: ms_drbd_1 [res_drbd_1]
     Masters: [ ACL002 ]
     Slaves: [ ACL001 ]

Allocation scores:
clone_color: ms_drbd_1 allocation score on ACL001: 0
clone_color: ms_drbd_1 allocation score on ACL002: 0
clone_color: res_drbd_1:0 allocation score on ACL001: 0
clone_color: res_drbd_1:0 allocation score on ACL002: 0
clone_color: res_drbd_1:1 allocation score on ACL001: 0
clone_color: res_drbd_1:1 allocation score on ACL002: 0
native_color: res_drbd_1:0 allocation score on ACL001: 0
native_color: res_drbd_1:0 allocation score on ACL002: 0
native_color: res_drbd_1:1 allocation score on ACL001: -INFINITY
native_color: res_drbd_1:1 allocation score on ACL002: 0
res_drbd_1:1 promotion score on ACL002: 0
res_drbd_1:0 promotion score on ACL001: 0

Transition Summary:
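
(For anyone wanting to reproduce the output above: the scores were taken from crm_simulate; something like 'crm_simulate -sL' prints the allocation and promotion scores against the live CIB.)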

-----Original Message-----
From: Andrei Borzenkov [mailto:arvidjaar at gmail.com] 
Sent: 14 June 2015 13:27
To: Brian D. Lees
Cc: 'Takehiro Matsushima'; 'Cluster Labs - All topics related to open-source clustering welcomed'
Subject: Re: [ClusterLabs] MS Promotion Not Working

On Sun, 14 Jun 2015 13:16:18 +0200
"Brian D. Lees" <brian at fides.me.uk> wrote:

> Andrei,
> 
> Thanks for the suggestion; sadly, however, it has the same outcome!  I think the key to this is understanding how the promotion scores are calculated, as that will point us towards whatever is making both scores -1.  Do you have any idea how this works?
> 

As far as I know there is no default master score. The resource agent is responsible for deciding which instance should (or can) be promoted and for setting the scores accordingly.
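
For illustration only (the values here are made up; the real ones depend on the agent and its state), a resource agent typically publishes its preference via crm_master, e.g.:

crm_master -Q -l reboot -v 100   # "this instance can be promoted", weight 100
crm_master -Q -l reboot -D       # withdraw the preference

As far as I can tell the drbd agent derives its values from the state of the local disk.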

> node 1084751972: ACL001 \
>         attributes standby=off
> node 1084752072: ACL002 \
>         attributes standby=off
> primitive res_drbd_1 ocf:linbit:drbd \
>         params drbd_resource=acl_shared \
>         operations $id=res_drbd_1-operations \
>         op start interval=0 timeout=240 \
>         op promote interval=0 timeout=90 \
>         op demote interval=0 timeout=90 \
>         op stop interval=0 timeout=100 \
>         op monitor interval=10 timeout=20 role=Master start-delay=0 \
>         op monitor interval=11 timeout=20 role=Slave start-delay=0 \
>         op notify interval=0 timeout=90
> ms ms_drbd_1 res_drbd_1 \
>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=master
> property cib-bootstrap-options: \
>         symmetric-cluster=true \
>         stonith-enabled=false \
>         no-quorum-policy=ignore \
>         dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
>         cluster-infrastructure=corosync \
>         cluster-name=aclcluster
> rsc_defaults rsc-options: \
>         resource-stickiness=0
> 
> Current cluster status:
> Online: [ ACL001 ACL002 ]
> 
>  Master/Slave Set: ms_drbd_1 [res_drbd_1]
>      Slaves: [ ACL001 ACL002 ]
> 
> Allocation scores:
> clone_color: ms_drbd_1 allocation score on ACL001: 0
> clone_color: ms_drbd_1 allocation score on ACL002: 0
> clone_color: res_drbd_1:0 allocation score on ACL001: 0
> clone_color: res_drbd_1:0 allocation score on ACL002: 0
> clone_color: res_drbd_1:1 allocation score on ACL001: 0
> clone_color: res_drbd_1:1 allocation score on ACL002: 0
> native_color: res_drbd_1:0 allocation score on ACL001: 0
> native_color: res_drbd_1:0 allocation score on ACL002: 0
> native_color: res_drbd_1:1 allocation score on ACL001: -INFINITY
> native_color: res_drbd_1:1 allocation score on ACL002: 0
> res_drbd_1:0 promotion score on ACL001: -1
> res_drbd_1:1 promotion score on ACL002: -1
> 
> Transition Summary:
> 
> 
>    debug: qb_rb_open_2:         shm size:524301; real_size:528384; rb->word_size:132096
>    debug: qb_rb_open_2:         shm size:524301; real_size:528384; rb->word_size:132096
>    debug: qb_rb_open_2:         shm size:524301; real_size:528384; rb->word_size:132096
>    debug: cib_native_signon_raw:        Connection to CIB successful
>    debug: cib_native_signoff:   Signing out of the CIB Service
>    debug: qb_ipcc_disconnect:   qb_ipcc_disconnect()
>    debug: qb_rb_close:  Closing ringbuffer: /dev/shm/qb-cib_rw-request-1929-14545-13-header
>    debug: qb_rb_close:  Closing ringbuffer: /dev/shm/qb-cib_rw-response-1929-14545-13-header
>    debug: qb_rb_close:  Closing ringbuffer: /dev/shm/qb-cib_rw-event-1929-14545-13-header
>     info: validate_with_relaxng:        Creating RNG parser context
>    debug: cib_file_signon:      crm_simulate: Opened connection to local file '/var/lib/pacemaker/cib/shadow.14545'
>     info: cib_file_perform_op_delegate:         cib_query on (null)
>    debug: cib_acl_enabled:      CIB ACL is disabled
>    debug: unpack_config:        STONITH timeout: 60000
>    debug: unpack_config:        STONITH of failed nodes is disabled
>    debug: unpack_config:        Stop all active resources: false
>    debug: unpack_config:        Cluster is symmetric - resources can run anywhere by default
>    debug: unpack_config:        Default stickiness: 0
>   notice: unpack_config:        On loss of CCM Quorum: Ignore
>    debug: unpack_config:        Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>     info: determine_online_status:      Node ACL001 is online
>     info: determine_online_status:      Node ACL002 is online
>    debug: find_anonymous_clone:         Internally renamed res_drbd_1 on ACL001 to res_drbd_1:0
>    debug: find_anonymous_clone:         Internally renamed res_drbd_1 on ACL002 to res_drbd_1:1
> 
> Current cluster status:
> Online: [ ACL001 ACL002 ]
> 
>  Master/Slave Set: ms_drbd_1 [res_drbd_1]
>    debug: native_active:        Resource res_drbd_1:0 active on ACL001
>    debug: native_active:        Resource res_drbd_1:0 active on ACL001
>    debug: native_active:        Resource res_drbd_1:1 active on ACL002
>    debug: native_active:        Resource res_drbd_1:1 active on ACL002
>      Slaves: [ ACL001 ACL002 ]
> 
>     info: clone_print:   Master/Slave Set: ms_drbd_1 [res_drbd_1]
>    debug: native_active:        Resource res_drbd_1:0 active on ACL001
>    debug: native_active:        Resource res_drbd_1:0 active on ACL001
>    debug: native_active:        Resource res_drbd_1:1 active on ACL002
>    debug: native_active:        Resource res_drbd_1:1 active on ACL002
>     info: short_print:       Slaves: [ ACL001 ACL002 ]
>    debug: native_assign_node:   Assigning ACL001 to res_drbd_1:0
>    debug: native_assign_node:   Assigning ACL002 to res_drbd_1:1
>    debug: clone_color:  Allocated 2 ms_drbd_1 instances of a possible 2
>    debug: master_color:         res_drbd_1:0 master score: -1
>    debug: master_color:         res_drbd_1:1 master score: -1
>     info: master_color:         ms_drbd_1: Promoted 0 instances of a possible 1 to master
>    debug: master_create_actions:        Creating actions for ms_drbd_1
>     info: LogActions:   Leave   res_drbd_1:0    (Slave ACL001)
>     info: LogActions:   Leave   res_drbd_1:1    (Slave ACL002)
> Transition Summary:
>     info: LogActions:   Leave   res_drbd_1:0    (Slave ACL001)
>     info: LogActions:   Leave   res_drbd_1:1    (Slave ACL002)
>    debug: cib_file_signoff:     Signing out of the CIB Service
>     info: cib_file_signoff:     Wrote CIB to /var/lib/pacemaker/cib/shadow.14545
>     info: crm_xml_cleanup:      Cleaning up memory from libxml2
> 
> -----Original Message-----
> From: Andrei Borzenkov [mailto:arvidjaar at gmail.com]
> Sent: 14 June 2015 06:34
> To: Brian D. Lees
> Cc: 'Takehiro Matsushima'; 'Cluster Labs - All topics related to open-source clustering welcomed'
> Subject: Re: [ClusterLabs] MS Promotion Not Working
> 
> On Sat, 13 Jun 2015 13:43:46 +0200
> "Brian D. Lees" <brian at fides.me.uk> wrote:
> 
> > 
> > primitive res_drbd_1 ocf:linbit:drbd \
> > 
> >         params drbd_resource=acl_shared \
> >         operations $id=res_drbd_1-operations \
> >         op start interval=0 timeout=240 \
> >         op promote interval=0 timeout=90 \
> >         op demote interval=0 timeout=90 \
> >         op stop interval=0 timeout=100 \
> >         op monitor interval=10 timeout=20 role=Master \
> >         op monitor interval=11 timeout=20 role=Slave \
> >         op notify interval=0 timeout=90
> > 
> > ms ms_drbd_1 res_drbd_1 \
> >         meta clone-max=2 notify=true interleave=true target-role=Started
> 
> According to the Pacemaker documentation:
> 
> Started - Allow the resource to be started (in the case of multi-state
> resources, they will not be promoted to master)
> 
> You probably want to have Master here.
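> 
> I.e. something like this (a sketch; keep your other settings as they are):
> 
> ms ms_drbd_1 res_drbd_1 \
>         meta clone-max=2 notify=true interleave=true target-role=Master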
> 





