[ClusterLabs] MS Promotion Not Working
Andrei Borzenkov
arvidjaar at gmail.com
Sun Jun 14 12:46:54 UTC 2015
On Sun, 14 Jun 2015 14:26:16 +0200
"Brian D. Lees" <brian at fides.me.uk> wrote:
> Success! I have managed to get this to work and I know 'how' but not 'why'. The promotion scores seem to be -1 by default. I set a location preference of 1 for master on each node and now it works. When one node is rebooted it moves to the other successfully. So I understand how it works (the location preference changes the promotion scores to 0) but I don’t understand why the default should be -1. If you configure a master/slave resource you surely want it to be master somewhere?
Pacemaker cannot decide by itself where a resource can become master.
Only the resource agent has enough knowledge to check the resource state
and decide which of multiple replicas is suitable to become master.
An attempt to promote an instance when that cannot be done will simply
result in failures, and ultimately the node will be blacklisted from
running the resource.
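
To illustrate, here is a minimal sketch of how a resource agent might advertise its promotion preference. It is not taken from ocf:linbit:drbd; the function name, the state labels and the score of 100 are made up for the example, and the CRM_MASTER variable exists only so the sketch can be dry-run without a cluster. The crm_master calls themselves are meant to be issued from inside a resource agent action, where Pacemaker supplies the resource instance through the environment.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not from a real agent).
# CRM_MASTER can be overridden for a dry run, e.g. CRM_MASTER="echo crm_master".
CRM_MASTER="${CRM_MASTER:-crm_master}"

# Map a replica state (hypothetical labels) to a promotion preference.
advertise_master_score() {
    state="$1"
    case "$state" in
        uptodate)
            # Local data is current: mark this instance as a promotion
            # candidate. The value 100 is an arbitrary positive score.
            $CRM_MASTER -Q -l reboot -v 100
            ;;
        *)
            # Any other state: withdraw the preference so that this
            # instance is not selected for promotion.
            $CRM_MASTER -Q -l reboot -D
            ;;
    esac
}
```

An agent would typically call something like this from its monitor and notify actions. The resulting promotion scores can then be checked with "crm_simulate -sL", which prints the kind of score listing quoted below.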
> Anyway here is the configuration etc and Takehiro asked my distro which is Suse 13.2. Thanks for all your help.
>
> node 1084751972: ACL001 \
> attributes standby=off
> node 1084752072: ACL002 \
> attributes standby=off
> primitive res_drbd_1 ocf:linbit:drbd \
> params drbd_resource=acl_shared \
> operations $id=res_drbd_1-operations \
> op start interval=0 timeout=240 \
> op promote interval=0 timeout=90 \
> op demote interval=0 timeout=90 \
> op stop interval=0 timeout=100 \
> op monitor interval=10 timeout=20 role=Master start-delay=0 \
> op monitor interval=11 timeout=20 role=Slave start-delay=0 \
> op notify interval=0 timeout=90 \
> meta
> ms ms_drbd_1 res_drbd_1 \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=master
> location drbd_primary_1 ms_drbd_1 role=Master 1: ACL001
> location drbd_primary_2 ms_drbd_1 role=Master 1: ACL002
> property cib-bootstrap-options: \
> symmetric-cluster=true \
> stonith-enabled=false \
> no-quorum-policy=ignore \
> dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
> cluster-infrastructure=corosync \
> cluster-name=aclcluster
> rsc_defaults rsc-options: \
> resource-stickiness=0
>
> Current cluster status:
> Online: [ ACL001 ACL002 ]
>
> Master/Slave Set: ms_drbd_1 [res_drbd_1]
> Masters: [ ACL002 ]
> Slaves: [ ACL001 ]
>
> Allocation scores:
> clone_color: ms_drbd_1 allocation score on ACL001: 0
> clone_color: ms_drbd_1 allocation score on ACL002: 0
> clone_color: res_drbd_1:0 allocation score on ACL001: 0
> clone_color: res_drbd_1:0 allocation score on ACL002: 0
> clone_color: res_drbd_1:1 allocation score on ACL001: 0
> clone_color: res_drbd_1:1 allocation score on ACL002: 0
> native_color: res_drbd_1:0 allocation score on ACL001: 0
> native_color: res_drbd_1:0 allocation score on ACL002: 0
> native_color: res_drbd_1:1 allocation score on ACL001: -INFINITY
> native_color: res_drbd_1:1 allocation score on ACL002: 0
> res_drbd_1:1 promotion score on ACL002: 0
> res_drbd_1:0 promotion score on ACL001: 0
>
> Transition Summary:
>
> -----Original Message-----
> From: Andrei Borzenkov [mailto:arvidjaar at gmail.com]
> Sent: 14 June 2015 13:27
> To: Brian D. Lees
> Cc: 'Takehiro Matsushima'; 'Cluster Labs - All topics related to open-source clustering welcomed'
> Subject: Re: [ClusterLabs] MS Promotion Not Working
>
> On Sun, 14 Jun 2015 13:16:18 +0200
> "Brian D. Lees" <brian at fides.me.uk> wrote:
>
> > Andrei,
> >
> > Thanks for the suggestion; however it sadly has the same outcome! I think the key to this is understanding how the promotion scores are calculated as that will point us towards the items which are making the scores both -1. Do you have any idea how this works?
> >
>
> As far as I know there is no default master score. Resource agent is responsible for deciding which instance should (can) be promoted and setting scores accordingly.
>
> > node 1084751972: ACL001 \
> > attributes standby=off
> > node 1084752072: ACL002 \
> > attributes standby=off
> > primitive res_drbd_1 ocf:linbit:drbd \
> > params drbd_resource=acl_shared \
> > operations $id=res_drbd_1-operations \
> > op start interval=0 timeout=240 \
> > op promote interval=0 timeout=90 \
> > op demote interval=0 timeout=90 \
> > op stop interval=0 timeout=100 \
> > op monitor interval=10 timeout=20 role=Master start-delay=0 \
> > op monitor interval=11 timeout=20 role=Slave start-delay=0 \
> > op notify interval=0 timeout=90 \
> > meta
> > ms ms_drbd_1 res_drbd_1 \
> > meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=master
> > property cib-bootstrap-options: \
> > symmetric-cluster=true \
> > stonith-enabled=false \
> > no-quorum-policy=ignore \
> > dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
> > cluster-infrastructure=corosync \
> > cluster-name=aclcluster
> > rsc_defaults rsc-options: \
> > resource-stickiness=0
> >
> > Current cluster status:
> > Online: [ ACL001 ACL002 ]
> >
> > Master/Slave Set: ms_drbd_1 [res_drbd_1]
> > Slaves: [ ACL001 ACL002 ]
> >
> > Allocation scores:
> > clone_color: ms_drbd_1 allocation score on ACL001: 0
> > clone_color: ms_drbd_1 allocation score on ACL002: 0
> > clone_color: res_drbd_1:0 allocation score on ACL001: 0
> > clone_color: res_drbd_1:0 allocation score on ACL002: 0
> > clone_color: res_drbd_1:1 allocation score on ACL001: 0
> > clone_color: res_drbd_1:1 allocation score on ACL002: 0
> > native_color: res_drbd_1:0 allocation score on ACL001: 0
> > native_color: res_drbd_1:0 allocation score on ACL002: 0
> > native_color: res_drbd_1:1 allocation score on ACL001: -INFINITY
> > native_color: res_drbd_1:1 allocation score on ACL002: 0
> > res_drbd_1:0 promotion score on ACL001: -1
> > res_drbd_1:1 promotion score on ACL002: -1
> >
> > Transition Summary:
> >
> >
> > debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> > debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> > debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
> > debug: cib_native_signon_raw: Connection to CIB successful
> > debug: cib_native_signoff: Signing out of the CIB Service
> > debug: qb_ipcc_disconnect: qb_ipcc_disconnect()
> > debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cib_rw-request-1929-14545-13-header
> > debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cib_rw-response-1929-14545-13-header
> > debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cib_rw-event-1929-14545-13-header
> > info: validate_with_relaxng: Creating RNG parser context
> > debug: cib_file_signon: crm_simulate: Opened connection to local file '/var/lib/pacemaker/cib/shadow.14545'
> > info: cib_file_perform_op_delegate: cib_query on (null)
> > debug: cib_acl_enabled: CIB ACL is disabled
> > debug: unpack_config: STONITH timeout: 60000
> > debug: unpack_config: STONITH of failed nodes is disabled
> > debug: unpack_config: Stop all active resources: false
> > debug: unpack_config: Cluster is symmetric - resources can run anywhere by default
> > debug: unpack_config: Default stickiness: 0
> > notice: unpack_config: On loss of CCM Quorum: Ignore
> > debug: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > info: determine_online_status: Node ACL001 is online
> > info: determine_online_status: Node ACL002 is online
> > debug: find_anonymous_clone: Internally renamed res_drbd_1 on ACL001 to res_drbd_1:0
> > debug: find_anonymous_clone: Internally renamed res_drbd_1 on ACL002 to res_drbd_1:1
> >
> > Current cluster status:
> > Online: [ ACL001 ACL002 ]
> >
> > Master/Slave Set: ms_drbd_1 [res_drbd_1]
> > debug: native_active: Resource res_drbd_1:0 active on ACL001
> > debug: native_active: Resource res_drbd_1:0 active on ACL001
> > debug: native_active: Resource res_drbd_1:1 active on ACL002
> > debug: native_active: Resource res_drbd_1:1 active on ACL002
> > Slaves: [ ACL001 ACL002 ]
> >
> > info: clone_print: Master/Slave Set: ms_drbd_1 [res_drbd_1]
> > debug: native_active: Resource res_drbd_1:0 active on ACL001
> > debug: native_active: Resource res_drbd_1:0 active on ACL001
> > debug: native_active: Resource res_drbd_1:1 active on ACL002
> > debug: native_active: Resource res_drbd_1:1 active on ACL002
> > info: short_print: Slaves: [ ACL001 ACL002 ]
> > debug: native_assign_node: Assigning ACL001 to res_drbd_1:0
> > debug: native_assign_node: Assigning ACL002 to res_drbd_1:1
> > debug: clone_color: Allocated 2 ms_drbd_1 instances of a possible 2
> > debug: master_color: res_drbd_1:0 master score: -1
> > debug: master_color: res_drbd_1:1 master score: -1
> > info: master_color: ms_drbd_1: Promoted 0 instances of a possible 1 to master
> > debug: master_create_actions: Creating actions for ms_drbd_1
> > info: LogActions: Leave res_drbd_1:0 (Slave ACL001)
> > info: LogActions: Leave res_drbd_1:1 (Slave ACL002)
> > Transition Summary:
> > info: LogActions: Leave res_drbd_1:0 (Slave ACL001)
> > info: LogActions: Leave res_drbd_1:1 (Slave ACL002)
> > debug: cib_file_signoff: Signing out of the CIB Service
> > info: cib_file_signoff: Wrote CIB to /var/lib/pacemaker/cib/shadow.14545
> > info: crm_xml_cleanup: Cleaning up memory from libxml2
> >
> > -----Original Message-----
> > From: Andrei Borzenkov [mailto:arvidjaar at gmail.com]
> > Sent: 14 June 2015 06:34
> > To: Brian D. Lees
> > Cc: 'Takehiro Matsushima'; 'Cluster Labs - All topics related to open-source clustering welcomed'
> > Subject: Re: [ClusterLabs] MS Promotion Not Working
> >
> > On Sat, 13 Jun 2015 13:43:46 +0200
> > "Brian D. Lees" <brian at fides.me.uk> wrote:
> >
> > >
> > > primitive res_drbd_1 ocf:linbit:drbd \
> > > params drbd_resource=acl_shared \
> > > operations $id=res_drbd_1-operations \
> > > op start interval=0 timeout=240 \
> > > op promote interval=0 timeout=90 \
> > > op demote interval=0 timeout=90 \
> > > op stop interval=0 timeout=100 \
> > > op monitor interval=10 timeout=20 role=Master \
> > > op monitor interval=11 timeout=20 role=Slave \
> > > op notify interval=0 timeout=90
> > > ms ms_drbd_1 res_drbd_1 \
> > > meta clone-max=2 notify=true interleave=true target-role=Started
> >
> > According to pacemaker documentation
> >
> > Started - Allow the resource to be started (In the case of multi-state
> > resources, they will not be promoted to master)
> >
> > You probably want to have Master here.
> >
>
>