[ClusterLabs] MS Promotion Not Working
Brian D. Lees
brian at fides.me.uk
Sat Jun 13 17:10:03 UTC 2015
Also, do you have any idea about the following lines extracted from the
crm_simulate output below:
debug: find_anonymous_clone: Internally renamed res_drbd_1 on
ACL001 to res_drbd_1:0
debug: determine_op_status: res_drbd_1_monitor_11000 on ACL001 returned
'master' (8) instead of the expected value: 'ok' (0)
warning: unpack_rsc_op_failure: Processing failed op monitor for
res_drbd_1:0 on ACL001: master (8)
debug: find_anonymous_clone: Internally renamed res_drbd_1 on
ACL002 to res_drbd_1:1
debug: determine_op_status: res_drbd_1_monitor_11000 on ACL002 returned
'master' (8) instead of the expected value: 'ok' (0)
warning: unpack_rsc_op_failure: Processing failed op monitor for
res_drbd_1:1 on ACL002: master (8)
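For context, rc 8 here is OCF_RUNNING_MASTER. The operation name
res_drbd_1_monitor_11000 identifies the interval=11s monitor, which this
configuration assigns to the Slave role, so a "master" result from it is
recorded as a failure of the slave-role monitor. One way to review the
recorded operation results, assuming the stock Pacemaker CLI, is:

ACL002:~ # crm_mon -1 --operations

(-1 prints the cluster status once and exits; --operations appends each
resource's operation history, including the rc codes quoted above.)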
Full output:
ACL002:~ # crm_simulate --live-check -VVVVV --save-graph
/root/Documents/tmp.graph --save-dotfile /root/Documents/tmp.dot
debug: qb_rb_open_2: shm size:524301; real_size:528384;
rb->word_size:132096
debug: qb_rb_open_2: shm size:524301; real_size:528384;
rb->word_size:132096
debug: qb_rb_open_2: shm size:524301; real_size:528384;
rb->word_size:132096
debug: cib_native_signon_raw: Connection to CIB successful
debug: cib_native_signoff: Signing out of the CIB Service
debug: qb_ipcc_disconnect: qb_ipcc_disconnect()
debug: qb_rb_close: Closing ringbuffer:
/dev/shm/qb-cib_rw-request-1960-30045-13-header
debug: qb_rb_close: Closing ringbuffer:
/dev/shm/qb-cib_rw-response-1960-30045-13-header
debug: qb_rb_close: Closing ringbuffer:
/dev/shm/qb-cib_rw-event-1960-30045-13-header
info: validate_with_relaxng: Creating RNG parser context
debug: cib_file_signon: crm_simulate: Opened connection to local
file '/var/lib/pacemaker/cib/shadow.30045'
info: cib_file_perform_op_delegate: cib_query on (null)
debug: cib_acl_enabled: CIB ACL is disabled
debug: unpack_config: STONITH timeout: 60000
debug: unpack_config: STONITH of failed nodes is disabled
debug: unpack_config: Stop all active resources: false
debug: unpack_config: Cluster is symmetric - resources can run
anywhere by default
debug: unpack_config: Default stickiness: 0
notice: unpack_config: On loss of CCM Quorum: Ignore
debug: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' =
0, 'green' = 0
info: determine_online_status: Node ACL001 is online
info: determine_online_status: Node ACL002 is online
debug: find_anonymous_clone: Internally renamed res_drbd_1 on
ACL001 to res_drbd_1:0
debug: determine_op_status: res_drbd_1_monitor_11000 on ACL001 returned
'master' (8) instead of the expected value: 'ok' (0)
warning: unpack_rsc_op_failure: Processing failed op monitor for
res_drbd_1:0 on ACL001: master (8)
debug: find_anonymous_clone: Internally renamed res_drbd_1 on
ACL002 to res_drbd_1:1
debug: determine_op_status: res_drbd_1_monitor_11000 on ACL002 returned
'master' (8) instead of the expected value: 'ok' (0)
warning: unpack_rsc_op_failure: Processing failed op monitor for
res_drbd_1:1 on ACL002: master (8)
Current cluster status:
Online: [ ACL001 ACL002 ]
Master/Slave Set: ms_drbd_1 [res_drbd_1]
debug: native_active: Resource res_drbd_1:0 active on ACL001
debug: native_active: Resource res_drbd_1:0 active on ACL001
debug: native_active: Resource res_drbd_1:1 active on ACL002
debug: native_active: Resource res_drbd_1:1 active on ACL002
Slaves: [ ACL001 ACL002 ]
info: clone_print: Master/Slave Set: ms_drbd_1 [res_drbd_1]
debug: native_active: Resource res_drbd_1:0 active on ACL001
debug: native_active: Resource res_drbd_1:0 active on ACL001
debug: native_active: Resource res_drbd_1:1 active on ACL002
debug: native_active: Resource res_drbd_1:1 active on ACL002
info: short_print: Slaves: [ ACL001 ACL002 ]
debug: common_apply_stickiness: Resource res_drbd_1:0: preferring
current location (node=ACL001, weight=100)
info: get_failcount_full: res_drbd_1:0 has failed 1 times on ACL001
info: common_apply_stickiness: ms_drbd_1 can fail 999999 more times
on ACL001 before being forced off
info: get_failcount_full: res_drbd_1:1 has failed 1 times on ACL001
info: common_apply_stickiness: ms_drbd_1 can fail 999999 more times
on ACL001 before being forced off
info: get_failcount_full: res_drbd_1:0 has failed 3 times on ACL002
info: common_apply_stickiness: ms_drbd_1 can fail 999997 more times
on ACL002 before being forced off
debug: common_apply_stickiness: Resource res_drbd_1:1: preferring
current location (node=ACL002, weight=100)
info: get_failcount_full: res_drbd_1:1 has failed 3 times on ACL002
info: common_apply_stickiness: ms_drbd_1 can fail 999997 more times
on ACL002 before being forced off
debug: native_assign_node: Assigning ACL001 to res_drbd_1:0
debug: native_assign_node: Assigning ACL002 to res_drbd_1:1
debug: clone_color: Allocated 2 ms_drbd_1 instances of a possible 2
debug: master_color: res_drbd_1:0 master score: -1
debug: master_color: res_drbd_1:1 master score: -1
info: master_color: ms_drbd_1: Promoted 0 instances of a
possible 1 to master
debug: master_create_actions: Creating actions for ms_drbd_1
info: LogActions: Leave res_drbd_1:0 (Slave ACL001)
info: LogActions: Leave res_drbd_1:1 (Slave ACL002)
Transition Summary:
info: LogActions: Leave res_drbd_1:0 (Slave ACL001)
info: LogActions: Leave res_drbd_1:1 (Slave ACL002)
debug: cib_file_signoff: Signing out of the CIB Service
info: cib_file_signoff: Wrote CIB to
/var/lib/pacemaker/cib/shadow.30045
info: crm_xml_cleanup: Cleaning up memory from libxml2
-----Original Message-----
From: Brian D. Lees [mailto:brian at fides.me.uk]
Sent: 13 June 2015 18:06
To: 'Cluster Labs - All topics related to open-source clustering welcomed';
'Takehiro Matsushima'
Subject: Re: [ClusterLabs] MS Promotion Not Working
Some further information from /var/log/messages. I tried using LCMC to
force DRBD to MASTER on ACL002 and saw the following:
2015-06-13T16:48:28.832420+01:00 ACL002 kernel: [15272.108781] block drbd0:
role( Secondary -> Primary )
2015-06-13T16:48:38.927078+01:00 ACL002 crmd[1965]: notice:
process_lrm_event: Operation res_drbd_1_monitor_11000: master (node=ACL002,
call=50, rc=8, cib-update=31, confirmed=false)
2015-06-13T16:48:38.927372+01:00 ACL002 crmd[1965]: notice:
process_lrm_event: ACL002-res_drbd_1_monitor_11000:50 [ \n ]
2015-06-13T16:48:38.936605+01:00 ACL002 crm_simulate[16613]: notice:
crm_log_args: Invoked: crm_simulate -s -S -VVVVV -L
2015-06-13T16:48:39.020485+01:00 ACL002 crm_simulate[16613]: notice:
unpack_config: On loss of CCM Quorum: Ignore
2015-06-13T16:48:39.021011+01:00 ACL002 crm_simulate[16613]: warning:
unpack_rsc_op_failure: Processing failed op monitor for res_drbd_1:0 on
ACL001: master (8)
2015-06-13T16:48:39.021251+01:00 ACL002 crm_simulate[16613]: warning:
unpack_rsc_op_failure: Processing failed op monitor for res_drbd_1:1 on
ACL002: master (8)
2015-06-13T16:48:39.022862+01:00 ACL002 crm_simulate[16613]: notice:
LogActions: Demote res_drbd_1:1#011(Master -> Slave ACL002)
2015-06-13T16:48:39.023099+01:00 ACL002 crm_simulate[16613]: notice:
LogActions: Recover res_drbd_1:1#011(Master ACL002)
2015-06-13T16:48:39.040858+01:00 ACL002 crm_simulate[16613]: notice:
run_graph: Transition 0 (Complete=33, Pending=0, Fired=0, Skipped=0,
Incomplete=0, Source=crm_simulate): Complete
What seems to have happened is that DRBD was forcibly promoted and then
immediately demoted again. I assume the relevant line is:
warning: unpack_rsc_op_failure: Processing failed op monitor for
res_drbd_1:0 on ACL001: master (8)
Any idea why the op monitor should be failing?
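Since each rc=8 result stays in the CIB as a failed operation (note the
fail counts in the crm_simulate output above), one diagnostic step is to
clear the recorded failures and let the cluster re-probe before retrying
the promotion. A sketch using the standard Pacemaker tools:

ACL002:~ # crm_mon -1 --failcounts
ACL002:~ # crm_resource --cleanup --resource ms_drbd_1

The cleanup removes the failed operation history and fail counts for the
master/slave set on all nodes, so any subsequent "master (8)" warning
would indicate a fresh occurrence rather than stale history.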
Regards,
Brian
-----Original Message-----
From: Brian D. Lees [mailto:brian at fides.me.uk]
Sent: 13 June 2015 17:43
To: 'Takehiro Matsushima'
Cc: 'Cluster Labs - All topics related to open-source clustering welcomed'
Subject: Re: [ClusterLabs] MS Promotion Not Working
No joy unfortunately:
node 1084751972: ACL001 \
attributes standby=off
node 1084752072: ACL002 \
attributes standby=off
primitive res_drbd_1 ocf:linbit:drbd \
params drbd_resource=acl_shared \
operations $id=res_drbd_1-operations \
op start interval=0 timeout=240 \
op promote interval=0 timeout=90 \
op demote interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=10 timeout=20 role=Master \
op monitor interval=11 timeout=20 role=Slave \
op notify interval=0 timeout=90
ms ms_drbd_1 res_drbd_1 \
meta clone-max=2 notify=true interleave=true target-role=Master
property cib-bootstrap-options: \
stonith-enabled=false \
no-quorum-policy=ignore \
dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
cluster-infrastructure=corosync \
cluster-name=aclcluster
rsc_defaults rsc-options: \
target-role=started \
resource-stickiness=100
Current cluster status:
Online: [ ACL001 ACL002 ]
Master/Slave Set: ms_drbd_1 [res_drbd_1]
Slaves: [ ACL001 ACL002 ]
Allocation scores:
clone_color: ms_drbd_1 allocation score on ACL001: 0
clone_color: ms_drbd_1 allocation score on ACL002: 0
clone_color: res_drbd_1:0 allocation score on ACL001: 100
clone_color: res_drbd_1:0 allocation score on ACL002: 0
clone_color: res_drbd_1:1 allocation score on ACL001: 0
clone_color: res_drbd_1:1 allocation score on ACL002: 100
native_color: res_drbd_1:0 allocation score on ACL001: 100
native_color: res_drbd_1:0 allocation score on ACL002: 0
native_color: res_drbd_1:1 allocation score on ACL001: -INFINITY
native_color: res_drbd_1:1 allocation score on ACL002: 100
res_drbd_1:0 promotion score on ACL001: -1
res_drbd_1:1 promotion score on ACL002: -1
Transition Summary:
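The two "promotion score ... -1" lines suggest that neither node is
advertising a master preference for its DRBD instance. Broadly speaking,
ocf:linbit:drbd sets that preference (via crm_master, i.e. a transient
node attribute) only when it considers the local data usable, e.g.
UpToDate, so it is worth checking DRBD's view directly. A sketch,
assuming the attribute is named master-res_drbd_1 (some versions append
a clone instance suffix such as :0):

ACL001:~ # drbdadm role acl_shared
ACL001:~ # drbdadm dstate acl_shared
ACL001:~ # crm_attribute --node ACL001 --name master-res_drbd_1 \
    --lifetime reboot --query

If dstate reports anything other than UpToDate/UpToDate (for example
Diskless or Inconsistent), the agent will not ask for promotion and the
-1 scores are expected.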
-----Original Message-----
From: Takehiro Matsushima [mailto:takehiro.dreamizm at gmail.com]
Sent: 13 June 2015 14:40
To: brian at fides.me.uk
Cc: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] MS Promotion Not Working
Hello Brian,
Did you try without the filesystem resource? If not, please try
configuring only the DRBD-related resources (the primitive and the ms),
and change target-role="Started" to "Master" on the ms resource
"ms_drbd_1".
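For example, with crmsh that could be done as follows (a sketch,
assuming a crmsh version that provides the "resource meta" subcommand;
"crm configure edit" works just as well):

# set the ms resource's target-role so the PE will attempt a promotion
crm resource meta ms_drbd_1 set target-role Master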
regards,
Takehiro Matsushima
2015-06-13 20:43 GMT+09:00 Brian D. Lees <brian at fides.me.uk>:
> Takehiro,
>
> Thanks very much for your suggestion; unfortunately there is no change
> in the outcome.
>
> Configuration now is:
>
> node 1084751972: ACL001 \
> attributes standby=off
> node 1084752072: ACL002 \
> attributes standby=off
> primitive res_Filesystem_shared_fs Filesystem \
> params device="/dev/drbd/by-res/acl_shared/1" directory="/mnt/aclcluster" fstype=ext4 \
> operations $id=res_Filesystem_shared_fs-operations \
> op start interval=0 timeout=60 \
> op stop interval=0 timeout=60 \
> op monitor interval=20 timeout=40 start-delay=0 \
> op notify interval=0 timeout=60 \
> meta allow-migrate=true failure-timeout=60
> primitive res_drbd_1 ocf:linbit:drbd \
> params drbd_resource=acl_shared \
> operations $id=res_drbd_1-operations \
> op start interval=0 timeout=240 \
> op promote interval=0 timeout=90 \
> op demote interval=0 timeout=90 \
> op stop interval=0 timeout=100 \
> op monitor interval=10 timeout=20 role=Master \
> op monitor interval=11 timeout=20 role=Slave \
> op notify interval=0 timeout=90
> ms ms_drbd_1 res_drbd_1 \
> meta clone-max=2 notify=true interleave=true target-role=Started
> colocation col_res_Filesystem_shared_fs_ms_drbd_1 inf: res_Filesystem_shared_fs ms_drbd_1:Master
> order ord_ms_drbd_1_res_Filesystem_shared_fs inf: ms_drbd_1:promote res_Filesystem_shared_fs:start
> property cib-bootstrap-options: \
> stonith-enabled=false \
> no-quorum-policy=ignore \
> dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
> cluster-infrastructure=corosync \
> cluster-name=aclcluster
> rsc_defaults rsc-options: \
> target-role=started \
> resource-stickiness=100
>
> And the scores are:
>
> Current cluster status:
> Online: [ ACL001 ACL002 ]
>
> Master/Slave Set: ms_drbd_1 [res_drbd_1]
> Slaves: [ ACL001 ACL002 ]
> res_Filesystem_shared_fs (ocf::heartbeat:Filesystem): Stopped
>
> Allocation scores:
> clone_color: ms_drbd_1 allocation score on ACL001: 0
> clone_color: ms_drbd_1 allocation score on ACL002: 0
> clone_color: res_drbd_1:0 allocation score on ACL001: 100
> clone_color: res_drbd_1:0 allocation score on ACL002: 0
> clone_color: res_drbd_1:1 allocation score on ACL001: 0
> clone_color: res_drbd_1:1 allocation score on ACL002: 100
> native_color: res_drbd_1:0 allocation score on ACL001: 100
> native_color: res_drbd_1:0 allocation score on ACL002: 0
> native_color: res_drbd_1:1 allocation score on ACL001: -INFINITY
> native_color: res_drbd_1:1 allocation score on ACL002: 100
> res_drbd_1:0 promotion score on ACL001: -1
> res_drbd_1:1 promotion score on ACL002: -1
> native_color: res_Filesystem_shared_fs allocation score on ACL001: -INFINITY
> native_color: res_Filesystem_shared_fs allocation score on ACL002: -INFINITY
>
> Transition Summary:
>
> Any further ideas?
>
> Regards,
>
> Brian
>
> From: Takehiro Matsushima [mailto:takehiro.dreamizm at gmail.com]
> Sent: 13 June 2015 02:22
> To: Cluster Labs - All topics related to open-source clustering
> welcomed; brian at fides.me.uk
> Subject: Re: [ClusterLabs] MS Promotion Not Working
>
> Hello Brian,
>
> Try defining two "op monitor" operations, one with role="Master" and
> one with role="Slave", for the DRBD resource, like this:
>
> primitive res_drbd_1 ocf:linbit:drbd \
> params drbd_resource=acl_shared \
> operations $id=res_drbd_1-operations \
> op start interval=0 timeout=240 \
> op promote interval=0 timeout=90 \
> op demote interval=0 timeout=90 \
> op stop interval=0 timeout=100 \
> op monitor interval=10 timeout=20 role="Master" \
> op monitor interval=11 timeout=20 role="Slave" \
> op notify interval=0 timeout=90
>
> regards,
> Takehiro Matsushima
_______________________________________________
Users mailing list: Users at clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org