[ClusterLabs] MS Promotion Not Working

Sat Jun 13 17:10:03 UTC 2015

Also, do you have any idea what the following lines, extracted from the
crm_simulate output below, mean?

   debug: find_anonymous_clone:         Internally renamed res_drbd_1 on ACL001 to res_drbd_1:0
   debug: determine_op_status:  res_drbd_1_monitor_11000 on ACL001 returned 'master' (8) instead of the expected value: 'ok' (0)
 warning: unpack_rsc_op_failure:        Processing failed op monitor for res_drbd_1:0 on ACL001: master (8)
   debug: find_anonymous_clone:         Internally renamed res_drbd_1 on ACL002 to res_drbd_1:1
   debug: determine_op_status:  res_drbd_1_monitor_11000 on ACL002 returned 'master' (8) instead of the expected value: 'ok' (0)
 warning: unpack_rsc_op_failure:        Processing failed op monitor for res_drbd_1:1 on ACL002: master (8)
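
As a reading aid for those lines, here is a minimal Python sketch that pulls the fields out of a determine_op_status message and names the return codes. The code table is the standard set of OCF resource agent exit codes; the regular expression and the function name parse_op_status are my own illustration and not part of Pacemaker. The sample is one of the determine_op_status lines above, written on a single line.

```python
import re

# Standard OCF resource agent exit codes.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Hypothetical pattern for the debug line format seen in this thread.
OP_STATUS = re.compile(
    r"determine_op_status:\s+(?P<rsc>\S+)_(?P<op>[a-z]+)_(?P<ms>\d+) "
    r"on (?P<node>\S+) returned '\w+' \((?P<rc>\d+)\) "
    r"instead of the expected value: '\w+' \((?P<exp>\d+)\)"
)

def parse_op_status(line):
    """Return the fields of a determine_op_status line, or None if no match."""
    m = OP_STATUS.search(line)
    if m is None:
        return None
    rc, exp = int(m.group("rc")), int(m.group("exp"))
    return {
        "resource": m.group("rsc"),
        "operation": m.group("op"),
        "interval_ms": int(m.group("ms")),
        "node": m.group("node"),
        "rc": rc,
        "rc_name": OCF_RC.get(rc, "unknown"),
        "expected": exp,
        "expected_name": OCF_RC.get(exp, "unknown"),
    }

LINE = ("   debug: determine_op_status:  res_drbd_1_monitor_11000 on ACL001 "
        "returned 'master' (8) instead of the expected value: 'ok' (0)")

print(parse_op_status(LINE))
```

Read this way, the message says that the monitor with an 11000 ms interval found DRBD in the master role (rc 8, OCF_RUNNING_MASTER) where the cluster expected a plain success (rc 0, OCF_SUCCESS).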

Full output:

ACL002:~ # crm_simulate --live-check -VVVVV --save-graph /root/Documents/tmp.graph --save-dotfile /root/Documents/tmp.dot
   debug: qb_rb_open_2:         shm size:524301; real_size:528384;
rb->word_size:132096
   debug: qb_rb_open_2:         shm size:524301; real_size:528384;
rb->word_size:132096
   debug: qb_rb_open_2:         shm size:524301; real_size:528384;
rb->word_size:132096
   debug: cib_native_signon_raw:        Connection to CIB successful
   debug: cib_native_signoff:   Signing out of the CIB Service
   debug: qb_ipcc_disconnect:   qb_ipcc_disconnect()
   debug: qb_rb_close:  Closing ringbuffer:
/dev/shm/qb-cib_rw-request-1960-30045-13-header
   debug: qb_rb_close:  Closing ringbuffer:
/dev/shm/qb-cib_rw-response-1960-30045-13-header
   debug: qb_rb_close:  Closing ringbuffer:
/dev/shm/qb-cib_rw-event-1960-30045-13-header
    info: validate_with_relaxng:        Creating RNG parser context
   debug: cib_file_signon:      crm_simulate: Opened connection to local
file '/var/lib/pacemaker/cib/shadow.30045'
    info: cib_file_perform_op_delegate:         cib_query on (null)
   debug: cib_acl_enabled:      CIB ACL is disabled
   debug: unpack_config:        STONITH timeout: 60000
   debug: unpack_config:        STONITH of failed nodes is disabled
   debug: unpack_config:        Stop all active resources: false
   debug: unpack_config:        Cluster is symmetric - resources can run
anywhere by default
   debug: unpack_config:        Default stickiness: 0
  notice: unpack_config:        On loss of CCM Quorum: Ignore
   debug: unpack_config:        Node scores: 'red' = -INFINITY, 'yellow' =
0, 'green' = 0
    info: determine_online_status:      Node ACL001 is online
    info: determine_online_status:      Node ACL002 is online
   debug: find_anonymous_clone:         Internally renamed res_drbd_1 on
ACL001 to res_drbd_1:0
   debug: determine_op_status:  res_drbd_1_monitor_11000 on ACL001 returned
'master' (8) instead of the expected value: 'ok' (0)
 warning: unpack_rsc_op_failure:        Processing failed op monitor for
res_drbd_1:0 on ACL001: master (8)
   debug: find_anonymous_clone:         Internally renamed res_drbd_1 on
ACL002 to res_drbd_1:1
   debug: determine_op_status:  res_drbd_1_monitor_11000 on ACL002 returned
'master' (8) instead of the expected value: 'ok' (0)
 warning: unpack_rsc_op_failure:        Processing failed op monitor for
res_drbd_1:1 on ACL002: master (8)

Current cluster status:
Online: [ ACL001 ACL002 ]

 Master/Slave Set: ms_drbd_1 [res_drbd_1]
   debug: native_active:        Resource res_drbd_1:0 active on ACL001
   debug: native_active:        Resource res_drbd_1:0 active on ACL001
   debug: native_active:        Resource res_drbd_1:1 active on ACL002
   debug: native_active:        Resource res_drbd_1:1 active on ACL002
     Slaves: [ ACL001 ACL002 ]

    info: clone_print:   Master/Slave Set: ms_drbd_1 [res_drbd_1]
   debug: native_active:        Resource res_drbd_1:0 active on ACL001
   debug: native_active:        Resource res_drbd_1:0 active on ACL001
   debug: native_active:        Resource res_drbd_1:1 active on ACL002
   debug: native_active:        Resource res_drbd_1:1 active on ACL002
    info: short_print:       Slaves: [ ACL001 ACL002 ]
   debug: common_apply_stickiness:      Resource res_drbd_1:0: preferring
current location (node=ACL001, weight=100)
    info: get_failcount_full:   res_drbd_1:0 has failed 1 times on ACL001
    info: common_apply_stickiness:      ms_drbd_1 can fail 999999 more times
on ACL001 before being forced off
    info: get_failcount_full:   res_drbd_1:1 has failed 1 times on ACL001
    info: common_apply_stickiness:      ms_drbd_1 can fail 999999 more times
on ACL001 before being forced off
    info: get_failcount_full:   res_drbd_1:0 has failed 3 times on ACL002
    info: common_apply_stickiness:      ms_drbd_1 can fail 999997 more times
on ACL002 before being forced off
   debug: common_apply_stickiness:      Resource res_drbd_1:1: preferring
current location (node=ACL002, weight=100)
    info: get_failcount_full:   res_drbd_1:1 has failed 3 times on ACL002
    info: common_apply_stickiness:      ms_drbd_1 can fail 999997 more times
on ACL002 before being forced off
   debug: native_assign_node:   Assigning ACL001 to res_drbd_1:0
   debug: native_assign_node:   Assigning ACL002 to res_drbd_1:1
   debug: clone_color:  Allocated 2 ms_drbd_1 instances of a possible 2
   debug: master_color:         res_drbd_1:0 master score: -1
   debug: master_color:         res_drbd_1:1 master score: -1
    info: master_color:         ms_drbd_1: Promoted 0 instances of a
possible 1 to master
   debug: master_create_actions:        Creating actions for ms_drbd_1
    info: LogActions:   Leave   res_drbd_1:0    (Slave ACL001)
    info: LogActions:   Leave   res_drbd_1:1    (Slave ACL002)
Transition Summary:
    info: LogActions:   Leave   res_drbd_1:0    (Slave ACL001)
    info: LogActions:   Leave   res_drbd_1:1    (Slave ACL002)
   debug: cib_file_signoff:     Signing out of the CIB Service
    info: cib_file_signoff:     Wrote CIB to
/var/lib/pacemaker/cib/shadow.30045
    info: crm_xml_cleanup:      Cleaning up memory from libxml2
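
To summarise the failcount lines in the output above, a small sketch along these lines could be used. It assumes the log format shown here; the function name failcounts is made up for the example.

```python
import re

# Hypothetical pattern for the get_failcount_full lines in this output.
FAILCOUNT = re.compile(
    r"get_failcount_full:\s+(?P<inst>\S+) has failed "
    r"(?P<count>\d+) times on (?P<node>\S+)"
)

def failcounts(log_text):
    """Return {node: highest failcount reported for any clone instance}."""
    per_node = {}
    for m in FAILCOUNT.finditer(log_text):
        node, count = m.group("node"), int(m.group("count"))
        per_node[node] = max(count, per_node.get(node, 0))
    return per_node

SAMPLE = """\
    info: get_failcount_full:   res_drbd_1:0 has failed 1 times on ACL001
    info: get_failcount_full:   res_drbd_1:1 has failed 1 times on ACL001
    info: get_failcount_full:   res_drbd_1:0 has failed 3 times on ACL002
    info: get_failcount_full:   res_drbd_1:1 has failed 3 times on ACL002
"""

print(failcounts(SAMPLE))  # {'ACL001': 1, 'ACL002': 3}
```

Both nodes carry recorded monitor failures, which matches the unpack_rsc_op_failure warnings at the top of the output.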

-----Original Message-----
From: Brian D. Lees [mailto:brian at fides.me.uk] 
Sent: 13 June 2015 18:06
To: 'Cluster Labs - All topics related to open-source clustering welcomed';
'Takehiro Matsushima'
Subject: Re: [ClusterLabs] MS Promotion Not Working

Some further information from /var/log/messages.  I tried using LCMC to
force DRBD to MASTER on ACL002 and saw the following:

2015-06-13T16:48:28.832420+01:00 ACL002 kernel: [15272.108781] block drbd0:
role( Secondary -> Primary ) 
2015-06-13T16:48:38.927078+01:00 ACL002 crmd[1965]:   notice:
process_lrm_event: Operation res_drbd_1_monitor_11000: master (node=ACL002,
call=50, rc=8, cib-update=31, confirmed=false)
2015-06-13T16:48:38.927372+01:00 ACL002 crmd[1965]:   notice:
process_lrm_event: ACL002-res_drbd_1_monitor_11000:50 [ \n ]
2015-06-13T16:48:38.936605+01:00 ACL002 crm_simulate[16613]:   notice:
crm_log_args: Invoked: crm_simulate -s -S -VVVVV -L
2015-06-13T16:48:39.020485+01:00 ACL002 crm_simulate[16613]:   notice:
unpack_config: On loss of CCM Quorum: Ignore
2015-06-13T16:48:39.021011+01:00 ACL002 crm_simulate[16613]:  warning:
unpack_rsc_op_failure: Processing failed op monitor for res_drbd_1:0 on
ACL001: master (8)
2015-06-13T16:48:39.021251+01:00 ACL002 crm_simulate[16613]:  warning:
unpack_rsc_op_failure: Processing failed op monitor for res_drbd_1:1 on
ACL002: master (8)
2015-06-13T16:48:39.022862+01:00 ACL002 crm_simulate[16613]:   notice:
LogActions: Demote  res_drbd_1:1    (Master -> Slave ACL002)
2015-06-13T16:48:39.023099+01:00 ACL002 crm_simulate[16613]:   notice:
LogActions: Recover res_drbd_1:1    (Master ACL002)
2015-06-13T16:48:39.040858+01:00 ACL002 crm_simulate[16613]:   notice:
run_graph: Transition 0 (Complete=33, Pending=0, Fired=0, Skipped=0,
Incomplete=0, Source=crm_simulate): Complete

What seemed to happen is that it was forcibly promoted and then demoted
again.  I assume the relevant line is:

warning: unpack_rsc_op_failure: Processing failed op monitor for
res_drbd_1:0 on ACL001: master (8)

Any idea why the op monitor should be failing?
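
A possible reading, sketched under assumptions: the trailing number in the operation key res_drbd_1_monitor_11000 looks like the interval in milliseconds, so it would be the interval=11 monitor, which the configuration quoted further down in this thread declares with role=Slave. Assuming Pacemaker's usual convention that a role=Slave monitor expects rc 0 and a role=Master monitor expects rc 8, a result of 8 from the Slave monitor would be recorded as a failure. The tables and the function name classify below are hypothetical helpers for illustration only.

```python
# Monitor operations configured for res_drbd_1 in this thread:
# interval in seconds -> role the monitor applies to.
MONITORS = {10: "Master", 11: "Slave"}

# Assumption: the return code Pacemaker treats as healthy for each role.
EXPECTED_RC = {"Master": 8, "Slave": 0}

def classify(op_key, rc):
    """Map an operation key such as 'res_drbd_1_monitor_11000' to the
    configured role and say whether rc is what that role expects."""
    _resource, _operation, interval_ms = op_key.rsplit("_", 2)
    role = MONITORS.get(int(interval_ms) // 1000)
    if role is None:
        return None, False
    return role, rc == EXPECTED_RC[role]

print(classify("res_drbd_1_monitor_11000", 8))  # ('Slave', False)
print(classify("res_drbd_1_monitor_10000", 8))  # ('Master', True)
```

That would fit the log above: DRBD was made Primary by hand while the cluster still regarded the instance as a Slave, so the Slave monitor reported master (8) and the cluster scheduled a demote and recover. It does not by itself explain why the promotion score stays at -1.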

Regards,

Brian

-----Original Message-----
From: Brian D. Lees [mailto:brian at fides.me.uk]
Sent: 13 June 2015 17:43
To: 'Takehiro Matsushima'
Cc: 'Cluster Labs - All topics related to open-source clustering welcomed'
Subject: Re: [ClusterLabs] MS Promotion Not Working

No joy unfortunately:

node 1084751972: ACL001 \
        attributes standby=off
node 1084752072: ACL002 \
        attributes standby=off
primitive res_drbd_1 ocf:linbit:drbd \
        params drbd_resource=acl_shared \
        operations $id=res_drbd_1-operations \
        op start interval=0 timeout=240 \
        op promote interval=0 timeout=90 \
        op demote interval=0 timeout=90 \
        op stop interval=0 timeout=100 \
        op monitor interval=10 timeout=20 role=Master \
        op monitor interval=11 timeout=20 role=Slave \
        op notify interval=0 timeout=90
ms ms_drbd_1 res_drbd_1 \
        meta clone-max=2 notify=true interleave=true target-role=Master
property cib-bootstrap-options: \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
        cluster-infrastructure=corosync \
        cluster-name=aclcluster
rsc_defaults rsc-options: \
        target-role=started \
        resource-stickiness=100
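
As a quick sanity check of the monitor operations in this configuration, here is a rough sketch that extracts the op monitor lines and verifies that each role has its own monitor and that the intervals differ (Pacemaker needs distinct intervals to tell the recurring monitors of one resource apart). The parsing is deliberately naive and the helper names monitor_ops and roles_covered are hypothetical; this is not a crmsh parser.

```python
import re

CONFIG = """\
primitive res_drbd_1 ocf:linbit:drbd \\
        params drbd_resource=acl_shared \\
        op monitor interval=10 timeout=20 role=Master \\
        op monitor interval=11 timeout=20 role=Slave \\
        op notify interval=0 timeout=90
"""

def monitor_ops(config):
    """Return [(interval_seconds, role_or_None)] for each 'op monitor' line."""
    ops = []
    for raw in config.splitlines():
        line = raw.strip().rstrip("\\").strip()
        if not line.startswith("op monitor"):
            continue
        # key=value pairs, with or without double quotes around the value
        attrs = dict(re.findall(r'(\S+?)="?([^"\s]+)"?', line))
        ops.append((int(attrs["interval"]), attrs.get("role")))
    return ops

def roles_covered(ops):
    """True if intervals are unique and both Master and Slave are monitored."""
    intervals = [interval for interval, _ in ops]
    roles = {role for _, role in ops}
    return len(set(intervals)) == len(intervals) and {"Master", "Slave"} <= roles

ops = monitor_ops(CONFIG)
print(ops)                 # [(10, 'Master'), (11, 'Slave')]
print(roles_covered(ops))  # True
```

By this check the monitor definitions themselves look fine, so the missing promotion probably has another cause.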

Current cluster status:
Online: [ ACL001 ACL002 ]

 Master/Slave Set: ms_drbd_1 [res_drbd_1]
     Slaves: [ ACL001 ACL002 ]

Allocation scores:
clone_color: ms_drbd_1 allocation score on ACL001: 0
clone_color: ms_drbd_1 allocation score on ACL002: 0
clone_color: res_drbd_1:0 allocation score on ACL001: 100
clone_color: res_drbd_1:0 allocation score on ACL002: 0
clone_color: res_drbd_1:1 allocation score on ACL001: 0
clone_color: res_drbd_1:1 allocation score on ACL002: 100
native_color: res_drbd_1:0 allocation score on ACL001: 100
native_color: res_drbd_1:0 allocation score on ACL002: 0
native_color: res_drbd_1:1 allocation score on ACL001: -INFINITY
native_color: res_drbd_1:1 allocation score on ACL002: 100
res_drbd_1:0 promotion score on ACL001: -1
res_drbd_1:1 promotion score on ACL002: -1
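
The promotion scores can be picked out of this output with something like the sketch below. It only understands plain integer scores in the format shown here, and the function names are invented for illustration. The assumption that an instance with a negative promotion score will not be promoted is mine, based on the "Promoted 0 instances of a possible 1 to master" result that crm_simulate reports elsewhere in this thread for the same scores.

```python
import re

# Hypothetical pattern for the promotion score lines printed by crm_simulate -s.
PROMOTION = re.compile(
    r"^(?P<inst>\S+) promotion score on (?P<node>\S+): (?P<score>-?\d+)$",
    re.MULTILINE,
)

def promotion_scores(text):
    """Return {(instance, node): score} for every promotion score line."""
    return {
        (m.group("inst"), m.group("node")): int(m.group("score"))
        for m in PROMOTION.finditer(text)
    }

def negative(scores):
    """Instances whose promotion score is below zero."""
    return sorted(key for key, value in scores.items() if value < 0)

SAMPLE = """\
native_color: res_drbd_1:1 allocation score on ACL002: 100
res_drbd_1:0 promotion score on ACL001: -1
res_drbd_1:1 promotion score on ACL002: -1
"""

scores = promotion_scores(SAMPLE)
print(negative(scores))
```

Here both instances come back with a negative score, so under that assumption neither node is currently a candidate for promotion.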

Transition Summary:

-----Original Message-----
From: Takehiro Matsushima [mailto:takehiro.dreamizm at gmail.com]
Sent: 13 June 2015 14:40
To: brian at fides.me.uk
Cc: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] MS Promotion Not Working

Hello Brian,

Did you try without the filesystem resource?
If not, please try configuring only the DRBD-related resources (primitive
and ms) and watch what happens, and change target-role from "Started" to
"Master" on the ms resource "ms_drbd_1".

regards,

Takehiro Matsushima

2015-06-13 20:43 GMT+09:00 Brian D. Lees <brian at fides.me.uk>:
> Takehiro,
>
> Thanks very much for your suggestion; unfortunately there is no
> change in the outcome.
>
> Configuration now is:
>
> node 1084751972: ACL001 \
>         attributes standby=off
> node 1084752072: ACL002 \
>         attributes standby=off
> primitive res_Filesystem_shared_fs Filesystem \
>         params device="/dev/drbd/by-res/acl_shared/1" directory="/mnt/aclcluster" fstype=ext4 \
>         operations $id=res_Filesystem_shared_fs-operations \
>         op start interval=0 timeout=60 \
>         op stop interval=0 timeout=60 \
>         op monitor interval=20 timeout=40 start-delay=0 \
>         op notify interval=0 timeout=60 \
>         meta allow-migrate=true failure-timeout=60
> primitive res_drbd_1 ocf:linbit:drbd \
>         params drbd_resource=acl_shared \
>         operations $id=res_drbd_1-operations \
>         op start interval=0 timeout=240 \
>         op promote interval=0 timeout=90 \
>         op demote interval=0 timeout=90 \
>         op stop interval=0 timeout=100 \
>         op monitor interval=10 timeout=20 role=Master \
>         op monitor interval=11 timeout=20 role=Slave \
>         op notify interval=0 timeout=90
> ms ms_drbd_1 res_drbd_1 \
>         meta clone-max=2 notify=true interleave=true target-role=Started
> colocation col_res_Filesystem_shared_fs_ms_drbd_1 inf: res_Filesystem_shared_fs ms_drbd_1:Master
> order ord_ms_drbd_1_res_Filesystem_shared_fs inf: ms_drbd_1:promote res_Filesystem_shared_fs:start
> property cib-bootstrap-options: \
>         stonith-enabled=false \
>         no-quorum-policy=ignore \
>         dc-version=1.1.12-1.1.12.git20140904.266d5c2 \
>         cluster-infrastructure=corosync \
>         cluster-name=aclcluster
> rsc_defaults rsc-options: \
>         target-role=started \
>         resource-stickiness=100
>
> And the scores are:
>
> Current cluster status:
> Online: [ ACL001 ACL002 ]
>
> Master/Slave Set: ms_drbd_1 [res_drbd_1]
>      Slaves: [ ACL001 ACL002 ]
> res_Filesystem_shared_fs       (ocf::heartbeat:Filesystem):    Stopped
>
> Allocation scores:
> clone_color: ms_drbd_1 allocation score on ACL001: 0
> clone_color: ms_drbd_1 allocation score on ACL002: 0
> clone_color: res_drbd_1:0 allocation score on ACL001: 100
> clone_color: res_drbd_1:0 allocation score on ACL002: 0
> clone_color: res_drbd_1:1 allocation score on ACL001: 0
> clone_color: res_drbd_1:1 allocation score on ACL002: 100
> native_color: res_drbd_1:0 allocation score on ACL001: 100
> native_color: res_drbd_1:0 allocation score on ACL002: 0
> native_color: res_drbd_1:1 allocation score on ACL001: -INFINITY
> native_color: res_drbd_1:1 allocation score on ACL002: 100
> res_drbd_1:0 promotion score on ACL001: -1
> res_drbd_1:1 promotion score on ACL002: -1
> native_color: res_Filesystem_shared_fs allocation score on ACL001: -INFINITY
> native_color: res_Filesystem_shared_fs allocation score on ACL002: -INFINITY
>
> Transition Summary:
>
> Any further ideas?
>
> Regards,
>
> Brian
>
> From: Takehiro Matsushima [mailto:takehiro.dreamizm at gmail.com]
> Sent: 13 June 2015 02:22
> To: Cluster Labs - All topics related to open-source clustering 
> welcomed; brian at fides.me.uk
> Subject: Re: [ClusterLabs] MS Promotion Not Working
>
>
>
> Hello Brian,
>
> Try defining two "op monitor" operations for the drbd resource, one with
> role="Master" and one with role="Slave", like this:
>
> primitive res_drbd_1 ocf:linbit:drbd \
>         params drbd_resource=acl_shared \
>         operations $id=res_drbd_1-operations \
>         op start interval=0 timeout=240 \
>         op promote interval=0 timeout=90 \
>         op demote interval=0 timeout=90 \
>         op stop interval=0 timeout=100 \
>         op monitor interval=10 timeout=20 role="Master" \
>         op monitor interval=11 timeout=20 role="Slave" \
>         op notify interval=0 timeout=90
>
> regards,
> Takehiro Matsushima

_______________________________________________
Users mailing list: Users at clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
