[ClusterLabs] Cluster failover failure with Unresolved dependency

Lorand Kelemen lorand.kelemen at gmail.com
Wed Mar 16 10:49:58 UTC 2016


Dear Ken,

Thanks for the reply! I lowered migration-threshold to 1 and rearranged
contraints like you suggested:

Location Constraints:
Ordering Constraints:
  promote mail-clone then start fs-services (kind:Mandatory)
  promote spool-clone then start fs-services (kind:Mandatory)
  start fs-services then start network-services (kind:Mandatory)
  start network-services then start mail-services (kind:Mandatory)
Colocation Constraints:
  fs-services with spool-clone (score:INFINITY) (rsc-role:Started)
(with-rsc-role:Master)
  fs-services with mail-clone (score:INFINITY) (rsc-role:Started)
(with-rsc-role:Master)
  network-services with mail-services (score:INFINITY)
  mail-services with fs-services (score:INFINITY)

Now virtualip and postfix becomes stopped, I guess these are relevant but I
attach also full logs:

Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_color: Resource postfix cannot run anywhere
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_color: Resource virtualip-1 cannot run anywhere

Interesting, will try to play around with ordering - colocation, the
solution must be in these settings...

Best regards,
Lorand

Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       Diff: --- 0.215.7 2
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       Diff: +++ 0.215.8 (null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       +  /cib:  @num_updates=8
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       ++
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']:
 <lrm_rsc_op id="postfix_last_failure_0"
operation_key="postfix_monitor_45000" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.10"
transition-key="86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
transition-magic="0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a"
on_node="mail1" call-id="1333" rc-code="7"
Mar 16 11:38:06 [7420] HWJ-626.domain.local       crmd:     info:
abort_transition_graph:       Transition aborted by postfix_monitor_45000
'create' on mail1: Inactive graph
(magic=0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.215.8,
source=process_graph_event:598, 1)
Mar 16 11:38:06 [7420] HWJ-626.domain.local       crmd:     info:
update_failcount:     Updating failcount for postfix on mail1 after failed
monitor: rc=7 (update=value++, time=1458124686)
Mar 16 11:38:06 [7420] HWJ-626.domain.local       crmd:     info:
process_graph_event:  Detected action (2962.86)
postfix_monitor_45000.1333=not running: failed
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
attrd_client_update:  Expanded fail-count-postfix=value++ to 1
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_process_request:  Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/253, version=0.215.8)
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
attrd_peer_update:    Setting fail-count-postfix[mail1]: (null) -> 1 from
mail2
Mar 16 11:38:06 [7420] HWJ-626.domain.local       crmd:   notice:
do_state_transition:  State transition S_IDLE -> S_POLICY_ENGINE [
input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
write_attribute:      Sent update 406 with 2 changes for
fail-count-postfix, id=<n/a>, set=(null)
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
attrd_peer_update:    Setting last-failure-postfix[mail1]: 1458124291 ->
1458124686 from mail2
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
write_attribute:      Sent update 407 with 2 changes for
last-failure-postfix, id=<n/a>, set=(null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_process_request:  Forwarding cib_modify operation for section status to
master (origin=local/attrd/406)
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_process_request:  Forwarding cib_modify operation for section status to
master (origin=local/attrd/407)
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       Diff: --- 0.215.8 2
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       Diff: +++ 0.215.9 (null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       +  /cib:  @num_updates=9
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       ++
/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']:
 <nvpair id="status-1-fail-count-postfix" name="fail-count-postfix"
value="1"/>
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_process_request:  Completed cib_modify operation for section status: OK
(rc=0, origin=mail2/attrd/406, version=0.215.9)
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       Diff: --- 0.215.9 2
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       Diff: +++ 0.215.10 (null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       +  /cib:  @num_updates=10
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       +
 /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']:
 @value=1458124686
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
attrd_cib_callback:   Update 406 for fail-count-postfix: OK (0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
attrd_cib_callback:   Update 406 for fail-count-postfix[mail1]=1: OK (0)
Mar 16 11:38:06 [7415] HWJ-626.domain.local        cib:     info:
cib_process_request:  Completed cib_modify operation for section status: OK
(rc=0, origin=mail2/attrd/407, version=0.215.10)
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
attrd_cib_callback:   Update 406 for fail-count-postfix[mail2]=(null): OK
(0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
attrd_cib_callback:   Update 407 for last-failure-postfix: OK (0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
attrd_cib_callback:   Update 407 for
last-failure-postfix[mail1]=1458124686: OK (0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local      attrd:     info:
attrd_cib_callback:   Update 407 for
last-failure-postfix[mail2]=1457610376: OK (0)
Mar 16 11:38:06 [7420] HWJ-626.domain.local       crmd:     info:
abort_transition_graph:       Transition aborted by
status-1-fail-count-postfix, fail-count-postfix=1: Transient attribute
change (create cib=0.215.9, source=abort_unless_down:319,
path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1'],
1)
Mar 16 11:38:06 [7420] HWJ-626.domain.local       crmd:     info:
abort_transition_graph:       Transition aborted by
status-1-last-failure-postfix, last-failure-postfix=1458124686: Transient
attribute change (modify cib=0.215.10, source=abort_unless_down:319,
path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'],
1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:   notice:
unpack_config:        On loss of CCM Quorum: Ignore
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_online_status:      Node mail1 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_online_status:      Node mail2 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource mail:0 active in
master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource spool:0 active in
master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource fs-spool active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource fs-spool active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource fs-mail active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource fs-mail active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:  warning:
unpack_rsc_op_failure:        Processing failed op monitor for postfix on
mail1: not running (7)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource spool:1 active in
master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource mail:1 active in
master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
group_print:   Resource Group: network-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_print:      virtualip-1        (ocf::heartbeat:IPaddr2):     Started
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
clone_print:   Master/Slave Set: spool-clone [spool]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
short_print:       Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
short_print:       Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
clone_print:   Master/Slave Set: mail-clone [mail]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
short_print:       Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
short_print:       Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
group_print:   Resource Group: fs-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_print:      fs-spool   (ocf::heartbeat:Filesystem):    Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_print:      fs-mail    (ocf::heartbeat:Filesystem):    Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
group_print:   Resource Group: mail-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_print:      postfix    (ocf::heartbeat:postfix):       FAILED mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
master_color: Promoting mail:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
master_color: mail-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
master_color: Promoting spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
master_color: spool-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
RecurringOp:   Start recurring monitor (45s) for postfix on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   virtualip-1     (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   spool:1 (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   mail:0  (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   mail:1  (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   fs-spool        (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   fs-mail (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:   notice:
LogActions:   Recover postfix (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:   notice:
process_pe_message:   Calculated Transition 2963:
/var/lib/pacemaker/pengine/pe-input-330.bz2
Mar 16 11:38:06 [7420] HWJ-626.domain.local       crmd:     info:
handle_response:      pe_calc calculation pe_calc-dc-1458124686-5541 is
obsolete
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:   notice:
unpack_config:        On loss of CCM Quorum: Ignore
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_online_status:      Node mail1 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_online_status:      Node mail2 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource mail:0 active in
master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource spool:0 active in
master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource fs-spool active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource fs-spool active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource fs-mail active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource fs-mail active on
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:  warning:
unpack_rsc_op_failure:        Processing failed op monitor for postfix on
mail1: not running (7)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource spool:1 active in
master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
determine_op_status:  Operation monitor found resource mail:1 active in
master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
group_print:   Resource Group: network-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_print:      virtualip-1        (ocf::heartbeat:IPaddr2):     Started
mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
clone_print:   Master/Slave Set: spool-clone [spool]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
short_print:       Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
short_print:       Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
clone_print:   Master/Slave Set: mail-clone [mail]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
short_print:       Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
short_print:       Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
group_print:   Resource Group: fs-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_print:      fs-spool   (ocf::heartbeat:Filesystem):    Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_print:      fs-mail    (ocf::heartbeat:Filesystem):    Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
group_print:   Resource Group: mail-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_print:      postfix    (ocf::heartbeat:postfix):       FAILED mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
get_failcount_full:   postfix has failed 1 times on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:  warning:
common_apply_stickiness:      Forcing postfix away from mail1 after 1
failures (max=1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
master_color: Promoting mail:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
master_color: mail-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
master_color: Promoting spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
master_color: spool-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
rsc_merge_weights:    fs-mail: Rolling back scores from postfix
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
rsc_merge_weights:    postfix: Rolling back scores from virtualip-1
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_color: Resource postfix cannot run anywhere
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
native_color: Resource virtualip-1 cannot run anywhere
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:   notice:
LogActions:   Stop    virtualip-1     (mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   spool:1 (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   mail:0  (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   mail:1  (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   fs-spool        (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:     info:
LogActions:   Leave   fs-mail (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:   notice:
LogActions:   Stop    postfix (mail1)
Mar 16 11:38:06 [7420] HWJ-626.domain.local       crmd:     info:
do_state_transition:  State transition S_POLICY_ENGINE ->
S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=handle_response ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local    pengine:   notice:
process_pe_message:   Calculated Transition 2964:
/var/lib/pacemaker/pengine/pe-input-331.bz2
Mar 16 11:38:06 [7420] HWJ-626.domain.local       crmd:     info:
do_te_invoke: Processing graph 2964 (ref=pe_calc-dc-1458124686-5542)
derived from /var/lib/pacemaker/pengine/pe-input-331.bz2
Mar 16 11:38:06 [7420] HWJ-626.domain.local       crmd:   notice:
te_rsc_command:       Initiating action 5: stop postfix_stop_0 on mail1
Mar 16 11:38:07 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       Diff: --- 0.215.10 2
Mar 16 11:38:07 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       Diff: +++ 0.215.11 (null)
Mar 16 11:38:07 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       +  /cib:  @num_updates=11
Mar 16 11:38:07 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       +
 /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']:
 @operation_key=postfix_stop_0, @operation=stop,
@transition-key=5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@transition-magic=0:0;5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@call-id=1335, @last-run=1458124686, @last-rc-change=1458124686,
@exec-time=435
Mar 16 11:38:07 [7420] HWJ-626.domain.local       crmd:     info:
match_graph_event:    Action postfix_stop_0 (5) confirmed on mail1 (rc=0)
Mar 16 11:38:07 [7415] HWJ-626.domain.local        cib:     info:
cib_process_request:  Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/254, version=0.215.11)
Mar 16 11:38:07 [7420] HWJ-626.domain.local       crmd:   notice:
te_rsc_command:       Initiating action 12: stop virtualip-1_stop_0 on mail1
Mar 16 11:38:07 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       Diff: --- 0.215.11 2
Mar 16 11:38:07 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       Diff: +++ 0.215.12 (null)
Mar 16 11:38:07 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       +  /cib:  @num_updates=12
Mar 16 11:38:07 [7415] HWJ-626.domain.local        cib:     info:
cib_perform_op:       +
 /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']:
 @operation_key=virtualip-1_stop_0, @operation=stop,
@transition-key=12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@transition-magic=0:0;12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
@call-id=1337, @last-run=1458124687, @last-rc-change=1458124687,
@exec-time=56
Mar 16 11:38:07 [7420] HWJ-626.domain.local       crmd:     info:
match_graph_event:    Action virtualip-1_stop_0 (12) confirmed on mail1
(rc=0)
Mar 16 11:38:07 [7415] HWJ-626.domain.local        cib:     info:
cib_process_request:  Completed cib_modify operation for section status: OK
(rc=0, origin=mail1/crmd/255, version=0.215.12)
Mar 16 11:38:07 [7420] HWJ-626.domain.local       crmd:   notice:
run_graph:    Transition 2964 (Complete=7, Pending=0, Fired=0, Skipped=0,
Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-331.bz2): Complete
Mar 16 11:38:07 [7420] HWJ-626.domain.local       crmd:     info: do_log:
    FSA: Input I_TE_SUCCESS from notify_crmd() received in state
S_TRANSITION_ENGINE
Mar 16 11:38:07 [7420] HWJ-626.domain.local       crmd:   notice:
do_state_transition:  State transition S_TRANSITION_ENGINE -> S_IDLE [
input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 16 11:38:12 [7415] HWJ-626.domain.local        cib:     info:
cib_process_ping:     Reporting our current digest to mail2:
ed43bc3ecf0f15853900ba49fc514870 for 0.215.12 (0x152b110 0)


On Mon, Mar 14, 2016 at 6:44 PM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On 03/10/2016 09:49 AM, Lorand Kelemen wrote:
> > Dear List,
> >
> > After the creation and testing of a simple 2 node active-passive
> > drbd+postfix cluster nearly everything works flawlessly (standby, failure
> > of a filesystem resource + failover, splitbrain + manual recovery)
> however
> > when delibarately killing the postfix instance, after reaching the
> > migration threshold failover does not occur and resources revert to the
> > Stopped state (except the master-slave drbd resource, which works as
> > expected).
> >
> > Ordering and colocation is configured, STONITH and quorum disabled, the
> > goal is to always have one node running all the resources and at any sign
> > of error it should fail over to the passive node, nothing fancy.
> >
> > Is my configuration wrong or am I hitting a bug?
> >
> > All software from centos 7 + elrepo repositories.
>
> With these versions, you can set "two_node: 1" in
> /etc/corosync/corosync.conf (which will be done automatically if you
> used "pcs cluster setup" initially), and then you don't need to ignore
> quorum in pacemaker.
>
> > Regarding STONITH: the machines are running on free ESXi instances on
> > separate machines, so the Vmware fencing agents won't work because in the
> > free version the API is read-only.
> > Still trying to figure out a way to go, until then manual recovery + huge
> > arp cache times on the upstream firewall...
> >
> > Please find pe-input*.bz files attached, logs and config below. The
> > situation: on node mail1 postfix was killed 3 times (migration
> threshold),
> > it should have failed over to mail2.
> > When killing a filesystem resource three times this happens flawlessly.
> >
> > Thanks for your input!
> >
> > Best regards,
> > Lorand
> >
> >
> > Cluster Name: mailcluster
> > Corosync Nodes:
> >  mail1 mail2
> > Pacemaker Nodes:
> >  mail1 mail2
> >
> > Resources:
> >  Group: network-services
> >   Resource: virtualip-1 (class=ocf provider=heartbeat type=IPaddr2)
> >    Attributes: ip=10.20.64.10 cidr_netmask=24 nic=lan0
> >    Operations: start interval=0s timeout=20s
> (virtualip-1-start-interval-0s)
> >                stop interval=0s timeout=20s
> (virtualip-1-stop-interval-0s)
> >                monitor interval=30s (virtualip-1-monitor-interval-30s)
> >  Master: spool-clone
> >   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> > notify=true
> >   Resource: spool (class=ocf provider=linbit type=drbd)
> >    Attributes: drbd_resource=spool
> >    Operations: start interval=0s timeout=240 (spool-start-interval-0s)
> >                promote interval=0s timeout=90 (spool-promote-interval-0s)
> >                demote interval=0s timeout=90 (spool-demote-interval-0s)
> >                stop interval=0s timeout=100 (spool-stop-interval-0s)
> >                monitor interval=10s (spool-monitor-interval-10s)
> >  Master: mail-clone
> >   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> > notify=true
> >   Resource: mail (class=ocf provider=linbit type=drbd)
> >    Attributes: drbd_resource=mail
> >    Operations: start interval=0s timeout=240 (mail-start-interval-0s)
> >                promote interval=0s timeout=90 (mail-promote-interval-0s)
> >                demote interval=0s timeout=90 (mail-demote-interval-0s)
> >                stop interval=0s timeout=100 (mail-stop-interval-0s)
> >                monitor interval=10s (mail-monitor-interval-10s)
> >  Group: fs-services
> >   Resource: fs-spool (class=ocf provider=heartbeat type=Filesystem)
> >    Attributes: device=/dev/drbd0 directory=/var/spool/postfix fstype=ext4
> > options=nodev,nosuid,noexec
> >    Operations: start interval=0s timeout=60 (fs-spool-start-interval-0s)
> >                stop interval=0s timeout=60 (fs-spool-stop-interval-0s)
> >                monitor interval=20 timeout=40
> (fs-spool-monitor-interval-20)
> >   Resource: fs-mail (class=ocf provider=heartbeat type=Filesystem)
> >    Attributes: device=/dev/drbd1 directory=/var/spool/mail fstype=ext4
> > options=nodev,nosuid,noexec
> >    Operations: start interval=0s timeout=60 (fs-mail-start-interval-0s)
> >                stop interval=0s timeout=60 (fs-mail-stop-interval-0s)
> >                monitor interval=20 timeout=40
> (fs-mail-monitor-interval-20)
> >  Group: mail-services
> >   Resource: postfix (class=ocf provider=heartbeat type=postfix)
> >    Operations: start interval=0s timeout=20s (postfix-start-interval-0s)
> >                stop interval=0s timeout=20s (postfix-stop-interval-0s)
> >                monitor interval=45s (postfix-monitor-interval-45s)
> >
> > Stonith Devices:
> > Fencing Levels:
> >
> > Location Constraints:
> > Ordering Constraints:
> >   start network-services then promote mail-clone (kind:Mandatory)
> > (id:order-network-services-mail-clone-mandatory)
> >   promote mail-clone then promote spool-clone (kind:Mandatory)
> > (id:order-mail-clone-spool-clone-mandatory)
> >   promote spool-clone then start fs-services (kind:Mandatory)
> > (id:order-spool-clone-fs-services-mandatory)
> >   start fs-services then start mail-services (kind:Mandatory)
> > (id:order-fs-services-mail-services-mandatory)
> > Colocation Constraints:
> >   network-services with spool-clone (score:INFINITY) (rsc-role:Started)
> > (with-rsc-role:Master)
> (id:colocation-network-services-spool-clone-INFINITY)
> >   network-services with mail-clone (score:INFINITY) (rsc-role:Started)
> > (with-rsc-role:Master)
> (id:colocation-network-services-mail-clone-INFINITY)
> >   network-services with fs-services (score:INFINITY)
> > (id:colocation-network-services-fs-services-INFINITY)
> >   network-services with mail-services (score:INFINITY)
> > (id:colocation-network-services-mail-services-INFINITY)
>
> I'm not sure whether it's causing your issue, but I would make the
> constraints reflect the logical relationships better.
>
> For example, network-services only needs to be colocated with
> mail-services logically; it's mail-services that needs to be with
> fs-services, and fs-services that needs to be with
> spool-clone/mail-clone master. In other words, don't make the
> highest-level resource depend on everything else, make each level depend
> on the level below it.
>
> Also, I would guess that the virtual IP only needs to be ordered before
> mail-services, and mail-clone and spool-clone could both be ordered
> before fs-services, rather than ordering mail-clone before spool-clone.
>
> > Resources Defaults:
> >  migration-threshold: 3
> > Operations Defaults:
> >  on-fail: restart
> >
> > Cluster Properties:
> >  cluster-infrastructure: corosync
> >  cluster-name: mailcluster
> >  cluster-recheck-interval: 5min
> >  dc-version: 1.1.13-10.el7_2.2-44eb2dd
> >  default-resource-stickiness: infinity
> >  have-watchdog: false
> >  last-lrm-refresh: 1457613674
> >  no-quorum-policy: ignore
> >  pe-error-series-max: 1024
> >  pe-input-series-max: 1024
> >  pe-warn-series-max: 1024
> >  stonith-enabled: false
> >
> >
> >
> >
> >
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: --- 0.197.15 2
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: +++ 0.197.16 (null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +  /cib:  @num_updates=16
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +
> >
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_failure_0']:
> >  @transition-key=4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @transition-magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @call-id=1274, @last-rc-change=1457613440
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local       crmd:     info:
> > abort_transition_graph:       Transition aborted by postfix_monitor_45000
> > 'modify' on mail1: Inactive graph
> > (magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.197.16,
> > source=process_graph_event:598, 1)
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local       crmd:     info:
> > update_failcount:     Updating failcount for postfix on mail1 after
> failed
> > monitor: rc=7 (update=value++, time=1457613440)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local      attrd:     info:
> > attrd_client_update:  Expanded fail-count-postfix=value++ to 3
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_process_request:  Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail1/crmd/196, version=0.197.16)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local      attrd:     info:
> > attrd_peer_update:    Setting fail-count-postfix[mail1]: 2 -> 3 from
> mail2
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local      attrd:     info:
> > write_attribute:      Sent update 400 with 2 changes for
> > fail-count-postfix, id=<n/a>, set=(null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_process_request:  Forwarding cib_modify operation for section status
> to
> > master (origin=local/attrd/400)
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local       crmd:     info:
> > process_graph_event:  Detected action (1234.4)
> > postfix_monitor_45000.1274=not running: failed
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local      attrd:     info:
> > attrd_peer_update:    Setting last-failure-postfix[mail1]: 1457613347 ->
> > 1457613440 from mail2
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local       crmd:   notice:
> > do_state_transition:  State transition S_IDLE -> S_POLICY_ENGINE [
> > input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local      attrd:     info:
> > write_attribute:      Sent update 401 with 2 changes for
> > last-failure-postfix, id=<n/a>, set=(null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: --- 0.197.16 2
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: +++ 0.197.17 (null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +  /cib:  @num_updates=17
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +
> >
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix']:
> >  @value=3
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local        cib:     info:
> > cib_process_request:  Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail2/attrd/400, version=0.197.17)
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local       crmd:     info:
> > abort_transition_graph:       Transition aborted by
> > status-1-fail-count-postfix, fail-count-postfix=3: Transient attribute
> > change (modify cib=0.197.17, source=abort_unless_down:319,
> >
> path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix'],
> > 1)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local      attrd:     info:
> > attrd_cib_callback:   Update 400 for fail-count-postfix: OK (0)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local      attrd:     info:
> > attrd_cib_callback:   Update 400 for fail-count-postfix[mail1]=3: OK (0)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local      attrd:     info:
> > attrd_cib_callback:   Update 400 for fail-count-postfix[mail2]=(null): OK
> > (0)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_process_request:  Forwarding cib_modify operation for section status
> to
> > master (origin=local/attrd/401)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: --- 0.197.17 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: +++ 0.197.18 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +  /cib:  @num_updates=18
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +
> >
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']:
> >  @value=1457613440
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_process_request:  Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail2/attrd/401, version=0.197.18)
> > Mar 10 13:37:21 [7418] HWJ-626.domain.local      attrd:     info:
> > attrd_cib_callback:   Update 401 for last-failure-postfix: OK (0)
> > Mar 10 13:37:21 [7418] HWJ-626.domain.local      attrd:     info:
> > attrd_cib_callback:   Update 401 for
> > last-failure-postfix[mail1]=1457613440: OK (0)
> > Mar 10 13:37:21 [7418] HWJ-626.domain.local      attrd:     info:
> > attrd_cib_callback:   Update 401 for
> > last-failure-postfix[mail2]=1457610376: OK (0)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:     info:
> > abort_transition_graph:       Transition aborted by
> > status-1-last-failure-postfix, last-failure-postfix=1457613440: Transient
> > attribute change (modify cib=0.197.18, source=abort_unless_down:319,
> >
> path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'],
> > 1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > unpack_config:        On loss of CCM Quorum: Ignore
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_online_status:      Node mail1 is online
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_online_status:      Node mail2 is online
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource mail:0 active in
> > master mode on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource spool:0 active in
> > master mode on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource fs-spool active on
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource fs-mail active on
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:  warning:
> > unpack_rsc_op_failure:        Processing failed op monitor for postfix on
> > mail1: not running (7)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource spool:1 active in
> > master mode on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource mail:1 active in
> > master mode on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > group_print:   Resource Group: network-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > native_print:      virtualip-1        (ocf::heartbeat:IPaddr2):
>  Started
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > clone_print:   Master/Slave Set: spool-clone [spool]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > short_print:       Masters: [ mail1 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > short_print:       Slaves: [ mail2 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > clone_print:   Master/Slave Set: mail-clone [mail]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > short_print:       Masters: [ mail1 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > short_print:       Slaves: [ mail2 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > group_print:   Resource Group: fs-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > native_print:      fs-spool   (ocf::heartbeat:Filesystem):    Started
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > native_print:      fs-mail    (ocf::heartbeat:Filesystem):    Started
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > group_print:   Resource Group: mail-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > native_print:      postfix    (ocf::heartbeat:postfix):       FAILED
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > get_failcount_full:   postfix has failed 3 times on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:  warning:
> > common_apply_stickiness:      Forcing postfix away from mail1 after 3
> > failures (max=3)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > master_color: Promoting mail:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > master_color: mail-clone: Promoted 1 instances of a possible 1 to master
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > master_color: Promoting spool:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > master_color: spool-clone: Promoted 1 instances of a possible 1 to master
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > rsc_merge_weights:    postfix: Rolling back scores from virtualip-1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > native_color: Resource virtualip-1 cannot run anywhere
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > RecurringOp:   Start recurring monitor (45s) for postfix on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > LogActions:   Stop    virtualip-1     (mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > LogActions:   Leave   spool:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > LogActions:   Leave   spool:1 (Slave mail2)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > LogActions:   Leave   mail:0  (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > LogActions:   Leave   mail:1  (Slave mail2)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > LogActions:   Stop    fs-spool        (Started mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > LogActions:   Stop    fs-mail (Started mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > LogActions:   Stop    postfix (Started mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > process_pe_message:   Calculated Transition 1235:
> > /var/lib/pacemaker/pengine/pe-input-302.bz2
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:     info:
> > handle_response:      pe_calc calculation pe_calc-dc-1457613441-3756 is
> > obsolete
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > unpack_config:        On loss of CCM Quorum: Ignore
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_online_status:      Node mail1 is online
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_online_status:      Node mail2 is online
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource mail:0 active in
> > master mode on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource spool:0 active in
> > master mode on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource fs-spool active on
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource fs-mail active on
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:  warning:
> > unpack_rsc_op_failure:        Processing failed op monitor for postfix on
> > mail1: not running (7)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource spool:1 active in
> > master mode on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > determine_op_status:  Operation monitor found resource mail:1 active in
> > master mode on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > group_print:   Resource Group: network-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > native_print:      virtualip-1        (ocf::heartbeat:IPaddr2):
>  Started
> > mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > clone_print:   Master/Slave Set: spool-clone [spool]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > short_print:       Masters: [ mail1 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > short_print:       Slaves: [ mail2 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > clone_print:   Master/Slave Set: mail-clone [mail]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > short_print:       Masters: [ mail1 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > short_print:       Slaves: [ mail2 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > group_print:   Resource Group: fs-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > native_print:      fs-spool   (ocf::heartbeat:Filesystem):    Started
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > native_print:      fs-mail    (ocf::heartbeat:Filesystem):    Started
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > group_print:   Resource Group: mail-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > native_print:      postfix    (ocf::heartbeat:postfix):       FAILED
> mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > get_failcount_full:   postfix has failed 3 times on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:  warning:
> > common_apply_stickiness:      Forcing postfix away from mail1 after 3
> > failures (max=3)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > master_color: Promoting mail:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > master_color: mail-clone: Promoted 1 instances of a possible 1 to master
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > master_color: Promoting spool:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > master_color: spool-clone: Promoted 1 instances of a possible 1 to master
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > rsc_merge_weights:    postfix: Rolling back scores from virtualip-1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > native_color: Resource virtualip-1 cannot run anywhere
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > RecurringOp:   Start recurring monitor (45s) for postfix on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > LogActions:   Stop    virtualip-1     (mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > LogActions:   Leave   spool:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > LogActions:   Leave   spool:1 (Slave mail2)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > LogActions:   Leave   mail:0  (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:     info:
> > LogActions:   Leave   mail:1  (Slave mail2)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > LogActions:   Stop    fs-spool        (Started mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > LogActions:   Stop    fs-mail (Started mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > LogActions:   Stop    postfix (Started mail1)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:     info:
> > do_state_transition:  State transition S_POLICY_ENGINE ->
> > S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
> > origin=handle_response ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local    pengine:   notice:
> > process_pe_message:   Calculated Transition 1236:
> > /var/lib/pacemaker/pengine/pe-input-303.bz2
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:     info:
> > do_te_invoke: Processing graph 1236 (ref=pe_calc-dc-1457613441-3757)
> > derived from /var/lib/pacemaker/pengine/pe-input-303.bz2
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > te_rsc_command:       Initiating action 12: stop virtualip-1_stop_0 on
> mail1
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > te_rsc_command:       Initiating action 5: stop postfix_stop_0 on mail1
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: --- 0.197.18 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: +++ 0.197.19 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +  /cib:  @num_updates=19
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +
> >
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']:
> >  @operation_key=virtualip-1_stop_0, @operation=stop,
> > @transition-key=12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @transition-magic=0:0;12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @call-id=1276, @last-run=1457613441, @last-rc-change=1457613441,
> > @exec-time=66
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_process_request:  Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail1/crmd/197, version=0.197.19)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:     info:
> > match_graph_event:    Action virtualip-1_stop_0 (12) confirmed on mail1
> > (rc=0)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: --- 0.197.19 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: +++ 0.197.20 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +  /cib:  @num_updates=20
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +
> >
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']:
> >  @operation_key=postfix_stop_0, @operation=stop,
> > @transition-key=5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @transition-magic=0:0;5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @call-id=1278, @last-run=1457613441, @last-rc-change=1457613441,
> > @exec-time=476
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:     info:
> > match_graph_event:    Action postfix_stop_0 (5) confirmed on mail1 (rc=0)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > te_rsc_command:       Initiating action 79: stop fs-mail_stop_0 on mail1
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_process_request:  Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail1/crmd/198, version=0.197.20)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: --- 0.197.20 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: +++ 0.197.21 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +  /cib:  @num_updates=21
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +
> >
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-mail']/lrm_rsc_op[@id='fs-mail_last_0']:
> >  @operation_key=fs-mail_stop_0, @operation=stop,
> > @transition-key=79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @transition-magic=0:0;79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @call-id=1280, @last-run=1457613441, @last-rc-change=1457613441,
> > @exec-time=88, @queue-time=1
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_process_request:  Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail1/crmd/199, version=0.197.21)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:     info:
> > match_graph_event:    Action fs-mail_stop_0 (79) confirmed on mail1
> (rc=0)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > te_rsc_command:       Initiating action 77: stop fs-spool_stop_0 on mail1
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: --- 0.197.21 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       Diff: +++ 0.197.22 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +  /cib:  @num_updates=22
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_perform_op:       +
> >
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-spool']/lrm_rsc_op[@id='fs-spool_last_0']:
> >  @operation_key=fs-spool_stop_0, @operation=stop,
> > @transition-key=77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @transition-magic=0:0;77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,
> > @call-id=1282, @last-run=1457613441, @last-rc-change=1457613441,
> > @exec-time=86
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local        cib:     info:
> > cib_process_request:  Completed cib_modify operation for section status:
> OK
> > (rc=0, origin=mail1/crmd/200, version=0.197.22)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:     info:
> > match_graph_event:    Action fs-spool_stop_0 (77) confirmed on mail1
> (rc=0)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:  warning:
> > run_graph:    Transition 1236 (Complete=11, Pending=0, Fired=0,
> Skipped=0,
> > Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-303.bz2):
> > Terminated
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:  warning:
> > te_graph_trigger:     Transition failed: terminated
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_graph:  Graph 1236 with 12 actions: batch-limit=12 jobs,
> > network-delay=0ms
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action   16]: Completed pseudo op
> > network-services_stopped_0     on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action   15]: Completed pseudo op
> > network-services_stop_0        on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action   12]: Completed rsc op virtualip-1_stop_0
> >              on mail1 (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action   84]: Completed pseudo op
> > fs-services_stopped_0          on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action   83]: Completed pseudo op
> fs-services_stop_0
> >             on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action   77]: Completed rsc op fs-spool_stop_0
> >             on mail1 (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action   79]: Completed rsc op fs-mail_stop_0
> >              on mail1 (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action   90]: Completed pseudo op
> > mail-services_stopped_0        on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action   89]: Completed pseudo op
> > mail-services_stop_0           on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action   86]: Pending rsc op postfix_monitor_45000
> >             on mail2 (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:         * [Input 85]: Unresolved dependency rsc op
> > postfix_start_0 on mail2
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action    5]: Completed rsc op postfix_stop_0
> >              on mail1 (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > print_synapse:        [Action    8]: Completed pseudo op all_stopped
> >              on N/A (priority: 0, waiting: none)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:     info: do_log:
> >     FSA: Input I_TE_SUCCESS from notify_crmd() received in state
> > S_TRANSITION_ENGINE
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local       crmd:   notice:
> > do_state_transition:  State transition S_TRANSITION_ENGINE -> S_IDLE [
> > input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Mar 10 13:37:26 [7415] HWJ-626.domain.local        cib:     info:
> > cib_process_ping:     Reporting our current digest to mail2:
> > 3896ee29cdb6ba128330b0ef6e41bd79 for 0.197.22 (0x1544a30 0)
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20160316/423c4da6/attachment-0002.html>


More information about the Users mailing list