<div dir="ltr">Dear Ken,<div><br></div><div>Thanks for the reply! I lowered migration-threshold to 1 and rearranged the constraints as you suggested:</div><div><div><br></div><div>Location Constraints:</div><div>Ordering Constraints:</div><div> promote mail-clone then start fs-services (kind:Mandatory)</div><div> promote spool-clone then start fs-services (kind:Mandatory)</div><div> start fs-services then start network-services (kind:Mandatory)</div><div> start network-services then start mail-services (kind:Mandatory)</div><div>Colocation Constraints:</div><div> fs-services with spool-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)</div><div> fs-services with mail-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)</div><div> network-services with mail-services (score:INFINITY)</div><div> mail-services with fs-services (score:INFINITY)</div></div><div><br></div><div>Now virtualip-1 and postfix end up stopped. I think these are the relevant lines, but I am also including the full logs below:</div><div><br></div><div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_color: Resource postfix cannot run anywhere</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_color: Resource virtualip-1 cannot run anywhere</div></div><div><br></div><div>Interesting. I will experiment further with the ordering and colocation constraints; the solution must be somewhere in these settings.</div><div><br></div><div>Best regards,</div><div>Lorand</div><div><br></div><div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.215.7 2</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.215.8 (null)</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=8</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: ++ /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']: <lrm_rsc_op 
id="postfix_last_failure_0" operation_key="postfix_monitor_45000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a" transition-magic="0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a" on_node="mail1" call-id="1333" rc-code="7"</div><div>Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by postfix_monitor_45000 'create' on mail1: Inactive graph (magic=0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.215.8, source=process_graph_event:598, 1)</div><div>Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: update_failcount: Updating failcount for postfix on mail1 after failed monitor: rc=7 (update=value++, time=1458124686)</div><div>Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: process_graph_event: Detected action (2962.86) postfix_monitor_45000.1333=not running: failed</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_client_update: Expanded fail-count-postfix=value++ to 1</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/253, version=0.215.8)</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_peer_update: Setting fail-count-postfix[mail1]: (null) -> 1 from mail2</div><div>Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: write_attribute: Sent update 406 with 2 changes for fail-count-postfix, id=<n/a>, set=(null)</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_peer_update: Setting last-failure-postfix[mail1]: 1458124291 -> 1458124686 from mail2</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: 
write_attribute: Sent update 407 with 2 changes for last-failure-postfix, id=<n/a>, set=(null)</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/406)</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/407)</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.215.8 2</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.215.9 (null)</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=9</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: ++ /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']: <nvpair id="status-1-fail-count-postfix" name="fail-count-postfix" value="1"/></div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail2/attrd/406, version=0.215.9)</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.215.9 2</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.215.10 (null)</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=10</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']: @value=1458124686</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 406 for fail-count-postfix: OK (0)</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 406 for 
fail-count-postfix[mail1]=1: OK (0)</div><div>Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail2/attrd/407, version=0.215.10)</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 406 for fail-count-postfix[mail2]=(null): OK (0)</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 407 for last-failure-postfix: OK (0)</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 407 for last-failure-postfix[mail1]=1458124686: OK (0)</div><div>Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 407 for last-failure-postfix[mail2]=1457610376: OK (0)</div><div>Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by status-1-fail-count-postfix, fail-count-postfix=1: Transient attribute change (create cib=0.215.9, source=abort_unless_down:319, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1'], 1)</div><div>Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by status-1-last-failure-postfix, last-failure-postfix=1458124686: Transient attribute change (modify cib=0.215.10, source=abort_unless_down:319, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'], 1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: unpack_config: On loss of CCM Quorum: Ignore</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail1 is online</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail2 is online</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation 
monitor found resource mail:0 active in master mode on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:0 active in master mode on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for postfix on mail1: not running (7)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:1 active in master mode on mail2</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:1 active in master mode on mail2</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: network-services</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: spool-clone [spool]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: mail-clone 
[mail]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: fs-services</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: mail-services</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: postfix (ocf::heartbeat:postfix): FAILED mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting mail:0 (Master mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: mail-clone: Promoted 1 instances of a possible 1 to master</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting spool:0 (Master mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: spool-clone: Promoted 1 instances of a possible 1 to master</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: RecurringOp: Start recurring monitor (45s) for postfix on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave virtualip-1 (Started mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:0 (Master mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:1 (Slave mail2)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:0 (Master mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: 
LogActions: Leave mail:1 (Slave mail2)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave fs-spool (Started mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave fs-mail (Started mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: LogActions: Recover postfix (Started mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: process_pe_message: Calculated Transition 2963: /var/lib/pacemaker/pengine/pe-input-330.bz2</div><div>Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: handle_response: pe_calc calculation pe_calc-dc-1458124686-5541 is obsolete</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: unpack_config: On loss of CCM Quorum: Ignore</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail1 is online</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail2 is online</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:0 active in master mode on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:0 active in master mode on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local 
pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for postfix on mail1: not running (7)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:1 active in master mode on mail2</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:1 active in master mode on mail2</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: network-services</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: spool-clone [spool]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: mail-clone [mail]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: fs-services</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: mail-services</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: postfix (ocf::heartbeat:postfix): FAILED mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local 
pengine: info: get_failcount_full: postfix has failed 1 times on mail1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning: common_apply_stickiness: Forcing postfix away from mail1 after 1 failures (max=1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting mail:0 (Master mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: mail-clone: Promoted 1 instances of a possible 1 to master</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting spool:0 (Master mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: spool-clone: Promoted 1 instances of a possible 1 to master</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: rsc_merge_weights: fs-mail: Rolling back scores from postfix</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: rsc_merge_weights: postfix: Rolling back scores from virtualip-1</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_color: Resource postfix cannot run anywhere</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_color: Resource virtualip-1 cannot run anywhere</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop virtualip-1 (mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:0 (Master mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:1 (Slave mail2)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:0 (Master mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:1 (Slave mail2)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave fs-spool (Started mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave fs-mail 
(Started mail1)</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop postfix (mail1)</div><div>Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]</div><div>Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: process_pe_message: Calculated Transition 2964: /var/lib/pacemaker/pengine/pe-input-331.bz2</div><div>Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: do_te_invoke: Processing graph 2964 (ref=pe_calc-dc-1458124686-5542) derived from /var/lib/pacemaker/pengine/pe-input-331.bz2</div><div>Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: notice: te_rsc_command: Initiating action 5: stop postfix_stop_0 on mail1</div><div>Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.215.10 2</div><div>Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.215.11 (null)</div><div>Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=11</div><div>Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']: @operation_key=postfix_stop_0, @operation=stop, @transition-key=5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @transition-magic=0:0;5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @call-id=1335, @last-run=1458124686, @last-rc-change=1458124686, @exec-time=435</div><div>Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info: match_graph_event: Action postfix_stop_0 (5) confirmed on mail1 (rc=0)</div><div>Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/254, version=0.215.11)</div><div>Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice: 
te_rsc_command: Initiating action 12: stop virtualip-1_stop_0 on mail1</div><div>Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.215.11 2</div><div>Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.215.12 (null)</div><div>Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=12</div><div>Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']: @operation_key=virtualip-1_stop_0, @operation=stop, @transition-key=12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @transition-magic=0:0;12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @call-id=1337, @last-run=1458124687, @last-rc-change=1458124687, @exec-time=56</div><div>Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info: match_graph_event: Action virtualip-1_stop_0 (12) confirmed on mail1 (rc=0)</div><div>Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/255, version=0.215.12)</div><div>Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice: run_graph: Transition 2964 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-331.bz2): Complete</div><div>Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE</div><div>Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]</div><div>Mar 16 11:38:12 [7415] HWJ-626.domain.local cib: info: cib_process_ping: Reporting our current digest to mail2: ed43bc3ecf0f15853900ba49fc514870 for 0.215.12 (0x152b110 0)</div><div><br></div><div><br></div></div><div 
class="gmail_extra"><div class="gmail_quote">On Mon, Mar 14, 2016 at 6:44 PM, Ken Gaillot <span dir="ltr"><<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">On 03/10/2016 09:49 AM, Lorand Kelemen wrote:<br>
> Dear List,<br>
><br>
> After creating and testing a simple two-node active-passive<br>
> DRBD+Postfix cluster, nearly everything works flawlessly (standby, failure<br>
> of a filesystem resource + failover, split-brain + manual recovery).<br>
> However, when deliberately killing the postfix instance, failover does<br>
> not occur after the migration threshold is reached, and the resources<br>
> revert to the Stopped state (except the master-slave DRBD resource,<br>
> which works as expected).<br>
><br>
> Ordering and colocation constraints are configured, and STONITH and quorum<br>
> are disabled. The goal is to always have one node running all the resources<br>
> and, at any sign of error, to fail over to the passive node; nothing fancy.<br>
><br>
> Is my configuration wrong or am I hitting a bug?<br>
><br>
> All software is from the CentOS 7 + ELRepo repositories.<br>
<br>
With these versions, you can set "two_node: 1" in<br>
/etc/corosync/corosync.conf (which will be done automatically if you<br>
used "pcs cluster setup" initially), and then you don't need to ignore<br>
quorum in pacemaker.<br>
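A minimal sketch of that change, assuming pcs 0.9 and corosync 2.x as shipped with CentOS 7, and a cluster that was not created with "pcs cluster setup" (otherwise two_node should already be present):<br>

```shell
# Assumed target: the quorum section of /etc/corosync/corosync.conf reads
#
#   quorum {
#       provider: corosync_votequorum
#       two_node: 1
#   }
#
# After editing the file on one node, copy it to the other node and
# restart the cluster so corosync picks up the change:
pcs cluster sync
pcs cluster stop --all
pcs cluster start --all

# votequorum should now list the 2Node flag:
corosync-quorumtool -s

# Pacemaker no longer needs to ignore quorum; drop the override so the
# default no-quorum-policy (stop) applies again:
pcs property unset no-quorum-policy
```

With two_node set, corosync keeps the surviving node quorate when its peer fails, which is why ignoring quorum in Pacemaker becomes unnecessary.<br>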
<br>
> Regarding STONITH: the nodes run on free ESXi instances on separate<br>
> physical hosts, so the VMware fencing agents won't work, because the API<br>
> is read-only in the free version.<br>
> I am still trying to figure out a way forward; until then, I rely on manual<br>
> recovery and long ARP cache times on the upstream firewall...<br>
><br>
> Please find the pe-input*.bz2 files attached; logs and config are below. The<br>
> situation: on node mail1, postfix was killed 3 times (the migration threshold),<br>
> so it should have failed over to mail2.<br>
> When a filesystem resource is killed three times, this happens flawlessly.<br>
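A minimal sketch for replaying the saved policy-engine inputs offline, assuming the crm_simulate tool shipped with Pacemaker 1.1 on CentOS 7. pe-input-331.bz2 is the transition from the 16 March log that stopped postfix and virtualip-1; any of the attached files can be substituted:<br>

```shell
# Replay a saved policy-engine input and print the allocation scores
# without touching the live cluster (-x: input file, -s: show scores):
crm_simulate -s -x /var/lib/pacemaker/pengine/pe-input-331.bz2

# Add -S to also simulate executing the resulting transition:
crm_simulate -S -s -x /var/lib/pacemaker/pengine/pe-input-331.bz2

# Or compute the scores from the live CIB instead of a saved file:
crm_simulate -L -s
```

In the score output, -INFINITY marks a node on which a resource is banned.<br>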
><br>
> Thanks for your input!<br>
><br>
> Best regards,<br>
> Lorand<br>
><br>
><br>
> Cluster Name: mailcluster<br>
> Corosync Nodes:<br>
> mail1 mail2<br>
> Pacemaker Nodes:<br>
> mail1 mail2<br>
><br>
> Resources:<br>
> Group: network-services<br>
> Resource: virtualip-1 (class=ocf provider=heartbeat type=IPaddr2)<br>
> Attributes: ip=10.20.64.10 cidr_netmask=24 nic=lan0<br>
> Operations: start interval=0s timeout=20s (virtualip-1-start-interval-0s)<br>
> stop interval=0s timeout=20s (virtualip-1-stop-interval-0s)<br>
> monitor interval=30s (virtualip-1-monitor-interval-30s)<br>
> Master: spool-clone<br>
> Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1<br>
> notify=true<br>
> Resource: spool (class=ocf provider=linbit type=drbd)<br>
> Attributes: drbd_resource=spool<br>
> Operations: start interval=0s timeout=240 (spool-start-interval-0s)<br>
> promote interval=0s timeout=90 (spool-promote-interval-0s)<br>
> demote interval=0s timeout=90 (spool-demote-interval-0s)<br>
> stop interval=0s timeout=100 (spool-stop-interval-0s)<br>
> monitor interval=10s (spool-monitor-interval-10s)<br>
> Master: mail-clone<br>
> Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1<br>
> notify=true<br>
> Resource: mail (class=ocf provider=linbit type=drbd)<br>
> Attributes: drbd_resource=mail<br>
> Operations: start interval=0s timeout=240 (mail-start-interval-0s)<br>
> promote interval=0s timeout=90 (mail-promote-interval-0s)<br>
> demote interval=0s timeout=90 (mail-demote-interval-0s)<br>
> stop interval=0s timeout=100 (mail-stop-interval-0s)<br>
> monitor interval=10s (mail-monitor-interval-10s)<br>
> Group: fs-services<br>
> Resource: fs-spool (class=ocf provider=heartbeat type=Filesystem)<br>
> Attributes: device=/dev/drbd0 directory=/var/spool/postfix fstype=ext4<br>
> options=nodev,nosuid,noexec<br>
> Operations: start interval=0s timeout=60 (fs-spool-start-interval-0s)<br>
> stop interval=0s timeout=60 (fs-spool-stop-interval-0s)<br>
> monitor interval=20 timeout=40 (fs-spool-monitor-interval-20)<br>
> Resource: fs-mail (class=ocf provider=heartbeat type=Filesystem)<br>
> Attributes: device=/dev/drbd1 directory=/var/spool/mail fstype=ext4<br>
> options=nodev,nosuid,noexec<br>
> Operations: start interval=0s timeout=60 (fs-mail-start-interval-0s)<br>
> stop interval=0s timeout=60 (fs-mail-stop-interval-0s)<br>
> monitor interval=20 timeout=40 (fs-mail-monitor-interval-20)<br>
> Group: mail-services<br>
> Resource: postfix (class=ocf provider=heartbeat type=postfix)<br>
> Operations: start interval=0s timeout=20s (postfix-start-interval-0s)<br>
> stop interval=0s timeout=20s (postfix-stop-interval-0s)<br>
> monitor interval=45s (postfix-monitor-interval-45s)<br>
><br>
> Stonith Devices:<br>
> Fencing Levels:<br>
><br>
> Location Constraints:<br>
> Ordering Constraints:<br>
> start network-services then promote mail-clone (kind:Mandatory)<br>
> (id:order-network-services-mail-clone-mandatory)<br>
> promote mail-clone then promote spool-clone (kind:Mandatory)<br>
> (id:order-mail-clone-spool-clone-mandatory)<br>
> promote spool-clone then start fs-services (kind:Mandatory)<br>
> (id:order-spool-clone-fs-services-mandatory)<br>
> start fs-services then start mail-services (kind:Mandatory)<br>
> (id:order-fs-services-mail-services-mandatory)<br>
> Colocation Constraints:<br>
> network-services with spool-clone (score:INFINITY) (rsc-role:Started)<br>
> (with-rsc-role:Master) (id:colocation-network-services-spool-clone-INFINITY)<br>
> network-services with mail-clone (score:INFINITY) (rsc-role:Started)<br>
> (with-rsc-role:Master) (id:colocation-network-services-mail-clone-INFINITY)<br>
> network-services with fs-services (score:INFINITY)<br>
> (id:colocation-network-services-fs-services-INFINITY)<br>
> network-services with mail-services (score:INFINITY)<br>
> (id:colocation-network-services-mail-services-INFINITY)<br>
<br>
I'm not sure whether it's causing your issue, but I would make the<br>
constraints reflect the logical relationships better.<br>
<br>
For example, network-services only needs to be colocated with<br>
mail-services logically; it's mail-services that needs to be with<br>
fs-services, and fs-services that needs to be with<br>
spool-clone/mail-clone master. In other words, don't make the<br>
highest-level resource depend on everything else, make each level depend<br>
on the level below it.<br>
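The layering described above could be expressed with pcs roughly as follows. This is a sketch assuming pcs 0.9 syntax on CentOS 7. The removed constraint IDs come from the configuration quoted above, and the existing "network-services with mail-services" constraint is kept:<br>

```shell
# Drop the constraints that tie network-services to every lower layer:
pcs constraint remove colocation-network-services-spool-clone-INFINITY
pcs constraint remove colocation-network-services-mail-clone-INFINITY
pcs constraint remove colocation-network-services-fs-services-INFINITY

# Each layer depends only on the layer directly below it:
#   fs-services needs the Master role of both DRBD clones,
#   mail-services needs fs-services,
#   network-services needs mail-services (already configured).
pcs constraint colocation add fs-services with master spool-clone INFINITY
pcs constraint colocation add fs-services with master mail-clone INFINITY
pcs constraint colocation add mail-services with fs-services INFINITY
```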
<br>
Also, I would guess that the virtual IP only needs to be ordered before<br>
mail-services, and mail-clone and spool-clone could both be ordered<br>
before fs-services, rather than ordering mail-clone before spool-clone.<br>
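A sketch of the suggested ordering, under the same pcs 0.9 assumption. The existing order-spool-clone-fs-services-mandatory and order-fs-services-mail-services-mandatory constraints are kept as they are:<br>

```shell
# Remove the orderings that put the IP before the DRBD clones and
# mail-clone before spool-clone:
pcs constraint remove order-network-services-mail-clone-mandatory
pcs constraint remove order-mail-clone-spool-clone-mandatory

# Both clones are promoted before the filesystems start
# (spool-clone is already covered by an existing constraint):
pcs constraint order promote mail-clone then start fs-services

# The virtual IP only needs to be up before mail-services starts:
pcs constraint order start network-services then start mail-services

# Review the resulting constraints together with their IDs:
pcs constraint show --full
```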
<br>
> Resources Defaults:<br>
> migration-threshold: 3<br>
> Operations Defaults:<br>
> on-fail: restart<br>
><br>
> Cluster Properties:<br>
> cluster-infrastructure: corosync<br>
> cluster-name: mailcluster<br>
> cluster-recheck-interval: 5min<br>
> dc-version: 1.1.13-10.el7_2.2-44eb2dd<br>
> default-resource-stickiness: infinity<br>
> have-watchdog: false<br>
> last-lrm-refresh: 1457613674<br>
> no-quorum-policy: ignore<br>
> pe-error-series-max: 1024<br>
> pe-input-series-max: 1024<br>
> pe-warn-series-max: 1024<br>
> stonith-enabled: false<br>
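A minimal sketch for adjusting and inspecting the failure handling in the defaults above, assuming pcs 0.9 and Pacemaker 1.1 on CentOS 7. The threshold value 1 is the one used in the later test:<br>

```shell
# Lower the cluster-wide migration threshold (stored in rsc_defaults):
pcs resource defaults migration-threshold=1

# Show the current fail count of postfix:
pcs resource failcount show postfix

# One-shot cluster status including fail counts:
crm_mon -1 -f

# After a test, clear the failure history so postfix may run on mail1 again:
pcs resource cleanup postfix
```

No failure-timeout is configured here, so the fail count never expires on its own. Until the cleanup is run, the node where postfix failed stays ineligible for it.<br>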
><br>
><br>
><br>
><br>
><br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: --- 0.197.15 2<br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: +++ 0.197.16 (null)<br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: + /cib: @num_updates=16<br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: +<br>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_failure_0']:<br>
> @transition-key=4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,<br>
> @transition-magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,<br>
> @call-id=1274, @last-rc-change=1457613440<br>
> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:<br>
> abort_transition_graph: Transition aborted by postfix_monitor_45000<br>
> 'modify' on mail1: Inactive graph<br>
> (magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.197.16,<br>
> source=process_graph_event:598, 1)<br>
> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:<br>
> update_failcount: Updating failcount for postfix on mail1 after failed<br>
> monitor: rc=7 (update=value++, time=1457613440)<br>
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:<br>
> attrd_client_update: Expanded fail-count-postfix=value++ to 3<br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_process_request: Completed cib_modify operation for section status: OK<br>
> (rc=0, origin=mail1/crmd/196, version=0.197.16)<br>
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:<br>
> attrd_peer_update: Setting fail-count-postfix[mail1]: 2 -> 3 from mail2<br>
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:<br>
> write_attribute: Sent update 400 with 2 changes for<br>
> fail-count-postfix, id=<n/a>, set=(null)<br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_process_request: Forwarding cib_modify operation for section status to<br>
> master (origin=local/attrd/400)<br>
> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:<br>
> process_graph_event: Detected action (1234.4)<br>
> postfix_monitor_45000.1274=not running: failed<br>
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:<br>
> attrd_peer_update: Setting last-failure-postfix[mail1]: 1457613347 -><br>
> 1457613440 from mail2<br>
> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: notice:<br>
> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [<br>
> input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]<br>
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:<br>
> write_attribute: Sent update 401 with 2 changes for<br>
> last-failure-postfix, id=<n/a>, set=(null)<br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: --- 0.197.16 2<br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: +++ 0.197.17 (null)<br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: + /cib: @num_updates=17<br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: +<br>
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix']:<br>
> @value=3<br>
> Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info:<br>
> cib_process_request: Completed cib_modify operation for section status: OK<br>
> (rc=0, origin=mail2/attrd/400, version=0.197.17)<br>
> Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info:<br>
> abort_transition_graph: Transition aborted by<br>
> status-1-fail-count-postfix, fail-count-postfix=3: Transient attribute<br>
> change (modify cib=0.197.17, source=abort_unless_down:319,<br>
> path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix'],<br>
> 1)<br>
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:<br>
> attrd_cib_callback: Update 400 for fail-count-postfix: OK (0)<br>
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:<br>
> attrd_cib_callback: Update 400 for fail-count-postfix[mail1]=3: OK (0)<br>
> Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info:<br>
> attrd_cib_callback: Update 400 for fail-count-postfix[mail2]=(null): OK<br>
> (0)<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_process_request: Forwarding cib_modify operation for section status to<br>
> master (origin=local/attrd/401)<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: --- 0.197.17 2<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: +++ 0.197.18 (null)<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: + /cib: @num_updates=18<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: +<br>
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']:<br>
> @value=1457613440<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_process_request: Completed cib_modify operation for section status: OK<br>
> (rc=0, origin=mail2/attrd/401, version=0.197.18)<br>
> Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:<br>
> attrd_cib_callback: Update 401 for last-failure-postfix: OK (0)<br>
> Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:<br>
> attrd_cib_callback: Update 401 for<br>
> last-failure-postfix[mail1]=1457613440: OK (0)<br>
> Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info:<br>
> attrd_cib_callback: Update 401 for<br>
> last-failure-postfix[mail2]=1457610376: OK (0)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:<br>
> abort_transition_graph: Transition aborted by<br>
> status-1-last-failure-postfix, last-failure-postfix=1457613440: Transient<br>
> attribute change (modify cib=0.197.18, source=abort_unless_down:319,<br>
> path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'],<br>
> 1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> unpack_config: On loss of CCM Quorum: Ignore<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_online_status: Node mail1 is online<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_online_status: Node mail2 is online<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource mail:0 active in<br>
> master mode on mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource spool:0 active in<br>
> master mode on mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource fs-spool active on<br>
> mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource fs-mail active on<br>
> mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:<br>
> unpack_rsc_op_failure: Processing failed op monitor for postfix on<br>
> mail1: not running (7)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource spool:1 active in<br>
> master mode on mail2<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource mail:1 active in<br>
> master mode on mail2<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> group_print: Resource Group: network-services<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started<br>
> mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> clone_print: Master/Slave Set: spool-clone [spool]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> short_print: Masters: [ mail1 ]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> short_print: Slaves: [ mail2 ]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> clone_print: Master/Slave Set: mail-clone [mail]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> short_print: Masters: [ mail1 ]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> short_print: Slaves: [ mail2 ]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> group_print: Resource Group: fs-services<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> group_print: Resource Group: mail-services<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> native_print: postfix (ocf::heartbeat:postfix): FAILED mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> get_failcount_full: postfix has failed 3 times on mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:<br>
> common_apply_stickiness: Forcing postfix away from mail1 after 3<br>
> failures (max=3)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> master_color: Promoting mail:0 (Master mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> master_color: mail-clone: Promoted 1 instances of a possible 1 to master<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> master_color: Promoting spool:0 (Master mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> master_color: spool-clone: Promoted 1 instances of a possible 1 to master<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> rsc_merge_weights: postfix: Rolling back scores from virtualip-1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> native_color: Resource virtualip-1 cannot run anywhere<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> RecurringOp: Start recurring monitor (45s) for postfix on mail2<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> LogActions: Stop virtualip-1 (mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> LogActions: Leave spool:0 (Master mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> LogActions: Leave spool:1 (Slave mail2)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> LogActions: Leave mail:0 (Master mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> LogActions: Leave mail:1 (Slave mail2)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> LogActions: Stop fs-spool (Started mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> LogActions: Stop fs-mail (Started mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> LogActions: Stop postfix (Started mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> process_pe_message: Calculated Transition 1235:<br>
> /var/lib/pacemaker/pengine/pe-input-302.bz2<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:<br>
> handle_response: pe_calc calculation pe_calc-dc-1457613441-3756 is<br>
> obsolete<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> unpack_config: On loss of CCM Quorum: Ignore<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_online_status: Node mail1 is online<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_online_status: Node mail2 is online<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource mail:0 active in<br>
> master mode on mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource spool:0 active in<br>
> master mode on mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource fs-spool active on<br>
> mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource fs-mail active on<br>
> mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:<br>
> unpack_rsc_op_failure: Processing failed op monitor for postfix on<br>
> mail1: not running (7)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource spool:1 active in<br>
> master mode on mail2<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> determine_op_status: Operation monitor found resource mail:1 active in<br>
> master mode on mail2<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> group_print: Resource Group: network-services<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started<br>
> mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> clone_print: Master/Slave Set: spool-clone [spool]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> short_print: Masters: [ mail1 ]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> short_print: Slaves: [ mail2 ]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> clone_print: Master/Slave Set: mail-clone [mail]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> short_print: Masters: [ mail1 ]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> short_print: Slaves: [ mail2 ]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> group_print: Resource Group: fs-services<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> group_print: Resource Group: mail-services<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> native_print: postfix (ocf::heartbeat:postfix): FAILED mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> get_failcount_full: postfix has failed 3 times on mail1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning:<br>
> common_apply_stickiness: Forcing postfix away from mail1 after 3<br>
> failures (max=3)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> master_color: Promoting mail:0 (Master mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> master_color: mail-clone: Promoted 1 instances of a possible 1 to master<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> master_color: Promoting spool:0 (Master mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> master_color: spool-clone: Promoted 1 instances of a possible 1 to master<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> rsc_merge_weights: postfix: Rolling back scores from virtualip-1<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> native_color: Resource virtualip-1 cannot run anywhere<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> RecurringOp: Start recurring monitor (45s) for postfix on mail2<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> LogActions: Stop virtualip-1 (mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> LogActions: Leave spool:0 (Master mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> LogActions: Leave spool:1 (Slave mail2)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> LogActions: Leave mail:0 (Master mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info:<br>
> LogActions: Leave mail:1 (Slave mail2)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> LogActions: Stop fs-spool (Started mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> LogActions: Stop fs-mail (Started mail1)<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> LogActions: Stop postfix (Started mail1)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:<br>
> do_state_transition: State transition S_POLICY_ENGINE -><br>
> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE<br>
> origin=handle_response ]<br>
> Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice:<br>
> process_pe_message: Calculated Transition 1236:<br>
> /var/lib/pacemaker/pengine/pe-input-303.bz2<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:<br>
> do_te_invoke: Processing graph 1236 (ref=pe_calc-dc-1457613441-3757)<br>
> derived from /var/lib/pacemaker/pengine/pe-input-303.bz2<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> te_rsc_command: Initiating action 12: stop virtualip-1_stop_0 on mail1<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> te_rsc_command: Initiating action 5: stop postfix_stop_0 on mail1<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: --- 0.197.18 2<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: +++ 0.197.19 (null)<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: + /cib: @num_updates=19<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: +<br>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']:<br>
> @operation_key=virtualip-1_stop_0, @operation=stop,<br>
> @transition-key=12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,<br>
> @transition-magic=0:0;12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,<br>
> @call-id=1276, @last-run=1457613441, @last-rc-change=1457613441,<br>
> @exec-time=66<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_process_request: Completed cib_modify operation for section status: OK<br>
> (rc=0, origin=mail1/crmd/197, version=0.197.19)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:<br>
> match_graph_event: Action virtualip-1_stop_0 (12) confirmed on mail1<br>
> (rc=0)<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: --- 0.197.19 2<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: +++ 0.197.20 (null)<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: + /cib: @num_updates=20<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: +<br>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']:<br>
> @operation_key=postfix_stop_0, @operation=stop,<br>
> @transition-key=5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,<br>
> @transition-magic=0:0;5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,<br>
> @call-id=1278, @last-run=1457613441, @last-rc-change=1457613441,<br>
> @exec-time=476<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:<br>
> match_graph_event: Action postfix_stop_0 (5) confirmed on mail1 (rc=0)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> te_rsc_command: Initiating action 79: stop fs-mail_stop_0 on mail1<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_process_request: Completed cib_modify operation for section status: OK<br>
> (rc=0, origin=mail1/crmd/198, version=0.197.20)<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: --- 0.197.20 2<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: +++ 0.197.21 (null)<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: + /cib: @num_updates=21<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: +<br>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-mail']/lrm_rsc_op[@id='fs-mail_last_0']:<br>
> @operation_key=fs-mail_stop_0, @operation=stop,<br>
> @transition-key=79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,<br>
> @transition-magic=0:0;79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,<br>
> @call-id=1280, @last-run=1457613441, @last-rc-change=1457613441,<br>
> @exec-time=88, @queue-time=1<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_process_request: Completed cib_modify operation for section status: OK<br>
> (rc=0, origin=mail1/crmd/199, version=0.197.21)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:<br>
> match_graph_event: Action fs-mail_stop_0 (79) confirmed on mail1 (rc=0)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> te_rsc_command: Initiating action 77: stop fs-spool_stop_0 on mail1<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: --- 0.197.21 2<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: Diff: +++ 0.197.22 (null)<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: + /cib: @num_updates=22<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_perform_op: +<br>
> /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-spool']/lrm_rsc_op[@id='fs-spool_last_0']:<br>
> @operation_key=fs-spool_stop_0, @operation=stop,<br>
> @transition-key=77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,<br>
> @transition-magic=0:0;77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a,<br>
> @call-id=1282, @last-run=1457613441, @last-rc-change=1457613441,<br>
> @exec-time=86<br>
> Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info:<br>
> cib_process_request: Completed cib_modify operation for section status: OK<br>
> (rc=0, origin=mail1/crmd/200, version=0.197.22)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info:<br>
> match_graph_event: Action fs-spool_stop_0 (77) confirmed on mail1 (rc=0)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning:<br>
> run_graph: Transition 1236 (Complete=11, Pending=0, Fired=0, Skipped=0,<br>
> Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-303.bz2):<br>
> Terminated<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning:<br>
> te_graph_trigger: Transition failed: terminated<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_graph: Graph 1236 with 12 actions: batch-limit=12 jobs,<br>
> network-delay=0ms<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 16]: Completed pseudo op<br>
> network-services_stopped_0 on N/A (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 15]: Completed pseudo op<br>
> network-services_stop_0 on N/A (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 12]: Completed rsc op virtualip-1_stop_0<br>
> on mail1 (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 84]: Completed pseudo op<br>
> fs-services_stopped_0 on N/A (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 83]: Completed pseudo op fs-services_stop_0<br>
> on N/A (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 77]: Completed rsc op fs-spool_stop_0<br>
> on mail1 (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 79]: Completed rsc op fs-mail_stop_0<br>
> on mail1 (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 90]: Completed pseudo op<br>
> mail-services_stopped_0 on N/A (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 89]: Completed pseudo op<br>
> mail-services_stop_0 on N/A (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 86]: Pending rsc op postfix_monitor_45000<br>
> on mail2 (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: * [Input 85]: Unresolved dependency rsc op<br>
> postfix_start_0 on mail2<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 5]: Completed rsc op postfix_stop_0<br>
> on mail1 (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> print_synapse: [Action 8]: Completed pseudo op all_stopped<br>
> on N/A (priority: 0, waiting: none)<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: do_log:<br>
> FSA: Input I_TE_SUCCESS from notify_crmd() received in state<br>
> S_TRANSITION_ENGINE<br>
> Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice:<br>
> do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [<br>
> input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]<br>
> Mar 10 13:37:26 [7415] HWJ-626.domain.local cib: info:<br>
> cib_process_ping: Reporting our current digest to mail2:<br>
> 3896ee29cdb6ba128330b0ef6e41bd79 for 0.197.22 (0x1544a30 0)<br>
<br>
_______________________________________________<br>
Users mailing list: <a href="mailto:Users@clusterlabs.org" target="_blank">Users@clusterlabs.org</a><br>
<a href="http://clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
<br>
Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>
Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>
</blockquote></div><br></div></div>
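<div>In the quoted log, pengine first reports "postfix has failed 3 times on mail1" and then "Forcing postfix away from mail1 after 3 failures (max=3)". The "max" value is the resource's migration-threshold. The sketch below is a simplified model of that check. It assumes the node becomes ineligible once the fail count reaches the threshold, which is what 3 failures with max=3 shows. It also follows the Pacemaker documentation in treating a threshold of 0 as "disabled", and it ignores failure-timeout expiry. The helper names (<code>forced_away</code>, <code>parse_forcing</code>) are hypothetical.</div>

```python
import re

# Matches the pengine message seen in the quoted log, e.g.
# "Forcing postfix away from mail1 after 3 failures (max=3)"
FORCING_RE = re.compile(
    r"Forcing (?P<rsc>\S+) away from (?P<node>\S+) after "
    r"(?P<count>\d+) failures \(max=(?P<max>\d+)\)"
)


def forced_away(fail_count: int, migration_threshold: int) -> bool:
    """Simplified migration-threshold check (hypothetical helper).

    Assumes the node is banned once fail_count reaches the threshold and
    that a threshold of 0 disables the check. failure-timeout is ignored.
    """
    if migration_threshold == 0:
        return False
    return fail_count >= migration_threshold


def parse_forcing(line: str):
    """Return (resource, node, fail_count, threshold), or None if no match."""
    m = FORCING_RE.search(line)
    if m is None:
        return None
    return m["rsc"], m["node"], int(m["count"]), int(m["max"])


line = ("common_apply_stickiness: Forcing postfix away from mail1 "
        "after 3 failures (max=3)")
rsc, node, count, threshold = parse_forcing(line)
print(rsc, node, forced_away(count, threshold))  # postfix mail1 True

# With migration-threshold lowered to 1, the first failure is enough:
print(forced_away(1, 1))  # True
```

<div>With migration-threshold=1, a single failed monitor, such as the rc=7 result in the new Mar 16 log, bans postfix from mail1. Whether it can then start on mail2 depends on the colocation constraints.</div>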
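<div>The lrm_rsc_op entries in the log carry <code>transition-key</code> and <code>transition-magic</code> fields. The field layout in the sketch below is inferred from the log itself, so treat it as an assumption. For example, <code>transition-key=12:1236:0:...</code> belongs to "Initiating action 12: stop virtualip-1_stop_0" in graph 1236, and the failed monitor's magic <code>0:7;4:1234:0:...</code> pairs with "Detected action (1234.4)". This suggests the layout <code>op-status:rc;action:transition:expected-rc:uuid</code>. The rc names shown are the two OCF codes that appear in the log (0 = success, 7 = "not running").</div>

```python
from dataclasses import dataclass

# OCF return codes that appear in the quoted log (a subset only).
OCF_RC = {0: "OCF_SUCCESS", 7: "OCF_NOT_RUNNING"}


@dataclass
class TransitionMagic:
    op_status: int
    rc: int
    action: int
    transition: int
    target_rc: int
    uuid: str


def decode_magic(magic: str) -> TransitionMagic:
    """Decode 'op-status:rc;action:transition:target-rc:uuid'.

    The layout is inferred from the log lines, not from a specification.
    """
    status_rc, key = magic.split(";", 1)
    op_status, rc = (int(x) for x in status_rc.split(":"))
    action, transition, target_rc, uuid = key.split(":", 3)
    return TransitionMagic(op_status, rc, int(action), int(transition),
                           int(target_rc), uuid)


m = decode_magic("0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a")
# A result that differs from the expected rc marks the operation as failed.
print(m.action, m.transition, OCF_RC.get(m.rc), m.rc != m.target_rc)
# 4 1234 OCF_NOT_RUNNING True
```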
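<div>Transition 1236 ends "Terminated" with Incomplete=1. The print_synapse output identifies the missing action: <code>postfix_monitor_45000</code> on mail2 is pending on an "Unresolved dependency rsc op postfix_start_0 on mail2". The start itself is not one of the graph's 12 actions. The counters in the run_graph line add up to the graph size (11 + 0 + 0 + 0 + 1 = 12, matching "Graph 1236 with 12 actions"). A small parser, a hypothetical helper that relies only on the line format shown, makes that check mechanical.</div>

```python
import re

COUNTER_KEYS = ("Complete", "Pending", "Fired", "Skipped", "Incomplete")


def parse_run_graph(line: str) -> dict:
    """Extract the transition id, result and action counters from a
    crmd run_graph summary line (format taken from the quoted log)."""
    m = re.search(r"Transition (\d+) \((.*)\): (\w+)", line)
    if m is None:
        raise ValueError("not a run_graph summary line")
    # Only name=number pairs are kept; Source=<path> is skipped.
    counters = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", m.group(2))}
    return {"transition": int(m.group(1)), "result": m.group(3), **counters}


line = ("run_graph: Transition 1236 (Complete=11, Pending=0, Fired=0, "
        "Skipped=0, Incomplete=1, "
        "Source=/var/lib/pacemaker/pengine/pe-input-303.bz2): Terminated")
summary = parse_run_graph(line)
total = sum(summary[k] for k in COUNTER_KEYS)
print(summary["result"], total)  # Terminated 12
```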
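<div>The cib lines such as "Diff: --- 0.197.16 2" / "Diff: +++ 0.197.17 (null)" followed by "@num_updates=17" show the CIB version being bumped on every status write. The sketch below assumes the triple is admin_epoch.epoch.num_updates, compared in that order, which is consistent with the last field tracking <code>@num_updates</code> in the log. <code>CibVersion</code> is a hypothetical helper.</div>

```python
from typing import NamedTuple


class CibVersion(NamedTuple):
    """CIB version triple, assumed to be admin_epoch.epoch.num_updates."""
    admin_epoch: int
    epoch: int
    num_updates: int

    @classmethod
    def parse(cls, text: str) -> "CibVersion":
        a, e, n = (int(x) for x in text.split("."))
        return cls(a, e, n)


old = CibVersion.parse("0.197.16")
new = CibVersion.parse("0.197.17")
# Tuples compare field by field: admin_epoch, then epoch, then num_updates.
print(new > old, new.num_updates - old.num_updates)  # True 1
```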