[ClusterLabs] A processor failed, forming new configuration very often and without reason
Philippe Carbonnier
philippe.carbonnier at vif.fr
Tue Apr 14 09:05:24 UTC 2015
Hello,
Even if the agent is "fixed", TOTEM still can't see the other node, just
for 1 or 2 seconds
While every thing is fine at Apr 13 22:03:57,
at Apr 13 22:04:19 corosync [TOTEM ] A processor failed, forming new
configuration.
and 1 seconde after the node is now here
Apr 13 22:04:20 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 13 22:04:20 host2.example.com crmd: [16960]: notice: ais_dispatch:
Membership 385380: quorum acquired
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: ais_status_callback:
status: vif5_7 is now member (was lost)
...
every thing is fine 1 second after
Apr 13 22:19:21 host2.example.com pengine: [16959]: info:
determine_online_status: Node vif5_7 is online
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: group_print:
Resource Group: vifGroup
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: native_print:
clusterIP (ocf::heartbeat:IPaddr2): Started vif5_7
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: native_print:
routing-jboss (lsb:routing-jboss): Started vif5_7
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: LogActions:
Leave resource clusterIP (Started vif5_7)
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: LogActions:
Leave resource routing-jboss (Started vif5_7)
...
and one more time vif5_7 node is missing after 15 minutes:
Apr 13 22:19:46 host2.example.com cib: [16956]: info: cib_stats: Processed
1 operations (0.00us average, 0% utilization) in the last 10min
Apr 13 22:34:13 corosync [TOTEM ] A processor failed, forming new
configuration.
and 2 seconds after it comes back :
Apr 13 22:34:15 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
I still don't understand why the vif5_7 node is lost from TOTEM point of
view while vif5_7 is really up and running.
End admin say that he has no network error, and there are no messages in
/var/log/message about network error.
Can I set more debug messages, or raise a timeout ?
Best regards,
Here is the full log:
Apr 13 22:03:57 host2.example.com pengine: [16959]: notice: unpack_config:
On loss of CCM Quorum: Ignore
Apr 13 22:03:57 host2.example.com pengine: [16959]: info: unpack_config:
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Apr 13 22:03:57 host2.example.com pengine: [16959]: info:
determine_online_status: Node host2.example.com is online
Apr 13 22:03:57 host2.example.com pengine: [16959]: info:
determine_online_status: Node vif5_7 is online
Apr 13 22:03:57 host2.example.com pengine: [16959]: notice: group_print:
Resource Group: vifGroup
Apr 13 22:03:57 host2.example.com pengine: [16959]: notice: native_print:
clusterIP (ocf::heartbeat:IPaddr2): Started vif5_7
Apr 13 22:03:57 host2.example.com pengine: [16959]: notice: native_print:
routing-jboss (lsb:routing-jboss): Started vif5_7
Apr 13 22:03:57 host2.example.com pengine: [16959]: notice: LogActions:
Leave resource clusterIP (Started vif5_7)
Apr 13 22:03:57 host2.example.com pengine: [16959]: notice: LogActions:
Leave resource routing-jboss (Started vif5_7)
Apr 13 22:03:57 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Apr 13 22:03:57 host2.example.com crmd: [16960]: info: unpack_graph:
Unpacked transition 35: 0 actions in 0 synapses
Apr 13 22:03:57 host2.example.com crmd: [16960]: info: do_te_invoke:
Processing graph 35 (ref=pe_calc-dc-1428955437-66) derived from
/var/lib/pengine/pe-input-87990.bz2
Apr 13 22:03:57 host2.example.com crmd: [16960]: info: run_graph:
====================================================
Apr 13 22:03:57 host2.example.com crmd: [16960]: notice: run_graph:
Transition 35 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-87990.bz2): Complete
Apr 13 22:03:57 host2.example.com crmd: [16960]: info: te_graph_trigger:
Transition 35 is now complete
Apr 13 22:03:57 host2.example.com crmd: [16960]: info: notify_crmd:
Transition 35 status: done - <null>
Apr 13 22:03:57 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr 13 22:03:57 host2.example.com crmd: [16960]: info: do_state_transition:
Starting PEngine Recheck Timer
Apr 13 22:03:57 host2.example.com pengine: [16959]: info:
process_pe_message: Transition 35: PEngine Input stored in:
/var/lib/pengine/pe-input-87990.bz2
Apr 13 22:04:19 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 13 22:04:20 host2.example.com crmd: [16960]: notice: ais_dispatch:
Membership 385376: quorum lost
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: ais_status_callback:
status: vif5_7 is now lost (was member)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: crm_update_peer:
Node vif5_7: id=38275594 state=lost (new) addr=r(0) ip(10.10.72.2) votes=1
born=385372 seen=385372 proc=00000000000000000000000000013312
Apr 13 22:04:20 host2.example.com crmd: [16960]: info:
erase_node_from_join: Removed node vif5_7 from join calculations:
welcomed=0 itegrated=0 finalized=0 confirmed=1
Apr 13 22:04:20 host2.example.com cib: [16956]: notice: ais_dispatch:
Membership 385376: quorum lost
Apr 13 22:04:20 host2.example.com cib: [16956]: info: crm_update_peer: Node
vif5_7: id=38275594 state=lost (new) addr=r(0) ip(10.10.72.2) votes=1
born=385372 seen=385372 proc=00000000000000000000000000013312
Apr 13 22:04:20 corosync [pcmk ] notice: pcmk_peer_update: Transitional
membership event on ring 385376: memb=1, new=0, lost=1
Apr 13 22:04:20 corosync [pcmk ] info: pcmk_peer_update: memb:
host2.example.com 122161674
Apr 13 22:04:20 corosync [pcmk ] info: pcmk_peer_update: lost: vif5_7
38275594
Apr 13 22:04:20 corosync [pcmk ] notice: pcmk_peer_update: Stable
membership event on ring 385376: memb=1, new=0, lost=0
Apr 13 22:04:20 corosync [pcmk ] info: pcmk_peer_update: MEMB:
host2.example.com 122161674
Apr 13 22:04:20 corosync [pcmk ] info: ais_mark_unseen_peer_dead: Node
vif5_7 was not seen in the previous transition
Apr 13 22:04:20 corosync [pcmk ] info: update_member: Node 38275594/vif5_7
is now: lost
Apr 13 22:04:20 corosync [pcmk ] info: send_member_notification: Sending
membership update 385376 to 2 children
Apr 13 22:04:20 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 13 22:04:20 corosync [MAIN ] Completed service synchronization, ready
to provide service.
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: crm_update_quorum:
Updating quorum status to false (call=101)
Apr 13 22:04:20 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section nodes (origin=local/crmd/99,
version=0.2903.14): ok (rc=0)
Apr 13 22:04:20 host2.example.com cib: [16956]: info: log_data_element:
cib:diff: - <cib have-quorum="1" admin_epoch="0" epoch="2903"
num_updates="15" />
Apr 13 22:04:20 host2.example.com cib: [16956]: info: log_data_element:
cib:diff: + <cib have-quorum="0" admin_epoch="0" epoch="2904"
num_updates="1" />
Apr 13 22:04:20 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/101,
version=0.2904.1): ok (rc=0)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: crm_ais_dispatch:
Setting expected votes to 2
Apr 13 22:04:20 host2.example.com crmd: [16960]: WARN: match_down_event: No
match for shutdown action on vif5_7
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: te_update_diff:
Stonith/shutdown of vif5_7 not matched
Apr 13 22:04:20 host2.example.com crmd: [16960]: info:
abort_transition_graph: te_update_diff:191 - Triggered transition abort
(complete=1, tag=node_state, id=vif5_7, magic=NA, cib=0.2903.15) : Node
failure
Apr 13 22:04:20 host2.example.com crmd: [16960]: info:
abort_transition_graph: need_abort:59 - Triggered transition abort
(complete=1) : Non-status change
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: need_abort: Aborting
on change to have-quorum
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_state_transition:
All 1 cluster nodes are eligible to run resources.
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_pe_invoke: Query
104: Requesting the current CIB: S_POLICY_ENGINE
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_pe_invoke: Query
105: Requesting the current CIB: S_POLICY_ENGINE
Apr 13 22:04:20 host2.example.com cib: [19809]: info: write_cib_contents:
Archived previous version as /var/lib/heartbeat/crm/cib-78.raw
Apr 13 22:04:20 host2.example.com cib: [19809]: info: write_cib_contents:
Wrote version 0.2904.0 of the CIB to disk (digest:
8f2fc22432de55e6e307cb774955b25f)
Apr 13 22:04:20 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/103, version=0.2904.1): ok (rc=0)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info:
do_pe_invoke_callback: Invoking the PE: query=105,
ref=pe_calc-dc-1428955460-68, seq=385376, quorate=0
Apr 13 22:04:20 host2.example.com cib: [19809]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.pmiSWo (digest:
/var/lib/heartbeat/crm/cib.usmSte)
Apr 13 22:04:20 host2.example.com pengine: [16959]: notice: unpack_config:
On loss of CCM Quorum: Ignore
Apr 13 22:04:20 host2.example.com pengine: [16959]: info: unpack_config:
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Apr 13 22:04:20 host2.example.com pengine: [16959]: info:
determine_online_status: Node host2.example.com is online
Apr 13 22:04:20 host2.example.com pengine: [16959]: notice: group_print:
Resource Group: vifGroup
Apr 13 22:04:20 host2.example.com pengine: [16959]: notice: native_print:
clusterIP (ocf::heartbeat:IPaddr2): Stopped
Apr 13 22:04:20 host2.example.com pengine: [16959]: notice: native_print:
routing-jboss (lsb:routing-jboss): Stopped
Apr 13 22:04:20 host2.example.com pengine: [16959]: notice: RecurringOp:
Start recurring monitor (30s) for clusterIP on host2.example.com
Apr 13 22:04:20 host2.example.com pengine: [16959]: notice: RecurringOp:
Start recurring monitor (30s) for routing-jboss on host2.example.com
Apr 13 22:04:20 host2.example.com pengine: [16959]: notice: LogActions:
Start clusterIP (host2.example.com)
Apr 13 22:04:20 host2.example.com pengine: [16959]: notice: LogActions:
Start routing-jboss (host2.example.com)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: unpack_graph:
Unpacked transition 36: 6 actions in 6 synapses
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_te_invoke:
Processing graph 36 (ref=pe_calc-dc-1428955460-68) derived from
/var/lib/pengine/pe-input-87991.bz2
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 8 fired and confirmed
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 4: start clusterIP_start_0 on host2.example.com (local)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=4:36:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=clusterIP_start_0 )
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: rsc:clusterIP:10:
start
Apr 13 22:04:20 host2.example.com pengine: [16959]: info:
process_pe_message: Transition 36: PEngine Input stored in:
/var/lib/pengine/pe-input-87991.bz2
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:start:stderr) logger: unknown facility name: none.
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:start:stderr) logger: unknown facility name: none.
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:start:stderr) logger: unknown facility name: none.
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation clusterIP_start_0 (call=10, rc=0, cib-update=106,
confirmed=true) ok
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_start_0 (4) confirmed on host2.example.com (rc=0)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 5: monitor clusterIP_monitor_30000 on host2.example.com
(local)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=5:36:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=clusterIP_monitor_30000 )
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: rsc:clusterIP:11:
monitor
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 6: start routing-jboss_start_0 on host2.example.com
(local)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=6:36:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=routing-jboss_start_0 )
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info:
rsc:routing-jboss:12: start
Apr 13 22:04:20 host2.example.com lrmd: [19863]: WARN: For LSB init script,
no additional parameters are needed.
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:start:stdout) Redirecting traffic from 10.10.72.3 to
10.10.72.7
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:start:stdout) [
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:start:stdout) OK
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:start:stdout) ]
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:start:stdout)
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:start:stdout)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation routing-jboss_start_0 (call=12, rc=0, cib-update=107,
confirmed=true) ok
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_start_0 (6) confirmed on host2.example.com (rc=0)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 9 fired and confirmed
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 7: monitor routing-jboss_monitor_30000 on
host2.example.com (local)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=7:36:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=routing-jboss_monitor_30000 )
Apr 13 22:04:20 host2.example.com lrmd: [16957]: info:
rsc:routing-jboss:13: monitor
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation routing-jboss_monitor_30000 (call=13, rc=0, cib-update=108,
confirmed=false) ok
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_monitor_30000 (7) confirmed on host2.example.com (rc=0)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation clusterIP_monitor_30000 (call=11, rc=0, cib-update=109,
confirmed=false) ok
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_monitor_30000 (5) confirmed on host2.example.com (rc=0)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: run_graph:
====================================================
Apr 13 22:04:20 host2.example.com crmd: [16960]: notice: run_graph:
Transition 36 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-87991.bz2): Complete
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: te_graph_trigger:
Transition 36 is now complete
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: notify_crmd:
Transition 36 status: done - <null>
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_state_transition:
Starting PEngine Recheck Timer
Apr 13 22:04:20 corosync [pcmk ] notice: pcmk_peer_update: Transitional
membership event on ring 385380: memb=1, new=0, lost=0
Apr 13 22:04:20 corosync [pcmk ] info: pcmk_peer_update: memb:
host2.example.com 122161674
Apr 13 22:04:20 corosync [pcmk ] notice: pcmk_peer_update: Stable
membership event on ring 385380: memb=2, new=1, lost=0
Apr 13 22:04:20 corosync [pcmk ] info: update_member: Node 38275594/vif5_7
is now: member
Apr 13 22:04:20 corosync [pcmk ] info: pcmk_peer_update: NEW: vif5_7
38275594
Apr 13 22:04:20 corosync [pcmk ] info: pcmk_peer_update: MEMB: vif5_7
38275594
Apr 13 22:04:20 corosync [pcmk ] info: pcmk_peer_update: MEMB:
host2.example.com 122161674
Apr 13 22:04:20 corosync [pcmk ] info: send_member_notification: Sending
membership update 385380 to 2 children
Apr 13 22:04:20 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 13 22:04:20 host2.example.com crmd: [16960]: notice: ais_dispatch:
Membership 385380: quorum acquired
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: ais_status_callback:
status: vif5_7 is now member (was lost)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: crm_update_peer:
Node vif5_7: id=38275594 state=member (new) addr=r(0) ip(10.10.72.2)
votes=1 born=385372 seen=385380 proc=00000000000000000000000000013312
Apr 13 22:04:20 host2.example.com cib: [16956]: notice: ais_dispatch:
Membership 385380: quorum acquired
Apr 13 22:04:20 host2.example.com cib: [16956]: info: crm_update_peer: Node
vif5_7: id=38275594 state=member (new) addr=r(0) ip(10.10.72.2) votes=1
born=385372 seen=385380 proc=00000000000000000000000000013312
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: crm_update_quorum:
Updating quorum status to true (call=114)
Apr 13 22:04:20 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='vif5_7']/lrm (origin=local/crmd/110,
version=0.2904.6): ok (rc=0)
Apr 13 22:04:20 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='vif5_7']/transient_attributes (origin=local/crmd/111,
version=0.2904.7): ok (rc=0)
Apr 13 22:04:20 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section nodes (origin=local/crmd/112,
version=0.2904.7): ok (rc=0)
Apr 13 22:04:20 host2.example.com cib: [16956]: info: log_data_element:
cib:diff: - <cib have-quorum="0" admin_epoch="0" epoch="2904"
num_updates="8" />
Apr 13 22:04:20 host2.example.com cib: [16956]: info: log_data_element:
cib:diff: + <cib have-quorum="1" admin_epoch="0" epoch="2905"
num_updates="1" />
Apr 13 22:04:20 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/114,
version=0.2905.1): ok (rc=0)
Apr 13 22:04:20 corosync [MAIN ] Completed service synchronization, ready
to provide service.
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: crm_ais_dispatch:
Setting expected votes to 2
Apr 13 22:04:20 host2.example.com crmd: [16960]: info:
abort_transition_graph: te_update_diff:267 - Triggered transition abort
(complete=1, tag=lrm_rsc_op, id=routing-jboss_monitor_0,
magic=0:7;7:1:7:90d438c0-2db4-4d39-bdf9-f0476d6f06ff, cib=0.2904.6) :
Resource op removal
Apr 13 22:04:20 host2.example.com crmd: [16960]: info:
erase_xpath_callback: Deletion of "//node_state[@uname='vif5_7']/lrm": ok
(rc=0)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info:
abort_transition_graph: te_update_diff:157 - Triggered transition abort
(complete=1, tag=transient_attributes, id=vif5_7, magic=NA, cib=0.2904.7) :
Transient attribute: removal
Apr 13 22:04:20 host2.example.com crmd: [16960]: info:
erase_xpath_callback: Deletion of
"//node_state[@uname='vif5_7']/transient_attributes": ok (rc=0)
Apr 13 22:04:20 host2.example.com crmd: [16960]: info:
abort_transition_graph: need_abort:59 - Triggered transition abort
(complete=1) : Non-status change
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: need_abort: Aborting
on change to have-quorum
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_state_transition:
Membership changed: 385372 -> 385380 - join restart
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_pe_invoke: Query
117: Requesting the current CIB: S_POLICY_ENGINE
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=do_state_transition ]
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: update_dc: Unset DC
host2.example.com
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: join_make_offer:
Making join offers based on membership 385380
Apr 13 22:04:20 host2.example.com crmd: [16960]: info:
do_dc_join_offer_all: join-4: Waiting on 2 outstanding join acks
Apr 13 22:04:20 host2.example.com crmd: [16960]: info: update_dc: Set DC to
host2.example.com (3.0.1)
Apr 13 22:04:20 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/116, version=0.2905.1): ok (rc=0)
Apr 13 22:04:20 host2.example.com cib: [19911]: info: write_cib_contents:
Archived previous version as /var/lib/heartbeat/crm/cib-79.raw
Apr 13 22:04:20 host2.example.com cib: [19911]: info: write_cib_contents:
Wrote version 0.2905.0 of the CIB to disk (digest:
bc0a7b07b27af8f6e4856adc9d50c90b)
Apr 13 22:04:20 host2.example.com cib: [19911]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.N6jEVp (digest:
/var/lib/heartbeat/crm/cib.4Uaqrg)
Apr 13 22:04:21 host2.example.com crmd: [16960]: ERROR: crmd_ha_msg_filter:
Another DC detected: vif5_7 (op=noop)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: update_dc: Unset DC
host2.example.com
Apr 13 22:04:21 host2.example.com cib: [16956]: WARN: cib_process_diff:
Diff 0.2904.1 -> 0.2904.2 not applied to 0.2905.1: current "epoch" is
greater than required
Apr 13 22:04:21 host2.example.com cib: [16956]: WARN: cib_process_diff:
Diff 0.2904.2 -> 0.2904.3 not applied to 0.2905.1: current "epoch" is
greater than required
Apr 13 22:04:21 host2.example.com cib: [16956]: WARN: cib_process_diff:
Diff 0.2904.3 -> 0.2905.1 not applied to 0.2905.1: current "epoch" is
greater than required
Apr 13 22:04:21 host2.example.com crmd: [16960]: info:
do_election_count_vote: Election 3 (owner: vif5_7) pass: vote from vif5_7
(Age)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_dc_takeover:
Taking over DC status for this partition
Apr 13 22:04:21 host2.example.com cib: [16956]: info:
cib_process_readwrite: We are now in R/O mode
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_slave_all for section 'all'
(origin=local/crmd/119, version=0.2905.1): ok (rc=0)
Apr 13 22:04:21 host2.example.com cib: [16956]: info:
cib_process_readwrite: We are now in R/W mode
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_master for section 'all' (origin=local/crmd/120,
version=0.2905.1): ok (rc=0)
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/121,
version=0.2905.1): ok (rc=0)
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/123, version=0.2905.1): ok (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info:
do_dc_join_offer_all: join-5: Waiting on 2 outstanding join acks
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: ais_dispatch:
Membership 385380: quorum retained
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/125, version=0.2905.1): ok (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: crm_ais_dispatch:
Setting expected votes to 2
Apr 13 22:04:21 host2.example.com crmd: [16960]: info:
config_query_callback: Checking for expired actions every 900000ms
Apr 13 22:04:21 host2.example.com crmd: [16960]: info:
config_query_callback: Sending expected-votes=2 to corosync
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: update_dc: Set DC to
host2.example.com (3.0.1)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: ais_dispatch:
Membership 385380: quorum retained
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/128, version=0.2905.1): ok (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: crm_ais_dispatch:
Setting expected votes to 2
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_state_transition:
All 2 cluster nodes responded to the join offer.
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_dc_join_finalize:
join-5: Syncing the CIB from host2.example.com to the rest of the cluster
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/131, version=0.2905.1): ok (rc=0)
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_sync for section 'all' (origin=local/crmd/132,
version=0.2905.1): ok (rc=0)
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section nodes (origin=local/crmd/133,
version=0.2905.1): ok (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_dc_join_ack:
join-5: Updating node state to member for host2.example.com
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section nodes (origin=local/crmd/134,
version=0.2905.1): ok (rc=0)
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_delete for section //node_state[@uname='
host2.example.com']/lrm (origin=local/crmd/135, version=0.2905.2): ok (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info:
erase_xpath_callback: Deletion of
"//node_state[@uname='host2.example.com']/lrm":
ok (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_dc_join_ack:
join-5: Updating node state to member for vif5_7
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='vif5_7']/lrm (origin=local/crmd/137,
version=0.2905.4): ok (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info:
erase_xpath_callback: Deletion of "//node_state[@uname='vif5_7']/lrm": ok
(rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED
cause=C_FSA_INTERNAL origin=check_join_state ]
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_state_transition:
All 2 cluster nodes are eligible to run resources.
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_dc_join_final:
Ensuring DC, quorum and node attributes are up-to-date
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: crm_update_quorum:
Updating quorum status to true (call=141)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info:
abort_transition_graph: do_te_invoke:191 - Triggered transition abort
(complete=1) : Peer Cancelled
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_pe_invoke: Query
142: Requesting the current CIB: S_POLICY_ENGINE
Apr 13 22:04:21 host2.example.com attrd: [16958]: info:
attrd_local_callback: Sending full refresh (origin=crmd)
Apr 13 22:04:21 host2.example.com attrd: [16958]: info:
attrd_trigger_update: Sending flush op to all hosts for:
fail-count-routing-jboss (<null>)
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section nodes (origin=local/crmd/139,
version=0.2905.5): ok (rc=0)
Apr 13 22:04:21 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/141,
version=0.2905.5): ok (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info:
do_pe_invoke_callback: Invoking the PE: query=142,
ref=pe_calc-dc-1428955461-85, seq=385380, quorate=1
Apr 13 22:04:21 host2.example.com attrd: [16958]: info:
attrd_trigger_update: Sending flush op to all hosts for: probe_complete
(true)
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: unpack_config:
On loss of CCM Quorum: Ignore
Apr 13 22:04:21 host2.example.com pengine: [16959]: info: unpack_config:
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Apr 13 22:04:21 host2.example.com pengine: [16959]: info:
determine_online_status: Node host2.example.com is online
Apr 13 22:04:21 host2.example.com pengine: [16959]: info:
determine_online_status: Node vif5_7 is online
Apr 13 22:04:21 host2.example.com pengine: [16959]: ERROR:
native_add_running: Resource lsb::routing-jboss:routing-jboss appears to be
active on 2 nodes.
Apr 13 22:04:21 host2.example.com pengine: [16959]: WARN: See
http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Apr 13 22:04:21 host2.example.com pengine: [16959]: ERROR:
native_add_running: Resource ocf::IPaddr2:clusterIP appears to be active on
2 nodes.
Apr 13 22:04:21 host2.example.com pengine: [16959]: WARN: See
http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: group_print:
Resource Group: vifGroup
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: native_print:
clusterIP (ocf::heartbeat:IPaddr2) Started
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: native_print:
0 : host2.example.com
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: native_print:
1 : vif5_7
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: native_print:
routing-jboss (lsb:routing-jboss) Started
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: native_print:
0 : host2.example.com
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: native_print:
1 : vif5_7
Apr 13 22:04:21 host2.example.com pengine: [16959]: WARN:
native_create_actions: Attempting recovery of resource clusterIP
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: RecurringOp:
Start recurring monitor (30s) for clusterIP on vif5_7
Apr 13 22:04:21 host2.example.com pengine: [16959]: WARN:
native_create_actions: Attempting recovery of resource routing-jboss
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: RecurringOp:
Start recurring monitor (30s) for routing-jboss on vif5_7
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: LogActions:
Move resource clusterIP (Started host2.example.com -> vif5_7)
Apr 13 22:04:21 host2.example.com pengine: [16959]: notice: LogActions:
Move resource routing-jboss (Started host2.example.com -> vif5_7)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: unpack_graph:
Unpacked transition 37: 13 actions in 13 synapses
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_te_invoke:
Processing graph 37 (ref=pe_calc-dc-1428955461-85) derived from
/var/lib/pengine/pe-error-493.bz2
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 17 fired and confirmed
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 12: stop routing-jboss_stop_0 on host2.example.com (local)
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: cancel_op: operation
monitor[13] on lsb::routing-jboss::routing-jboss for client 16960, its
parameters: CRM_meta_interval=[30000] CRM_meta_timeout=[20000]
crm_feature_set=[3.0.1] CRM_meta_name=[monitor] cancelled
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=12:37:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=routing-jboss_stop_0 )
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info:
rsc:routing-jboss:14: stop
Apr 13 22:04:21 host2.example.com attrd: [16958]: info:
attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 13: stop routing-jboss_stop_0 on vif5_7
Apr 13 22:04:21 host2.example.com attrd: [16958]: info:
attrd_trigger_update: Sending flush op to all hosts for:
last-failure-routing-jboss (<null>)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation routing-jboss_monitor_30000 (call=13, status=1, cib-update=0,
confirmed=true) Cancelled
Apr 13 22:04:21 host2.example.com attrd: [16958]: info:
attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Apr 13 22:04:21 host2.example.com lrmd: [19912]: WARN: For LSB init script,
no additional parameters are needed.
Apr 13 22:04:21 host2.example.com pengine: [16959]: ERROR:
process_pe_message: Transition 37: ERRORs found during PE processing.
PEngine Input stored in: /var/lib/pengine/pe-error-493.bz2
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout) Disabling traffic redirection from 10.10.72.3
to 10.10.72.7
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout) [
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout) OK
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout) ]
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout)
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation routing-jboss_stop_0 (call=14, rc=0, cib-update=143,
confirmed=true) ok
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_stop_0 (12) confirmed on host2.example.com (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_stop_0 (13) confirmed on vif5_7 (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 9: stop clusterIP_stop_0 on host2.example.com (local)
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: cancel_op: operation
monitor[11] on ocf::IPaddr2::clusterIP for client 16960, its parameters:
CRM_meta_interval=[30000] ip=[10.10.72.3] cidr_netmask=[32]
CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor]
iflabel=[jbossfailover] cancelled
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=9:37:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=clusterIP_stop_0 )
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: rsc:clusterIP:15:
stop
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 10: stop clusterIP_stop_0 on vif5_7
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation clusterIP_monitor_30000 (call=11, status=1, cib-update=0,
confirmed=true) Cancelled
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:stop:stderr) logger: unknown facility name: none.
Apr 13 22:04:21 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:stop:stderr) logger: unknown facility name: none.
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation clusterIP_stop_0 (call=15, rc=0, cib-update=144,
confirmed=true) ok
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_stop_0 (9) confirmed on host2.example.com (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_stop_0 (10) confirmed on vif5_7 (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 18 fired and confirmed
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 5 fired and confirmed
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 15 fired and confirmed
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 11: start clusterIP_start_0 on vif5_7
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_start_0 (11) confirmed on vif5_7 (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 4: monitor clusterIP_monitor_30000 on vif5_7
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 14: start routing-jboss_start_0 on vif5_7
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_start_0 (14) confirmed on vif5_7 (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 16 fired and confirmed
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 3: monitor routing-jboss_monitor_30000 on vif5_7
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_monitor_30000 (4) confirmed on vif5_7 (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_monitor_30000 (3) confirmed on vif5_7 (rc=0)
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: run_graph:
====================================================
Apr 13 22:04:21 host2.example.com crmd: [16960]: notice: run_graph:
Transition 37 (Complete=13, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-error-493.bz2): Complete
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: te_graph_trigger:
Transition 37 is now complete
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: notify_crmd:
Transition 37 status: done - <null>
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr 13 22:04:21 host2.example.com crmd: [16960]: info: do_state_transition:
Starting PEngine Recheck Timer
Apr 13 22:04:24 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:start:stderr) ARPING 10.10.72.3 from 10.10.72.3 eth0
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
Apr 13 22:09:46 host2.example.com cib: [16956]: info: cib_stats: Processed
81 operations (2345.00us average, 0% utilization) in the last 10min
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: crm_timer_popped:
PEngine Recheck Timer (I_PE_CALC) just popped!
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: do_state_transition:
Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: do_state_transition:
All 2 cluster nodes are eligible to run resources.
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: do_pe_invoke: Query
145: Requesting the current CIB: S_POLICY_ENGINE
Apr 13 22:19:21 host2.example.com crmd: [16960]: info:
do_pe_invoke_callback: Invoking the PE: query=145,
ref=pe_calc-dc-1428956361-94, seq=385380, quorate=1
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: unpack_config:
On loss of CCM Quorum: Ignore
Apr 13 22:19:21 host2.example.com pengine: [16959]: info: unpack_config:
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Apr 13 22:19:21 host2.example.com pengine: [16959]: info:
determine_online_status: Node host2.example.com is online
Apr 13 22:19:21 host2.example.com pengine: [16959]: info:
determine_online_status: Node vif5_7 is online
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: group_print:
Resource Group: vifGroup
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: native_print:
clusterIP (ocf::heartbeat:IPaddr2): Started vif5_7
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: native_print:
routing-jboss (lsb:routing-jboss): Started vif5_7
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: LogActions:
Leave resource clusterIP (Started vif5_7)
Apr 13 22:19:21 host2.example.com pengine: [16959]: notice: LogActions:
Leave resource routing-jboss (Started vif5_7)
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: unpack_graph:
Unpacked transition 38: 0 actions in 0 synapses
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: do_te_invoke:
Processing graph 38 (ref=pe_calc-dc-1428956361-94) derived from
/var/lib/pengine/pe-input-87992.bz2
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: run_graph:
====================================================
Apr 13 22:19:21 host2.example.com crmd: [16960]: notice: run_graph:
Transition 38 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-87992.bz2): Complete
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: te_graph_trigger:
Transition 38 is now complete
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: notify_crmd:
Transition 38 status: done - <null>
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr 13 22:19:21 host2.example.com crmd: [16960]: info: do_state_transition:
Starting PEngine Recheck Timer
Apr 13 22:19:21 host2.example.com pengine: [16959]: info:
process_pe_message: Transition 38: PEngine Input stored in:
/var/lib/pengine/pe-input-87992.bz2
Apr 13 22:19:46 host2.example.com cib: [16956]: info: cib_stats: Processed
1 operations (0.00us average, 0% utilization) in the last 10min
Apr 13 22:34:13 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 13 22:34:14 host2.example.com cib: [16956]: notice: ais_dispatch:
Membership 385384: quorum lost
Apr 13 22:34:14 host2.example.com cib: [16956]: info: crm_update_peer: Node
vif5_7: id=38275594 state=lost (new) addr=r(0) ip(10.10.72.2) votes=1
born=385372 seen=385380 proc=00000000000000000000000000013312
Apr 13 22:34:14 host2.example.com crmd: [16960]: notice: ais_dispatch:
Membership 385384: quorum lost
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: ais_status_callback:
status: vif5_7 is now lost (was member)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: crm_update_peer:
Node vif5_7: id=38275594 state=lost (new) addr=r(0) ip(10.10.72.2) votes=1
born=385372 seen=385380 proc=00000000000000000000000000013312
Apr 13 22:34:14 host2.example.com crmd: [16960]: info:
erase_node_from_join: Removed node vif5_7 from join calculations:
welcomed=0 itegrated=0 finalized=0 confirmed=1
Apr 13 22:34:14 corosync [pcmk ] notice: pcmk_peer_update: Transitional
membership event on ring 385384: memb=1, new=0, lost=1
Apr 13 22:34:14 corosync [pcmk ] info: pcmk_peer_update: memb:
host2.example.com 122161674
Apr 13 22:34:14 corosync [pcmk ] info: pcmk_peer_update: lost: vif5_7
38275594
Apr 13 22:34:14 corosync [pcmk ] notice: pcmk_peer_update: Stable
membership event on ring 385384: memb=1, new=0, lost=0
Apr 13 22:34:14 corosync [pcmk ] info: pcmk_peer_update: MEMB:
host2.example.com 122161674
Apr 13 22:34:14 corosync [pcmk ] info: ais_mark_unseen_peer_dead: Node
vif5_7 was not seen in the previous transition
Apr 13 22:34:14 corosync [pcmk ] info: update_member: Node 38275594/vif5_7
is now: lost
Apr 13 22:34:14 corosync [pcmk ] info: send_member_notification: Sending
membership update 385384 to 2 children
Apr 13 22:34:14 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 13 22:34:14 corosync [MAIN ] Completed service synchronization, ready
to provide service.
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: crm_update_quorum:
Updating quorum status to false (call=148)
Apr 13 22:34:14 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section nodes (origin=local/crmd/146,
version=0.2905.13): ok (rc=0)
Apr 13 22:34:14 host2.example.com cib: [16956]: info: log_data_element:
cib:diff: - <cib have-quorum="1" admin_epoch="0" epoch="2905"
num_updates="14" />
Apr 13 22:34:14 host2.example.com cib: [16956]: info: log_data_element:
cib:diff: + <cib have-quorum="0" admin_epoch="0" epoch="2906"
num_updates="1" />
Apr 13 22:34:14 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/148,
version=0.2906.1): ok (rc=0)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: crm_ais_dispatch:
Setting expected votes to 2
Apr 13 22:34:14 host2.example.com crmd: [16960]: WARN: match_down_event: No
match for shutdown action on vif5_7
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: te_update_diff:
Stonith/shutdown of vif5_7 not matched
Apr 13 22:34:14 host2.example.com crmd: [16960]: info:
abort_transition_graph: te_update_diff:191 - Triggered transition abort
(complete=1, tag=node_state, id=vif5_7, magic=NA, cib=0.2905.14) : Node
failure
Apr 13 22:34:14 host2.example.com crmd: [16960]: info:
abort_transition_graph: need_abort:59 - Triggered transition abort
(complete=1) : Non-status change
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: need_abort: Aborting
on change to have-quorum
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: do_state_transition:
All 1 cluster nodes are eligible to run resources.
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: do_pe_invoke: Query
151: Requesting the current CIB: S_POLICY_ENGINE
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: do_pe_invoke: Query
152: Requesting the current CIB: S_POLICY_ENGINE
Apr 13 22:34:14 host2.example.com cib: [20074]: info: write_cib_contents:
Archived previous version as /var/lib/heartbeat/crm/cib-80.raw
Apr 13 22:34:14 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/150, version=0.2906.1): ok (rc=0)
Apr 13 22:34:14 host2.example.com cib: [20074]: info: write_cib_contents:
Wrote version 0.2906.0 of the CIB to disk (digest:
d9d6d1c108ee4b37b17a78a966c21393)
Apr 13 22:34:14 host2.example.com cib: [20074]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.YpPE5x (digest:
/var/lib/heartbeat/crm/cib.QY9qLw)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info:
do_pe_invoke_callback: Invoking the PE: query=152,
ref=pe_calc-dc-1428957254-96, seq=385384, quorate=0
Apr 13 22:34:14 host2.example.com pengine: [16959]: notice: unpack_config:
On loss of CCM Quorum: Ignore
Apr 13 22:34:14 host2.example.com pengine: [16959]: info: unpack_config:
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Apr 13 22:34:14 host2.example.com pengine: [16959]: info:
determine_online_status: Node host2.example.com is online
Apr 13 22:34:14 host2.example.com pengine: [16959]: notice: group_print:
Resource Group: vifGroup
Apr 13 22:34:14 host2.example.com pengine: [16959]: notice: native_print:
clusterIP (ocf::heartbeat:IPaddr2): Stopped
Apr 13 22:34:14 host2.example.com pengine: [16959]: notice: native_print:
routing-jboss (lsb:routing-jboss): Stopped
Apr 13 22:34:14 host2.example.com pengine: [16959]: notice: RecurringOp:
Start recurring monitor (30s) for clusterIP on host2.example.com
Apr 13 22:34:14 host2.example.com pengine: [16959]: notice: RecurringOp:
Start recurring monitor (30s) for routing-jboss on host2.example.com
Apr 13 22:34:14 host2.example.com pengine: [16959]: notice: LogActions:
Start clusterIP (host2.example.com)
Apr 13 22:34:14 host2.example.com pengine: [16959]: notice: LogActions:
Start routing-jboss (host2.example.com)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: unpack_graph:
Unpacked transition 39: 6 actions in 6 synapses
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: do_te_invoke:
Processing graph 39 (ref=pe_calc-dc-1428957254-96) derived from
/var/lib/pengine/pe-input-87993.bz2
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 8 fired and confirmed
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 4: start clusterIP_start_0 on host2.example.com (local)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=4:39:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=clusterIP_start_0 )
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info: rsc:clusterIP:16:
start
Apr 13 22:34:14 host2.example.com pengine: [16959]: info:
process_pe_message: Transition 39: PEngine Input stored in:
/var/lib/pengine/pe-input-87993.bz2
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:start:stderr) logger: unknown facility name: none.
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:start:stderr) logger: unknown facility name: none.
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:start:stderr) logger: unknown facility name: none.
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation clusterIP_start_0 (call=16, rc=0, cib-update=153,
confirmed=true) ok
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_start_0 (4) confirmed on host2.example.com (rc=0)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 5: monitor clusterIP_monitor_30000 on host2.example.com
(local)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=5:39:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=clusterIP_monitor_30000 )
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info: rsc:clusterIP:17:
monitor
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 6: start routing-jboss_start_0 on host2.example.com
(local)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=6:39:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=routing-jboss_start_0 )
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info:
rsc:routing-jboss:18: start
Apr 13 22:34:14 host2.example.com lrmd: [20128]: WARN: For LSB init script,
no additional parameters are needed.
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:start:stdout) Redirecting traffic from 10.10.72.3 to
10.10.72.7
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:start:stdout) [
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:start:stdout) OK ]
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:start:stdout)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation routing-jboss_start_0 (call=18, rc=0, cib-update=154,
confirmed=true) ok
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_start_0 (6) confirmed on host2.example.com (rc=0)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 9 fired and confirmed
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 7: monitor routing-jboss_monitor_30000 on
host2.example.com (local)
Apr 13 22:34:14 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=7:39:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=routing-jboss_monitor_30000 )
Apr 13 22:34:14 host2.example.com lrmd: [16957]: info:
rsc:routing-jboss:19: monitor
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation clusterIP_monitor_30000 (call=17, rc=0, cib-update=155,
confirmed=false) ok
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation routing-jboss_monitor_30000 (call=19, rc=0, cib-update=156,
confirmed=false) ok
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_monitor_30000 (5) confirmed on host2.example.com (rc=0)
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_monitor_30000 (7) confirmed on host2.example.com (rc=0)
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: run_graph:
====================================================
Apr 13 22:34:15 host2.example.com crmd: [16960]: notice: run_graph:
Transition 39 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-87993.bz2): Complete
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: te_graph_trigger:
Transition 39 is now complete
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: notify_crmd:
Transition 39 status: done - <null>
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: do_state_transition:
Starting PEngine Recheck Timer
Apr 13 22:34:15 corosync [pcmk ] notice: pcmk_peer_update: Transitional
membership event on ring 385388: memb=1, new=0, lost=0
Apr 13 22:34:15 host2.example.com cib: [16956]: notice: ais_dispatch:
Membership 385388: quorum acquired
Apr 13 22:34:15 host2.example.com cib: [16956]: info: crm_update_peer: Node
vif5_7: id=38275594 state=member (new) addr=r(0) ip(10.10.72.2) votes=1
born=385372 seen=385388 proc=00000000000000000000000000013312
Apr 13 22:34:15 host2.example.com crmd: [16960]: notice: ais_dispatch:
Membership 385388: quorum acquired
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: ais_status_callback:
status: vif5_7 is now member (was lost)
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: crm_update_peer:
Node vif5_7: id=38275594 state=member (new) addr=r(0) ip(10.10.72.2)
votes=1 born=385372 seen=385388 proc=00000000000000000000000000013312
Apr 13 22:34:15 corosync [pcmk ] info: pcmk_peer_update: memb:
host2.example.com 122161674
Apr 13 22:34:15 corosync [pcmk ] notice: pcmk_peer_update: Stable
membership event on ring 385388: memb=2, new=1, lost=0
Apr 13 22:34:15 corosync [pcmk ] info: update_member: Node 38275594/vif5_7
is now: member
Apr 13 22:34:15 corosync [pcmk ] info: pcmk_peer_update: NEW: vif5_7
38275594
Apr 13 22:34:15 corosync [pcmk ] info: pcmk_peer_update: MEMB: vif5_7
38275594
Apr 13 22:34:15 corosync [pcmk ] info: pcmk_peer_update: MEMB:
host2.example.com 122161674
Apr 13 22:34:15 corosync [pcmk ] info: send_member_notification: Sending
membership update 385388 to 2 children
Apr 13 22:34:15 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: crm_update_quorum:
Updating quorum status to true (call=161)
Apr 13 22:34:15 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='vif5_7']/lrm (origin=local/crmd/157,
version=0.2906.6): ok (rc=0)
Apr 13 22:34:15 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='vif5_7']/transient_attributes (origin=local/crmd/158,
version=0.2906.7): ok (rc=0)
Apr 13 22:34:15 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section nodes (origin=local/crmd/159,
version=0.2906.7): ok (rc=0)
Apr 13 22:34:15 host2.example.com cib: [16956]: info: log_data_element:
cib:diff: - <cib have-quorum="0" admin_epoch="0" epoch="2906"
num_updates="8" />
Apr 13 22:34:15 host2.example.com cib: [16956]: info: log_data_element:
cib:diff: + <cib have-quorum="1" admin_epoch="0" epoch="2907"
num_updates="1" />
Apr 13 22:34:15 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/161,
version=0.2907.1): ok (rc=0)
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: crm_ais_dispatch:
Setting expected votes to 2
Apr 13 22:34:15 host2.example.com crmd: [16960]: info:
abort_transition_graph: te_update_diff:267 - Triggered transition abort
(complete=1, tag=lrm_rsc_op, id=routing-jboss_monitor_0,
magic=0:7;7:1:7:90d438c0-2db4-4d39-bdf9-f0476d6f06ff, cib=0.2906.6) :
Resource op removal
Apr 13 22:34:15 host2.example.com crmd: [16960]: info:
erase_xpath_callback: Deletion of "//node_state[@uname='vif5_7']/lrm": ok
(rc=0)
Apr 13 22:34:15 host2.example.com crmd: [16960]: info:
abort_transition_graph: te_update_diff:157 - Triggered transition abort
(complete=1, tag=transient_attributes, id=vif5_7, magic=NA, cib=0.2906.7) :
Transient attribute: removal
Apr 13 22:34:15 host2.example.com crmd: [16960]: info:
erase_xpath_callback: Deletion of
"//node_state[@uname='vif5_7']/transient_attributes": ok (rc=0)
Apr 13 22:34:15 host2.example.com crmd: [16960]: info:
abort_transition_graph: need_abort:59 - Triggered transition abort
(complete=1) : Non-status change
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: need_abort: Aborting
on change to have-quorum
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: do_state_transition:
Membership changed: 385380 -> 385388 - join restart
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: do_pe_invoke: Query
164: Requesting the current CIB: S_POLICY_ENGINE
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=do_state_transition ]
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: update_dc: Unset DC
host2.example.com
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: join_make_offer:
Making join offers based on membership 385388
Apr 13 22:34:15 host2.example.com crmd: [16960]: info:
do_dc_join_offer_all: join-6: Waiting on 2 outstanding join acks
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: update_dc: Set DC to
host2.example.com (3.0.1)
Apr 13 22:34:15 corosync [MAIN ] Completed service synchronization, ready
to provide service.
Apr 13 22:34:15 host2.example.com cib: [20176]: info: write_cib_contents:
Archived previous version as /var/lib/heartbeat/crm/cib-81.raw
Apr 13 22:34:15 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/163, version=0.2907.1): ok (rc=0)
Apr 13 22:34:15 host2.example.com cib: [20176]: info: write_cib_contents:
Wrote version 0.2907.0 of the CIB to disk (digest:
99e405822599c0dd376272e1ad39cc0f)
Apr 13 22:34:15 host2.example.com cib: [20176]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.6Fdq6y (digest:
/var/lib/heartbeat/crm/cib.WHWXMy)
Apr 13 22:34:15 host2.example.com crmd: [16960]: ERROR: crmd_ha_msg_filter:
Another DC detected: vif5_7 (op=noop)
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: update_dc: Unset DC
host2.example.com
Apr 13 22:34:15 host2.example.com cib: [16956]: WARN: cib_process_diff:
Diff 0.2906.1 -> 0.2906.2 not applied to 0.2907.1: current "epoch" is
greater than required
Apr 13 22:34:15 host2.example.com cib: [16956]: WARN: cib_process_diff:
Diff 0.2906.2 -> 0.2906.3 not applied to 0.2907.1: current "epoch" is
greater than required
Apr 13 22:34:15 host2.example.com cib: [16956]: WARN: cib_process_diff:
Diff 0.2906.3 -> 0.2907.1 not applied to 0.2907.1: current "epoch" is
greater than required
Apr 13 22:34:15 host2.example.com crmd: [16960]: info:
do_election_count_vote: Election 5 (owner: vif5_7) pass: vote from vif5_7
(Age)
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Apr 13 22:34:15 host2.example.com crmd: [16960]: info: do_dc_takeover:
Taking over DC status for this partition
Apr 13 22:34:15 host2.example.com cib: [16956]: info:
cib_process_readwrite: We are now in R/O mode
Apr 13 22:34:15 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_slave_all for section 'all'
(origin=local/crmd/166, version=0.2907.1): ok (rc=0)
Apr 13 22:34:15 host2.example.com cib: [16956]: info:
cib_process_readwrite: We are now in R/W mode
Apr 13 22:34:15 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_master for section 'all' (origin=local/crmd/167,
version=0.2907.1): ok (rc=0)
Apr 13 22:34:15 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/168,
version=0.2907.1): ok (rc=0)
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/170, version=0.2907.1): ok (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info:
do_dc_join_offer_all: join-7: Waiting on 2 outstanding join acks
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: ais_dispatch:
Membership 385388: quorum retained
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/172, version=0.2907.1): ok (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: crm_ais_dispatch:
Setting expected votes to 2
Apr 13 22:34:16 host2.example.com crmd: [16960]: info:
config_query_callback: Checking for expired actions every 900000ms
Apr 13 22:34:16 host2.example.com crmd: [16960]: info:
config_query_callback: Sending expected-votes=2 to corosync
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: update_dc: Set DC to
host2.example.com (3.0.1)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: ais_dispatch:
Membership 385388: quorum retained
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/175, version=0.2907.1): ok (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: crm_ais_dispatch:
Setting expected votes to 2
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_state_transition:
All 2 cluster nodes responded to the join offer.
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_dc_join_finalize:
join-7: Syncing the CIB from host2.example.com to the rest of the cluster
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/178, version=0.2907.1): ok (rc=0)
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_sync for section 'all' (origin=local/crmd/179,
version=0.2907.1): ok (rc=0)
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section nodes (origin=local/crmd/180,
version=0.2907.1): ok (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_dc_join_ack:
join-7: Updating node state to member for host2.example.com
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section nodes (origin=local/crmd/181,
version=0.2907.1): ok (rc=0)
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_delete for section //node_state[@uname='
host2.example.com']/lrm (origin=local/crmd/182, version=0.2907.2): ok (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info:
erase_xpath_callback: Deletion of
"//node_state[@uname='host2.example.com']/lrm":
ok (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_dc_join_ack:
join-7: Updating node state to member for vif5_7
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='vif5_7']/lrm (origin=local/crmd/184,
version=0.2907.3): ok (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info:
erase_xpath_callback: Deletion of "//node_state[@uname='vif5_7']/lrm": ok
(rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED
cause=C_FSA_INTERNAL origin=check_join_state ]
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_state_transition:
All 2 cluster nodes are eligible to run resources.
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_dc_join_final:
Ensuring DC, quorum and node attributes are up-to-date
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: crm_update_quorum:
Updating quorum status to true (call=188)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info:
abort_transition_graph: do_te_invoke:191 - Triggered transition abort
(complete=1) : Peer Cancelled
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_pe_invoke: Query
189: Requesting the current CIB: S_POLICY_ENGINE
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section nodes (origin=local/crmd/186,
version=0.2907.4): ok (rc=0)
Apr 13 22:34:16 host2.example.com cib: [16956]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/188,
version=0.2907.4): ok (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info:
abort_transition_graph: te_update_diff:146 - Triggered transition abort
(complete=1, tag=transient_attributes, id=vif5_7, magic=NA, cib=0.2907.5) :
Transient attribute: update
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_pe_invoke: Query
190: Requesting the current CIB: S_POLICY_ENGINE
Apr 13 22:34:16 host2.example.com crmd: [16960]: info:
do_pe_invoke_callback: Invoking the PE: query=190,
ref=pe_calc-dc-1428957256-113, seq=385388, quorate=1
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: unpack_config:
On loss of CCM Quorum: Ignore
Apr 13 22:34:16 host2.example.com pengine: [16959]: info: unpack_config:
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Apr 13 22:34:16 host2.example.com pengine: [16959]: info:
determine_online_status: Node host2.example.com is online
Apr 13 22:34:16 host2.example.com pengine: [16959]: info:
determine_online_status: Node vif5_7 is online
Apr 13 22:34:16 host2.example.com pengine: [16959]: ERROR:
native_add_running: Resource lsb::routing-jboss:routing-jboss appears to be
active on 2 nodes.
Apr 13 22:34:16 host2.example.com pengine: [16959]: WARN: See
http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Apr 13 22:34:16 host2.example.com pengine: [16959]: ERROR:
native_add_running: Resource ocf::IPaddr2:clusterIP appears to be active on
2 nodes.
Apr 13 22:34:16 host2.example.com pengine: [16959]: WARN: See
http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: group_print:
Resource Group: vifGroup
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: native_print:
clusterIP (ocf::heartbeat:IPaddr2) Started
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: native_print:
0 : host2.example.com
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: native_print:
1 : vif5_7
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: native_print:
routing-jboss (lsb:routing-jboss) Started
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: native_print:
0 : host2.example.com
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: native_print:
1 : vif5_7
Apr 13 22:34:16 host2.example.com pengine: [16959]: WARN:
native_create_actions: Attempting recovery of resource clusterIP
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: RecurringOp:
Start recurring monitor (30s) for clusterIP on vif5_7
Apr 13 22:34:16 host2.example.com pengine: [16959]: WARN:
native_create_actions: Attempting recovery of resource routing-jboss
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: RecurringOp:
Start recurring monitor (30s) for routing-jboss on vif5_7
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: LogActions:
Move resource clusterIP (Started host2.example.com -> vif5_7)
Apr 13 22:34:16 host2.example.com pengine: [16959]: notice: LogActions:
Move resource routing-jboss (Started host2.example.com -> vif5_7)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: unpack_graph:
Unpacked transition 40: 13 actions in 13 synapses
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_te_invoke:
Processing graph 40 (ref=pe_calc-dc-1428957256-113) derived from
/var/lib/pengine/pe-error-494.bz2
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 17 fired and confirmed
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 12: stop routing-jboss_stop_0 on host2.example.com (local)
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: cancel_op: operation
monitor[19] on lsb::routing-jboss::routing-jboss for client 16960, its
parameters: CRM_meta_interval=[30000] CRM_meta_timeout=[20000]
crm_feature_set=[3.0.1] CRM_meta_name=[monitor] cancelled
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=12:40:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=routing-jboss_stop_0 )
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info:
rsc:routing-jboss:20: stop
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 13: stop routing-jboss_stop_0 on vif5_7
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation routing-jboss_monitor_30000 (call=19, status=1, cib-update=0,
confirmed=true) Cancelled
Apr 13 22:34:16 host2.example.com lrmd: [20177]: WARN: For LSB init script,
no additional parameters are needed.
Apr 13 22:34:16 host2.example.com pengine: [16959]: ERROR:
process_pe_message: Transition 40: ERRORs found during PE processing.
PEngine Input stored in: /var/lib/pengine/pe-error-494.bz2
Apr 13 22:34:16 host2.example.com attrd: [16958]: info:
attrd_local_callback: Sending full refresh (origin=crmd)
Apr 13 22:34:16 host2.example.com attrd: [16958]: info:
attrd_trigger_update: Sending flush op to all hosts for:
fail-count-routing-jboss (<null>)
Apr 13 22:34:16 host2.example.com attrd: [16958]: info:
attrd_trigger_update: Sending flush op to all hosts for: probe_complete
(true)
Apr 13 22:34:16 host2.example.com attrd: [16958]: info:
attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout) Disabling traffic redirection from 10.10.72.3
to 10.10.72.7
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout) [
Apr 13 22:34:16 host2.example.com attrd: [16958]: info:
attrd_trigger_update: Sending flush op to all hosts for:
last-failure-routing-jboss (<null>)
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout) OK
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout) ]
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout)
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: RA output:
(routing-jboss:stop:stdout)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation routing-jboss_stop_0 (call=20, rc=0, cib-update=191,
confirmed=true) ok
Apr 13 22:34:16 host2.example.com attrd: [16958]: info:
attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_stop_0 (12) confirmed on host2.example.com (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_stop_0 (13) confirmed on vif5_7 (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 9: stop clusterIP_stop_0 on host2.example.com (local)
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: cancel_op: operation
monitor[17] on ocf::IPaddr2::clusterIP for client 16960, its parameters:
CRM_meta_interval=[30000] ip=[10.10.72.3] cidr_netmask=[32]
CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor]
iflabel=[jbossfailover] cancelled
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_lrm_rsc_op:
Performing key=9:40:0:90d438c0-2db4-4d39-bdf9-f0476d6f06ff
op=clusterIP_stop_0 )
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: rsc:clusterIP:21:
stop
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 10: stop clusterIP_stop_0 on vif5_7
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation clusterIP_monitor_30000 (call=17, status=1, cib-update=0,
confirmed=true) Cancelled
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:stop:stderr) logger: unknown facility name: none.
Apr 13 22:34:16 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:stop:stderr) logger: unknown facility name: none.
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: process_lrm_event:
LRM operation clusterIP_stop_0 (call=21, rc=0, cib-update=192,
confirmed=true) ok
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_stop_0 (9) confirmed on host2.example.com (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_stop_0 (10) confirmed on vif5_7 (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 18 fired and confirmed
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 5 fired and confirmed
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 15 fired and confirmed
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 11: start clusterIP_start_0 on vif5_7
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_start_0 (11) confirmed on vif5_7 (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 4: monitor clusterIP_monitor_30000 on vif5_7
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 14: start routing-jboss_start_0 on vif5_7
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_start_0 (14) confirmed on vif5_7 (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_pseudo_action:
Pseudo action 16 fired and confirmed
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_rsc_command:
Initiating action 3: monitor routing-jboss_monitor_30000 on vif5_7
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: match_graph_event:
Action routing-jboss_monitor_30000 (3) confirmed on vif5_7 (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: match_graph_event:
Action clusterIP_monitor_30000 (4) confirmed on vif5_7 (rc=0)
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: run_graph:
====================================================
Apr 13 22:34:16 host2.example.com crmd: [16960]: notice: run_graph:
Transition 40 (Complete=13, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-error-494.bz2): Complete
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: te_graph_trigger:
Transition 40 is now complete
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: notify_crmd:
Transition 40 status: done - <null>
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr 13 22:34:16 host2.example.com crmd: [16960]: info: do_state_transition:
Starting PEngine Recheck Timer
Apr 13 22:34:18 host2.example.com lrmd: [16957]: info: RA output:
(clusterIP:start:stderr) ARPING 10.10.72.3 from 10.10.72.3 eth0
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
Apr 13 22:39:46 host2.example.com cib: [16956]: info: cib_stats: Processed
81 operations (2345.00us average, 0% utilization) in the last 10min
Apr 13 22:46:14 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 13 22:46:15 host2.example.com cib: [16956]: notice: ais_dispatch:
Membership 385392: quorum lost
Apr 13 22:46:15 host2.example.com cib: [16956]: info: crm_update_peer: Node
vif5_7: id=38275594 state=lost (new) addr=r(0) ip(10.10.72.2) votes=1
born=385372 seen=385388 proc=00000000000000000000000000013312
Apr 13 22:46:15 corosync [pcmk ] notice: pcmk_peer_update: Transitional
membership event on ring 385392: memb=1, new=0, lost=1
Apr 13 22:46:15 corosync [pcmk ] info: pcmk_peer_update: memb:
host2.example.com 122161674
Apr 13 22:46:15 corosync [pcmk ] info: pcmk_peer_update: lost: vif5_7
38275594
Apr 13 22:46:15 corosync [pcmk ] notice: pcmk_peer_update: Stable
membership event on ring 385392: memb=1, new=0, lost=0
--
[image: logoVif] <http://www.vif.fr/>L'informatique 100% Agrowww.vif.fr
[image: VifYouTube] <http://www.youtube.com/user/Agrovif>[image: VifTwitter]
<https://twitter.com/VIF_agro>*Suivez l'actualité VIF sur:*
<http://www.vif.fr/regilait-devoile-les-dessous-du-processus-sop-et-vif-lance-vif-sop-au-cfia-de-rennes/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20150414/d5883871/attachment.htm>
More information about the Users
mailing list