[Pacemaker] crmd - set dc unset dc loop

Andreas Kurz andreas at hastexo.com
Thu Jan 19 05:20:15 EST 2012


Hello,

On 01/17/2012 04:41 PM, Philippe Carbonnier wrote:
> Hello,
> 
> The configuration:
> Red Hat 5.5, 64-bit
> pacemaker-libs-1.0.10-1.4.el5.x86_64
> pacemaker-1.0.10-1.4.el5.x86_64
> corosync-1.2.7-1.1.el5.x86_64
> corosynclib-1.2.7-1.1.el5.x86_64
> 
> When working: [root at ujboss1 cluster]# crm_mon -1
> ============
> Last updated: Tue Jan 17 16:27:33 2012
> Stack: openais
> Current DC: ujboss2 - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> ============
> 
> Online: [ ujboss1 ujboss2 ]
> 
>  Resource Group: vifGroup
>      clusterIP  (ocf::heartbeat:IPaddr2):       Started ujboss1
>      routing-jboss      (lsb:routing-jboss):    Started ujboss1
> 
> 
> Now, the problem: just after running crm_mode offline on ujboss1
> (12:51:44), crmd seems to loop, always logging the same messages.
> I have restarted corosync on both nodes, and now it is working,
> but can you help me avoid this "loop"?
> on ujboss2:
> Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to
> ujboss1 (3.0.1)
> Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1
> Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to
> ujboss1 (3.0.1)
> Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1
> ... (these two messages repeat in a loop)

Are these the only messages on ujboss2, with nothing before them? It
looks like corosync communication was interrupted right after ujboss1
was put into standby.

Can you please share your corosync configuration and describe your
network setup, especially the connection you use for corosync communication?
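
To see what precedes the loop on each node, something like the following may help. It is only a minimal sketch, not a Pacemaker tool: it assumes the syslog format shown in your excerpts with one message per line (not wrapped as in this mail), and the function name summarize is made up for illustration. It counts the "Set DC"/"Unset DC" messages per host and reports the first send_ais_* error it finds.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpts in this thread:
#   Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>\w+): \[\d+\]: (?P<level>\w+): (?P<msg>.*)$"
)


def summarize(lines):
    """Count DC set/unset messages per host and find the first AIS send error.

    Hypothetical helper: returns (Counter of host -> count, first error or None),
    where the error is a (timestamp, host, message) tuple.
    """
    flaps = Counter()
    first_ais_error = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. corosync's own lines, which use a different prefix
        msg = m.group("msg")
        if msg.startswith(("update_dc: Set DC", "update_dc: Unset DC")):
            flaps[m.group("host")] += 1
        elif (first_ais_error is None
              and m.group("level") == "ERROR"
              and "send_ais_" in msg):
            first_ais_error = (m.group("ts"), m.group("host"), msg)
    return flaps, first_ais_error
```

Run over the logs of both nodes, e.g. summarize(open(path)) with path pointing at wherever your syslog writes these messages, it should show whether the AIS send errors start before or after the first DC flap.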

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> and on ujboss1:
> Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jan 17 12:38:46 ujboss1 crmd: [28369]: info: unpack_graph: Unpacked
> transition 8776: 0 actions in 0 synapses
> Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_te_invoke: Processing
> graph 8776 (ref=pe_calc-dc-1326800326-8977) derived from
> /var/lib/pengine/pe-input-7829.bz2
> Jan 17 12:38:46 ujboss1 crmd: [28369]: info: run_graph:
> ====================================================
> Jan 17 12:38:46 ujboss1 crmd: [28369]: notice: run_graph: Transition
> 8776 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-input-7829.bz2): Complete
> Jan 17 12:38:46 ujboss1 crmd: [28369]: info: te_graph_trigger:
> Transition 8776 is now complete
> Jan 17 12:38:46 ujboss1 crmd: [28369]: info: notify_crmd: Transition
> 8776 status: done - <null>
> Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition:
> Starting PEngine Recheck Timer
> Jan 17 12:38:46 ujboss1 pengine: [28368]: info: process_pe_message:
> Transition 8776: PEngine Input stored in:
> /var/lib/pengine/pe-input-7829.bz2
> Jan 17 12:46:27 ujboss1 cib: [28365]: info: cib_stats: Processed 1
> operations (0.00us average, 0% utilization) in the last 10min
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - <cib admin_epoch="0" epoch="233" num_updates="5" >
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - <configuration >
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - <nodes >
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - <node id="ujboss1" >
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - <instance_attributes id="nodes-ujboss1" >
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - <nvpair value="off" id="nodes-ujboss1-standby" />
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - </instance_attributes>
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - </node>
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - </nodes>
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - </configuration>
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> - </cib>
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + <cib admin_epoch="0" epoch="234" num_updates="1" >
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + <configuration >
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + <nodes >
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + <node id="ujboss1" >
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + <instance_attributes id="nodes-ujboss1" >
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + <nvpair value="on" id="nodes-ujboss1-standby" />
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + </instance_attributes>
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + </node>
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + </nodes>
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + </configuration>
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff:
> + </cib>
> Jan 17 12:51:44 ujboss1 cib: [28365]: info: cib_process_request:
> Operation complete: op cib_modify for section nodes
> (origin=local/crm_attribute/4, version=0.234.1): ok (rc=0)
> Jan 17 12:51:44 ujboss1 crmd: [28369]: info: abort_transition_graph:
> need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> Jan 17 12:51:44 ujboss1 crmd: [28369]: info: need_abort: Aborting on
> change to admin_epoch
> Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_state_transition: All 2
> cluster nodes are eligible to run resources.
> Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_pe_invoke: Query 8981:
> Requesting the current CIB: S_POLICY_ENGINE
> Jan 17 12:51:46 ujboss1 cib: [28365]: ERROR: send_ais_text: Sending
> message 251: FAILED (rc=2): Library error: Connection timed out (110)
> Jan 17 12:51:46 ujboss1 crmd: [28369]: info: do_pe_invoke_callback:
> Invoking the PE: query=8981, ref=pe_calc-dc-1326801106-8978, seq=560,
> quorate=1
> Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: unpack_config: On loss
> of CCM Quorum: Ignore
> Jan 17 12:51:47 ujboss1 pengine: [28368]: info: unpack_config: Node
> scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Jan 17 12:51:47 ujboss1 pengine: [28368]: info: unpack_status: Node
> ujboss1 is in standby-mode
> Jan 17 12:51:47 ujboss1 pengine: [28368]: info: determine_online_status:
> Node ujboss1 is standby
> Jan 17 12:51:47 ujboss1 pengine: [28368]: info: determine_online_status:
> Node ujboss2 is online
> Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: group_print:  Resource
> Group: vifGroup
> Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: native_print:     
> clusterIP   (ocf::heartbeat:IPaddr2):       Started ujboss1
> Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: native_print:     
> routing-jboss       (lsb:routing-jboss):    Started ujboss1
> Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: RecurringOp:  Start
> recurring monitor (30s) for clusterIP on ujboss2
> Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: RecurringOp:  Start
> recurring monitor (30s) for routing-jboss on ujboss2
> Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: LogActions: Move
> resource clusterIP    (Started ujboss1 -> ujboss2)
> Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: LogActions: Move
> resource routing-jboss        (Started ujboss1 -> ujboss2)
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: unpack_graph: Unpacked
> transition 8777: 11 actions in 11 synapses
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_te_invoke: Processing
> graph 8777 (ref=pe_calc-dc-1326801106-8978) derived from
> /var/lib/pengine/pe-input-7830.bz2
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo
> action 15 fired and confirmed
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating
> action 10: stop routing-jboss_stop_0 on ujboss1 (local)
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: cancel_op: operation
> monitor[79] on lsb::routing-jboss::routing-jboss for client 28369, its
> parameters: CRM_meta_interval=[30000] CRM_meta_timeout=[20000]
> crm_feature_set=[3.0.1] CRM_meta_name=[monitor]  cancelled
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_lrm_rsc_op: Performing
> key=10:8777:0:39671e48-9519-4b61-b781-2efcd379df7a
> op=routing-jboss_stop_0 )
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: rsc:routing-jboss:80: stop
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM
> operation routing-jboss_monitor_30000 (call=79, status=1, cib-update=0,
> confirmed=true) Cancelled
> Jan 17 12:51:48 ujboss1 lrmd: [5533]: WARN: For LSB init script, no
> additional parameters are needed.
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
> (routing-jboss:stop:stdout) Disabling traffic redirection from
> 128.1.13.9 to 128.1.13.7
> Jan 17 12:51:48 ujboss1 pengine: [28368]: info: process_pe_message:
> Transition 8777: PEngine Input stored in:
> /var/lib/pengine/pe-input-7830.bz2
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
> (routing-jboss:stop:stdout) [
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
> (routing-jboss:stop:stdout)   OK
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
> (routing-jboss:stop:stdout) ]
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
> (routing-jboss:stop:stdout)
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
> (routing-jboss:stop:stdout)
> 
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM
> operation routing-jboss_stop_0 (call=80, rc=0, cib-update=8982,
> confirmed=true) ok
> Jan 17 12:51:48 ujboss1 cib: [28365]: ERROR: send_ais_message: Not
> connected to AIS
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: match_graph_event: Action
> routing-jboss_stop_0 (10) confirmed on ujboss1 (rc=0)
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating
> action 7: stop clusterIP_stop_0 on ujboss1 (local)
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: cancel_op: operation
> monitor[77] on ocf::IPaddr2::clusterIP for client 28369, its parameters:
> CRM_meta_interval=[30000] ip=[128.1.13.9] cidr_netmask=[32]
> CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor]
> iflabel=[jbossfailover]  cancelled
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_lrm_rsc_op: Performing
> key=7:8777:0:39671e48-9519-4b61-b781-2efcd379df7a op=clusterIP_stop_0 )
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: rsc:clusterIP:81: stop
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM
> operation clusterIP_monitor_30000 (call=77, status=1, cib-update=0,
> confirmed=true) Cancelled
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
> (clusterIP:stop:stderr) logger: unknown facility name: none.
> 
> Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output:
> (clusterIP:stop:stderr) logger: unknown facility name: none.
> 
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM
> operation clusterIP_stop_0 (call=81, rc=0, cib-update=8983,
> confirmed=true) ok
> Jan 17 12:51:48 ujboss1 cib: [28365]: ERROR: send_ais_message: Not
> connected to AIS
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: match_graph_event: Action
> clusterIP_stop_0 (7) confirmed on ujboss1 (rc=0)
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo
> action 16 fired and confirmed
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo
> action 3 fired and confirmed
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo
> action 13 fired and confirmed
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating
> action 8: start clusterIP_start_0 on ujboss2
> Jan 17 12:51:48 corosync [pcmk  ] notice: pcmk_peer_update: Transitional
> membership event on ring 568: memb=1, new=0, lost=1
> Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: memb: ujboss1
> 34406784
> Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: lost: ujboss2
> 51184000
> Jan 17 12:51:48 corosync [pcmk  ] notice: pcmk_peer_update: Stable
> membership event on ring 568: memb=2, new=1, lost=0
> Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: NEW:  ujboss2
> 51184000
> Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: MEMB: ujboss1
> 34406784
> Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: MEMB: ujboss2
> 51184000
> Jan 17 12:51:48 ujboss1 crmd: [28369]: ERROR: crmd_ha_msg_filter:
> Another DC detected: ujboss2 (op=noop)
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_ELECTION [ input=I_ELECTION
> cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
> Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff
> 0.233.5 -> 0.233.6 not applied to 0.234.3: current "epoch" is greater
> than required
> Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff
> 0.233.6 -> 0.233.7 not applied to 0.234.3: current "epoch" is greater
> than required
> Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff
> 0.233.7 -> 0.234.1 not applied to 0.234.3: current "epoch" is greater
> than required
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_dc_takeover: Taking over
> DC status for this partition
> Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_readwrite: We
> are now in R/O mode
> Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_request:
> Operation complete: op cib_slave_all for section 'all'
> (origin=local/crmd/8984, version=0.234.3): ok (rc=0)
> Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_readwrite: We
> are now in R/W mode
> Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
> Operation complete: op cib_master for section 'all'
> (origin=local/crmd/8985, version=0.234.3): ok (rc=0)
> Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
> Operation complete: op cib_modify for section cib
> (origin=local/crmd/8986, version=0.234.3): ok (rc=0)
> Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/8988, version=0.234.3): ok (rc=0)
> Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/8990, version=0.234.3): ok (rc=0)
> Jan 17 12:51:49 corosync [MAIN  ] Completed service synchronization,
> ready to provide service.
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all:
> join-8: Waiting on 2 outstanding join acks
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: ais_dispatch: Membership
> 568: quorum retained
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: crm_ais_dispatch: Setting
> expected votes to 2
> Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/8993, version=0.234.3): ok (rc=0)
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: config_query_callback:
> Checking for expired actions every 900000ms
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: config_query_callback:
> Sending expected-votes=2 to corosync
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to
> ujboss1 (3.0.1)
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: check_join_state:
> do_dc_join_filter_offer: Membership changed since join started: 560 -> 568
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: join_make_offer: Making
> join offers based on membership 568
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all:
> join-9: Waiting on 2 outstanding join acks
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: ais_dispatch: Membership
> 568: quorum retained
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: crm_ais_dispatch: Setting
> expected votes to 2
> Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/8996, version=0.234.3): ok (rc=0)
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to
> ujboss1 (3.0.1)
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
> cause=C_FSA_INTERNAL origin=check_join_state ]
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2
> cluster nodes responded to the join offer.
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize:
> join-9: Syncing the CIB from ujboss1 to the rest of the cluster
> Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not
> connected to AIS
> Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request:
> Operation complete: op cib_sync for section 'all'
> (origin=local/crmd/8998, version=0.234.3): not connected (rc=-3)
> Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback:
> Sync from ujboss1 resulted in an error: not connected
> Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input
> I_ELECTION_DC from finalize_sync_callback() received in state
> S_FINALIZE_JOIN
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=finalize_sync_callback ]
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all:
> join-10: Waiting on 2 outstanding join acks
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to
> ujboss1 (3.0.1)
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
> cause=C_FSA_INTERNAL origin=check_join_state ]
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2
> cluster nodes responded to the join offer.
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize:
> join-10: Syncing the CIB from ujboss1 to the rest of the cluster
> Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not
> connected to AIS
> Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request:
> Operation complete: op cib_sync for section 'all'
> (origin=local/crmd/9000, version=0.234.3): not connected (rc=-3)
> Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback:
> Sync from ujboss1 resulted in an error: not connected
> Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input
> I_ELECTION_DC from finalize_sync_callback() received in state
> S_FINALIZE_JOIN
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=finalize_sync_callback ]
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all:
> join-11: Waiting on 2 outstanding join acks
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to
> ujboss1 (3.0.1)
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State
> transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
> cause=C_FSA_INTERNAL origin=check_join_state ]
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2
> cluster nodes responded to the join offer.
> Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize:
> join-11: Syncing the CIB from ujboss1 to the rest of the cluster
> Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not
> connected to AIS
> Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request:
> Operation complete: op cib_sync for section 'all'
> (origin=local/crmd/9002, version=0.234.3): not connected (rc=-3)
> Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback:
> Sync from ujboss1 resulted in an error: not connected
> Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input
> I_ELECTION_DC from finalize_sync_callback() received in state
> S_FINALIZE_JOIN
> ... (these messages repeat in a loop too)
> 
> After restarting corosync:
> 
> 17/01/12 13H10 : crm_mon -1
> ============
> Last updated: Tue Jan 17 13:10:39 2012
> Stack: openais
> Current DC: ujboss1 - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> ============
> 
> Online: [ ujboss1 ujboss2 ]
> 
>  Resource Group: vifGroup
>      clusterIP    (ocf::heartbeat:IPaddr2):    Started ujboss2 FAILED
>      routing-jboss    (lsb:routing-jboss):    Stopped
> 
> Failed actions:
>     clusterIP_start_0 (node=ujboss2, call=-1, rc=1, status=Timed Out):
> unknown error
> 
> 
> 
> Both Linux servers were very busy, with crmd, cib and corosync using
> all the CPU.
> Best regards,
> Philippe
> 

