[Pacemaker] Stuck in a STONITH cycle

Andrew Beekhof andrew at beekhof.net
Tue Oct 16 04:03:58 UTC 2012


On Sat, Oct 13, 2012 at 7:58 AM, David Parker <dparker at utica.edu> wrote:
> Hello,
>
> I have two nodes set up in a cluster to provide a MySQL server (mysqld) in
> HA on a virtual IP address.  This was working fine until I had to reboot the
> servers.  All I did was change the interface each node uses for its primary
> IP address (changed from eth1 to eth0 on each node).  Now I'm stuck in a
> cycle.  Let's say node 1 has the virtual IP and is running mysqld, and node
> 2 is down.  When node 2 boots up, it will STONITH node 1 for no apparent
> reason

At a guess... because it still cant reach node 1?

> and take over the resources, which shouldn't happen.

Why?  This would be expected in a 2-node cluster where either side
comes up while the network is hosed.

What you have is a stonith death match: http://ourobengr.com/ha/

>  When node 1
> boots up again, it will STONITH node 2 and take over the resources, which
> again shouldn't happen.
>
> Obviously, the goal is to have both nodes running, and whichever one boots
> up second should see that the first one has the resources and then stand by.
> I'll include the relevant log entries and my crm config at the end of this
> message.  In the logs, the only thing I see that seems unusual (other than
> the fact that a STONITH is executed against a running node) is this:
>
> Oct 12 16:27:22 ha1 crmd: [1176]: info: populate_cib_nodes_ha: Requesting
> the list of configured nodes
> Oct 12 16:27:23 ha1 crmd: [1176]: WARN: get_uuid: Could not calculate UUID
> for ha2
> Oct 12 16:27:23 ha1 crmd: [1176]: WARN: populate_cib_nodes_ha: Node ha2: no
> uuid found
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster
> nodes are eligible to run resources.
>
> The exact opposite shows up on the node "ha2" (it says ha1 has no uuid).
> Did the UUID of each node change because the physical interface changed?
> Any other ideas?  Thanks in advance.
>
>     - Dave
>
> Here are the log lines:
>
> Oct 12 16:26:13 ha1 heartbeat: [1176]: info: Starting
> "/usr/lib/heartbeat/crmd" as uid 103  gid 105 (pid 1176)
> Oct 12 16:26:13 ha1 crmd: [1176]: info: crm_log_init: Changed active
> directory to /var/lib/heartbeat/cores/hacluster
> Oct 12 16:26:13 ha1 crmd: [1176]: info: Invoked: /usr/lib/heartbeat/crmd
> Oct 12 16:26:13 ha1 crmd: [1176]: info: main: CRM Hg Version:
> 74392a28b7f31d7ddc86689598bd23114f58978b
> Oct 12 16:26:13 ha1 crmd: [1176]: info: crmd_init: Starting crmd
> Oct 12 16:26:13 ha1 crmd: [1176]: info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> Oct 12 16:26:14 ha1 crmd: [1176]: info: do_cib_control: CIB connection
> established
> Oct 12 16:26:14 ha1 crmd: [1176]: info: crm_cluster_connect: Connecting to
> Heartbeat
> Oct 12 16:26:15 ha1 crmd: [1176]: info: register_heartbeat_conn: Hostname:
> ha1
> Oct 12 16:26:15 ha1 crmd: [1176]: info: register_heartbeat_conn: UUID:
> 1b48f410-44d1-4e89-8b52-ff23b32db1bc
> Oct 12 16:26:15 ha1 crmd: [1176]: info: do_ha_control: Connected to the
> cluster
> Oct 12 16:26:15 ha1 crmd: [1176]: info: do_ccm_control: CCM connection
> established... waiting for first callback
> Oct 12 16:26:15 ha1 crmd: [1176]: info: do_started: Delaying start, CCM
> (0000000000100000) not connected
> Oct 12 16:26:15 ha1 crmd: [1176]: info: crmd_init: Starting crmd's mainloop
> Oct 12 16:26:15 ha1 crmd: [1176]: info: config_query_callback: Checking for
> expired actions every 900000ms
> Oct 12 16:26:15 ha1 crmd: [1176]: notice: crmd_client_status_callback:
> Status update: Client ha1/crmd now has status [online] (DC=false)
> Oct 12 16:26:15 ha1 crmd: [1176]: info: crm_new_peer: Node 0 is now known as
> ha1
> Oct 12 16:26:15 ha1 crmd: [1176]: info: crm_update_peer_proc: ha1.crmd is
> now online
> Oct 12 16:26:15 ha1 crmd: [1176]: info: crmd_client_status_callback: Not the
> DC
> Oct 12 16:26:15 ha1 crmd: [1176]: notice: crmd_client_status_callback:
> Status update: Client ha1/crmd now has status [online] (DC=false)
> Oct 12 16:26:16 ha1 crmd: [1176]: info: crmd_client_status_callback: Not the
> DC
> Oct 12 16:26:16 ha1 crmd: [1176]: info: mem_handle_event: Got an event
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Oct 12 16:26:16 ha1 crmd: [1176]: info: mem_handle_event: instance=1,
> nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
> Oct 12 16:26:16 ha1 crmd: [1176]: info: crmd_ccm_msg_callback: Quorum
> (re)attained after event=NEW MEMBERSHIP (id=1)
> Oct 12 16:26:16 ha1 crmd: [1176]: info: ccm_event_detail: NEW MEMBERSHIP:
> trans=1, nodes=1, new=1, lost=0 n_idx=0, new_idx=0, old_idx=3
> Oct 12 16:26:16 ha1 crmd: [1176]: info: ccm_event_detail: #011CURRENT: ha1
> [nodeid=0, born=1]
> Oct 12 16:26:16 ha1 crmd: [1176]: info: ccm_event_detail: #011NEW:     ha1
> [nodeid=0, born=1]
> Oct 12 16:26:16 ha1 crmd: [1176]: info: crm_update_peer: Node ha1: id=0
> state=member (new) addr=(null) votes=-1 born=1 seen=1
> proc=00000000000000000000000000000200
> Oct 12 16:26:16 ha1 crmd: [1176]: info: crm_update_peer_proc: ha1.ais is now
> online
> Oct 12 16:26:16 ha1 crmd: [1176]: info: do_started: The local CRM is
> operational
> Oct 12 16:26:16 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL
> origin=do_started ]
> Oct 12 16:27:17 ha1 crmd: [1176]: info: crm_timer_popped: Election Trigger
> (I_DC_TIMEOUT) just popped!
> Oct 12 16:27:17 ha1 crmd: [1176]: WARN: do_log: FSA: Input I_DC_TIMEOUT from
> crm_timer_popped() received in state S_PENDING
> Oct 12 16:27:17 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED
> origin=crm_timer_popped ]
> Oct 12 16:27:17 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=do_election_check ]
> Oct 12 16:27:17 ha1 crmd: [1176]: info: do_te_control: Registering TE UUID:
> 4f352feb-6be6-4af1-bb9e-af8b7ca28f58
> Oct 12 16:27:17 ha1 crmd: [1176]: WARN: cib_client_add_notify_callback:
> Callback already present
> Oct 12 16:27:17 ha1 crmd: [1176]: info: set_graph_functions: Setting custom
> graph functions
> Oct 12 16:27:17 ha1 crmd: [1176]: info: unpack_graph: Unpacked transition
> -1: 0 actions in 0 synapses
> Oct 12 16:27:17 ha1 crmd: [1176]: info: start_subsystem: Starting sub-system
> "pengine"
> Oct 12 16:27:20 ha1 crmd: [1176]: info: do_dc_takeover: Taking over DC
> status for this partition
> Oct 12 16:27:20 ha1 crmd: [1176]: info: join_make_offer: Making join offers
> based on membership 1
> Oct 12 16:27:20 ha1 crmd: [1176]: info: do_dc_join_offer_all: join-1:
> Waiting on 1 outstanding join acks
> Oct 12 16:27:20 ha1 crmd: [1176]: info: te_connect_stonith: Attempting
> connection to fencing daemon...
> Oct 12 16:27:21 ha1 crmd: [1176]: info: te_connect_stonith: Connected
> Oct 12 16:27:21 ha1 crmd: [1176]: info: config_query_callback: Checking for
> expired actions every 900000ms
> Oct 12 16:27:21 ha1 crmd: [1176]: info: update_dc: Set DC to ha1 (3.0.1)
> Oct 12 16:27:21 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
> cause=C_FSA_INTERNAL origin=check_join_state ]
> Oct 12 16:27:21 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster
> nodes responded to the join offer.
> Oct 12 16:27:21 ha1 crmd: [1176]: info: do_dc_join_finalize: join-1: Syncing
> the CIB from ha1 to the rest of the cluster
> Oct 12 16:27:22 ha1 crmd: [1176]: info: update_attrd: Connecting to attrd...
> Oct 12 16:27:22 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion of
> "//node_state[@uname='ha1']/transient_attributes": ok (rc=0)
> Oct 12 16:27:22 ha1 crmd: [1176]: info: do_dc_join_ack: join-1: Updating
> node state to member for ha1
> Oct 12 16:27:22 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion of
> "//node_state[@uname='ha1']/lrm": ok (rc=0)
> Oct 12 16:27:22 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED
> cause=C_FSA_INTERNAL origin=check_join_state ]
> Oct 12 16:27:22 ha1 crmd: [1176]: info: populate_cib_nodes_ha: Requesting
> the list of configured nodes
> Oct 12 16:27:23 ha1 crmd: [1176]: WARN: get_uuid: Could not calculate UUID
> for ha2
> Oct 12 16:27:23 ha1 crmd: [1176]: WARN: populate_cib_nodes_ha: Node ha2: no
> uuid found
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster
> nodes are eligible to run resources.
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_dc_join_final: Ensuring DC,
> quorum and node attributes are up-to-date
> Oct 12 16:27:23 ha1 crmd: [1176]: info: crm_update_quorum: Updating quorum
> status to true (call=21)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: abort_transition_graph:
> do_te_invoke:191 - Triggered transition abort (complete=1) : Peer Cancelled
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_pe_invoke: Query 22: Requesting
> the current CIB: S_POLICY_ENGINE
> Oct 12 16:27:23 ha1 crmd: [1176]: info: abort_transition_graph:
> need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> Oct 12 16:27:23 ha1 crmd: [1176]: info: need_abort: Aborting on change to
> admin_epoch
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_pe_invoke: Query 23: Requesting
> the current CIB: S_POLICY_ENGINE
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_pe_invoke_callback: Invoking the
> PE: query=23, ref=pe_calc-dc-1350073643-7, seq=1, quorate=1
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 12 16:27:23 ha1 crmd: [1176]: info: unpack_graph: Unpacked transition 0:
> 20 actions in 20 synapses
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_te_invoke: Processing graph 0
> (ref=pe_calc-dc-1350073643-7) derived from /var/lib/pengine/pe-warn-1370.bz2
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 4:
> monitor stonith-ha1_monitor_0 on ha1 (local)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
> key=4:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=stonith-ha1_monitor_0 )
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 5:
> monitor stonith-ha2_monitor_0 on ha1 (local)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
> key=5:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=stonith-ha2_monitor_0 )
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 6:
> monitor MysqlIP_monitor_0 on ha1 (local)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
> key=6:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=MysqlIP_monitor_0 )
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 7:
> monitor mysqld_monitor_0 on ha1 (local)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
> key=7:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=mysqld_monitor_0 )
> Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
> stonith-ha1_monitor_0 (call=2, rc=7, cib-update=24, confirmed=true) not
> running
> Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
> stonith-ha2_monitor_0 (call=3, rc=7, cib-update=25, confirmed=true) not
> running
> Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action
> stonith-ha1_monitor_0 (4) confirmed on ha1 (rc=0)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action
> stonith-ha2_monitor_0 (5) confirmed on ha1 (rc=0)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
> mysqld_monitor_0 (call=5, rc=7, cib-update=26, confirmed=true) not running
> Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action
> mysqld_monitor_0 (7) confirmed on ha1 (rc=0)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
> MysqlIP_monitor_0 (call=4, rc=7, cib-update=27, confirmed=true) not running
> Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action
> MysqlIP_monitor_0 (6) confirmed on ha1 (rc=0)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 3:
> probe_complete probe_complete on ha1 (local) - no waiting
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 2
> fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 8:
> start stonith-ha2_start_0 on ha1 (local)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
> key=8:0:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=stonith-ha2_start_0 )
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 14
> fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 20
> fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 19
> fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 15
> fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 12
> fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: abort_transition_graph:
> te_update_diff:146 - Triggered transition abort (complete=0,
> tag=transient_attributes, id=1b48f410-44d1-4e89-8b52-ff23b32db1bc, magic=NA,
> cib=0.61.6) : Transient attribute:
>  update
> Oct 12 16:27:23 ha1 crmd: [1176]: info: update_abort_priority: Abort
> priority upgraded from 0 to 1000000
> Oct 12 16:27:23 ha1 crmd: [1176]: info: update_abort_priority: Abort action
> done superceeded by restart
> Oct 12 16:27:34 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
> stonith-ha2_start_0 (call=6, rc=0, cib-update=28, confirmed=true) ok
> Oct 12 16:27:34 ha1 crmd: [1176]: info: match_graph_event: Action
> stonith-ha2_start_0 (8) confirmed on ha1 (rc=0)
> Oct 12 16:27:34 ha1 crmd: [1176]: info: run_graph:
> ====================================================
> Oct 12 16:27:34 ha1 crmd: [1176]: notice: run_graph: Transition 0
> (Complete=12, Pending=0, Fired=0, Skipped=8, Incomplete=0,
> Source=/var/lib/pengine/pe-warn-1370.bz2): Stopped
> Oct 12 16:27:34 ha1 crmd: [1176]: info: te_graph_trigger: Transition 0 is
> now complete
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster
> nodes are eligible to run resources.
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_pe_invoke: Query 29: Requesting
> the current CIB: S_POLICY_ENGINE
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_pe_invoke_callback: Invoking the
> PE: query=29, ref=pe_calc-dc-1350073654-14, seq=1, quorate=1
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 12 16:27:34 ha1 crmd: [1176]: info: unpack_graph: Unpacked transition 1:
> 9 actions in 9 synapses
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_te_invoke: Processing graph 1
> (ref=pe_calc-dc-1350073654-14) derived from
> /var/lib/pengine/pe-warn-1371.bz2
> Oct 12 16:27:34 ha1 stonithd: [1174]: info: client tengine [pid: 1176]
> requests a STONITH operation RESET on node ha2
> Oct 12 16:27:34 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 9
> fired and confirmed
> Oct 12 16:27:34 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 13
> fired and confirmed
> Oct 12 16:27:34 ha1 crmd: [1176]: info: te_fence_node: Executing reboot
> fencing operation (15) on ha2 (timeout=60000)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: tengine_stonith_callback: call=1266,
> optype=1, node_name=ha2, result=0, node_list=ha1,
> action=15:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 1
> fired and confirmed
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 14
> fired and confirmed
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 6:
> start MysqlIP_start_0 on ha1 (local)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
> key=6:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=MysqlIP_start_0 )
> Oct 12 16:28:08 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion of
> "//node_state[@uname='ha2']/lrm": ok (rc=0)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion of
> "//node_state[@uname='ha2']/transient_attributes": ok (rc=0)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
> MysqlIP_start_0 (call=7, rc=0, cib-update=33, confirmed=true) ok
> Oct 12 16:28:08 ha1 crmd: [1176]: info: match_graph_event: Action
> MysqlIP_start_0 (6) confirmed on ha1 (rc=0)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 7:
> monitor MysqlIP_monitor_10000 on ha1 (local)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
> key=7:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=MysqlIP_monitor_10000 )
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 8:
> start mysqld_start_0 on ha1 (local)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing
> key=8:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=mysqld_start_0 )
> Oct 12 16:28:08 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
> mysqld_start_0 (call=9, rc=0, cib-update=34, confirmed=true) ok
> Oct 12 16:28:08 ha1 crmd: [1176]: info: match_graph_event: Action
> mysqld_start_0 (8) confirmed on ha1 (rc=0)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 10
> fired and confirmed
> Oct 12 16:28:08 ha1 crmd: [1176]: info: process_lrm_event: LRM operation
> MysqlIP_monitor_10000 (call=8, rc=0, cib-update=35, confirmed=false) ok
> Oct 12 16:28:08 ha1 crmd: [1176]: info: match_graph_event: Action
> MysqlIP_monitor_10000 (7) confirmed on ha1 (rc=0)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: run_graph:
> ====================================================
> Oct 12 16:28:08 ha1 crmd: [1176]: notice: run_graph: Transition 1
> (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-warn-1371.bz2): Complete
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_graph_trigger: Transition 1 is
> now complete
> Oct 12 16:28:08 ha1 crmd: [1176]: info: notify_crmd: Transition 1 status:
> done - <null>
> Oct 12 16:28:08 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Oct 12 16:28:08 ha1 crmd: [1176]: info: do_state_transition: Starting
> PEngine Recheck Timer
> Oct 12 16:43:08 ha1 crmd: [1176]: info: crm_timer_popped: PEngine Recheck
> Timer (I_PE_CALC) just popped!
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
> origin=crm_timer_popped ]
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: Progressed to
> state S_POLICY_ENGINE after C_TIMER_POPPED
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster
> nodes are eligible to run resources.
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_pe_invoke: Query 36: Requesting
> the current CIB: S_POLICY_ENGINE
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_pe_invoke_callback: Invoking the
> PE: query=36, ref=pe_calc-dc-1350074588-18, seq=1, quorate=1
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 12 16:43:08 ha1 crmd: [1176]: info: unpack_graph: Unpacked transition 2:
> 0 actions in 0 synapses
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_te_invoke: Processing graph 2
> (ref=pe_calc-dc-1350074588-18) derived from
> /var/lib/pengine/pe-input-4140.bz2
> Oct 12 16:43:08 ha1 crmd: [1176]: info: run_graph:
> ====================================================
> Oct 12 16:43:08 ha1 crmd: [1176]: notice: run_graph: Transition 2
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-input-4140.bz2): Complete
> Oct 12 16:43:08 ha1 crmd: [1176]: info: te_graph_trigger: Transition 2 is
> now complete
> Oct 12 16:43:08 ha1 crmd: [1176]: info: notify_crmd: Transition 2 status:
> done - <null>
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: Starting
> PEngine Recheck Timer
>
> Here's my config:
>
> root at ha1:~# cibadmin --query
> <cib validate-with="pacemaker-1.0" crm_feature_set="3.0.1" have-quorum="1"
> admin_epoch="0" epoch="61" num_updates="11" cib-last-written="Fri Oct 12
> 16:07:33 2012" dc-uuid="1b48f410-44d1-4e89-8b52-ff23b32db1bc">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
> value="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b"/>
> <nvpair id="cib-bootstrap-options-cluster-infrastructure"
> name="cluster-infrastructure" value="Heartbeat"/>
> <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled"
> value="true"/>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="1b48f410-44d1-4e89-8b52-ff23b32db1bc" uname="ha1" type="normal"/>
> <node id="9790fe6e-67b2-4817-abf4-966b5aa6948c" uname="ha2" type="normal"/>
> </nodes>
> <resources>
> <primitive class="stonith" id="stonith-ha1" type="external/riloe">
> <instance_attributes id="stonith-ha1-instance_attributes">
> <nvpair id="stonith-ha1-instance_attributes-hostlist" name="hostlist"
> value="ha1"/>
> <nvpair id="stonith-ha1-instance_attributes-ilo_hostname"
> name="ilo_hostname" value="10.0.1.111"/>
> <nvpair id="stonith-ha1-instance_attributes-ilo_user" name="ilo_user"
> value="Administrator"/>
> <nvpair id="stonith-ha1-instance_attributes-ilo_password"
> name="ilo_password" value="xxx"/>
> <nvpair id="stonith-ha1-instance_attributes-ilo_can_reset"
> name="ilo_can_reset" value="1"/>
> <nvpair id="stonith-ha1-instance_attributes-ilo_protocol"
> name="ilo_protocol" value="2.0"/>
> <nvpair id="stonith-ha1-instance_attributes-ilo_powerdown_method"
> name="ilo_powerdown_method" value="button"/>
> </instance_attributes>
> </primitive>
> <primitive class="stonith" id="stonith-ha2" type="external/riloe">
> <instance_attributes id="stonith-ha2-instance_attributes">
> <nvpair id="stonith-ha2-instance_attributes-hostlist" name="hostlist"
> value="ha2"/>
> <nvpair id="stonith-ha2-instance_attributes-ilo_hostname"
> name="ilo_hostname" value="10.0.1.112"/>
> <nvpair id="stonith-ha2-instance_attributes-ilo_user" name="ilo_user"
> value="Administrator"/>
> <nvpair id="stonith-ha2-instance_attributes-ilo_password"
> name="ilo_password" value="xxx"/>
> <nvpair id="stonith-ha2-instance_attributes-ilo_can_reset"
> name="ilo_can_reset" value="1"/>
> <nvpair id="stonith-ha2-instance_attributes-ilo_protocol"
> name="ilo_protocol" value="2.0"/>
> <nvpair id="stonith-ha2-instance_attributes-ilo_powerdown_method"
> name="ilo_powerdown_method" value="button"/>
> </instance_attributes>
> </primitive>
> <group id="mysql-resources">
> <primitive class="ocf" id="MysqlIP" provider="heartbeat" type="IPaddr2">
> <instance_attributes id="MysqlIP-instance_attributes">
> <nvpair id="MysqlIP-instance_attributes-ip" name="ip" value="192.168.25.9"/>
> <nvpair id="MysqlIP-instance_attributes-cidr_netmask" name="cidr_netmask"
> value="32"/>
> </instance_attributes>
> <operations>
> <op id="MysqlIP-monitor-10s" interval="10s" name="monitor"/>
> </operations>
> </primitive>
> <primitive id="mysqld" class="lsb" type="mysqld"/>
> </group>
> </resources>
> <constraints>
> <rsc_location id="loc-3" rsc="stonith-ha1" node="ha1" score="-INFINITY"/>
> <rsc_location id="loc-4" rsc="stonith-ha2" node="ha2" score="-INFINITY"/>
> <rsc_location id="loc-5" rsc="mysql-resources" node="ha1" score="100"/>
> </constraints>
> <rsc_defaults/>
> <op_defaults/>
> </configuration>
> <status>
> <node_state id="1b48f410-44d1-4e89-8b52-ff23b32db1bc" uname="ha1"
> ha="active" in_ccm="true" crmd="online" join="member" expected="member"
> crm-debug-origin="do_update_resource" shutdown="0">
> <lrm id="1b48f410-44d1-4e89-8b52-ff23b32db1bc">
> <lrm_resources>
> <lrm_resource id="stonith-ha1" type="external/riloe" class="stonith">
> <lrm_rsc_op id="stonith-ha1_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
> transition-key="4:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
> transition-magic="0:7;4:0:7:4f352feb-6be6-4
> af1-bb9e-af8b7ca28f58" call-id="2" rc-code="7" op-status="0" interval="0"
> last-run="1350073642" last-rc-change="1350073642" exec-time="10"
> queue-time="0" op-digest="f7c04bf4ee3eb60cf4f9e841449a0fe6"/>
> </lrm_resource>
> <lrm_resource id="stonith-ha2" type="external/riloe" class="stonith">
> <lrm_rsc_op id="stonith-ha2_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
> transition-key="5:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
> transition-magic="0:7;5:0:7:4f352feb-6be6-4
> af1-bb9e-af8b7ca28f58" call-id="3" rc-code="7" op-status="0" interval="0"
> last-run="1350073643" last-rc-change="1350073643" exec-time="0"
> queue-time="0" op-digest="c8773e5caa7ae2a75f71a79adc582d54"/>
> <lrm_rsc_op id="stonith-ha2_start_0" operation="start"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
> transition-key="8:0:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
> transition-magic="0:0;8:0:0:4f352feb-6be6-4af1-
> bb9e-af8b7ca28f58" call-id="6" rc-code="0" op-status="0" interval="0"
> last-run="1350073643" last-rc-change="1350073643" exec-time="10920"
> queue-time="0" op-digest="c8773e5caa7ae2a75f71a79adc582d54"/>
> </lrm_resource>
> <lrm_resource id="mysqld" type="mysqld" class="lsb">
> <lrm_rsc_op id="mysqld_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
> transition-key="7:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
> transition-magic="0:7;7:0:7:4f352feb-6be6-4af1-b
> b9e-af8b7ca28f58" call-id="5" rc-code="7" op-status="0" interval="0"
> last-run="1350073642" last-rc-change="1350073642" exec-time="40"
> queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
> <lrm_rsc_op id="mysqld_start_0" operation="start"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
> transition-key="8:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
> transition-magic="0:0;8:1:0:4f352feb-6be6-4af1-bb9e-
> af8b7ca28f58" call-id="9" rc-code="0" op-status="0" interval="0"
> last-run="1350073688" last-rc-change="1350073688" exec-time="0"
> queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
> </lrm_resource>
> <lrm_resource id="MysqlIP" type="IPaddr2" class="ocf" provider="heartbeat">
> <lrm_rsc_op id="MysqlIP_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
> transition-key="6:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
> transition-magic="0:7;6:0:7:4f352feb-6be6-4af1-
> bb9e-af8b7ca28f58" call-id="4" rc-code="7" op-status="0" interval="0"
> last-run="1350073642" last-rc-change="1350073642" exec-time="150"
> queue-time="0" op-digest="9611b7026c2dc135fbd13d3537b42d16"/>
> <lrm_rsc_op id="MysqlIP_start_0" operation="start"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
> transition-key="6:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
> transition-magic="0:0;6:1:0:4f352feb-6be6-4af1-bb9e
> -af8b7ca28f58" call-id="7" rc-code="0" op-status="0" interval="0"
> last-run="1350073687" last-rc-change="1350073687" exec-time="30"
> queue-time="0" op-digest="9611b7026c2dc135fbd13d3537b42d16"/>
> <lrm_rsc_op id="MysqlIP_monitor_10000" operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
> transition-key="7:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58"
> transition-magic="0:0;7:1:0:4f352feb-6be6-4
> af1-bb9e-af8b7ca28f58" call-id="8" rc-code="0" op-status="0"
> interval="10000" last-run="1350073687" last-rc-change="1350073687"
> exec-time="10" queue-time="0" op-digest="caf3454e2233540200e7a6a5319c5362"/>
> </lrm_resource>
> </lrm_resources>
> </lrm>
> <transient_attributes id="1b48f410-44d1-4e89-8b52-ff23b32db1bc">
> <instance_attributes id="status-1b48f410-44d1-4e89-8b52-ff23b32db1bc">
> <nvpair id="status-1b48f410-44d1-4e89-8b52-ff23b32db1bc-probe_complete"
> name="probe_complete" value="true"/>
> </instance_attributes>
> </transient_attributes>
> </node_state>
> <node_state id="9790fe6e-67b2-4817-abf4-966b5aa6948c" uname="ha2" ha="dead"
> in_ccm="false" crmd="offline" join="down" expected="down"
> crm-debug-origin="send_stonith_update"/>
> </status>
> </cib>
>
> --
>
> Dave Parker
> Systems Administrator
> Utica College
> Integrated Information Technology Services
> (315) 792-3229
> Registered Linux User #408177
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




More information about the Pacemaker mailing list