[Pacemaker] Beginner Fencing Help

imnotpc imnotpc at rock3d.net
Mon Jun 6 10:02:24 EDT 2011


On Monday, June 06, 2011 03:11:24 Errol Neal wrote:
> On Fri, 06/03/2011 12:31 PM, imnotpc <imnotpc at rock3d.net> wrote:
> > I have a working 3-node cluster with a couple of resources defined. If I
> > shut down a node, crm_mon shows that the cluster correctly identifies the node,
> > marks it as offline, and moves any resources on it. The fencing resource
> > (I've tried both ssh and meatware) also sees it as down and marks it
> > stopped. So far so good. I was expecting a console warning or a shutdown
> > attempt but nothing happens. I checked the logs and can see that stonith
> > sees the event but I don't see any actions taken. "crm_verify -L"
> > doesn't show any problems. What else should I do to
> > troubleshoot/configure this?
> 
> You should probably begin by posting your config so we can have some
> additional context. What stonith devices do you have configured?

Right now I have meatware as the stonith device.

<?xml version="1.0" ?>
<cib admin_epoch="0" cib-last-written="Mon Jun  6 08:45:09 2011" 
crm_feature_set="3.0.5" dc-uuid="JeffDesk.LAN" epoch="17" have-quorum="1" 
num_updates="81" validate-with="pacemaker-1.2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" 
value="1.1.5-1.fc15-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" 
name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" 
name="expected-quorum-votes" value="3"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" 
name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="Server4.LAN" type="normal" uname="Server4.LAN"/>
      <node id="JeffDesk.LAN" type="normal" uname="JeffDesk.LAN"/>
      <node id="Server2.LAN" type="normal" uname="Server2.LAN"/>
    </nodes>
    <resources>
      <primitive class="ocf" id="ClusterIP" provider="heartbeat" 
type="IPaddr2">
        <instance_attributes id="ClusterIP-instance_attributes">
          <nvpair id="ClusterIP-instance_attributes-ip" name="ip" 
value="192.168.0.200"/>
          <nvpair id="ClusterIP-instance_attributes-cidr_netmask" 
name="cidr_netmask" value="32"/>
        </instance_attributes>
        <operations>
          <op id="ClusterIP-monitor-30s" interval="30s" name="monitor"/>
        </operations>
      </primitive>
      <clone id="Fencing">
        <primitive class="stonith" id="meatware-fence" type="meatware">
          <instance_attributes id="meatware-fence-instance_attributes">
            <nvpair id="meatware-fence-instance_attributes-hostlist" 
name="hostlist" value="JeffDesk.LAN Server2.LAN Server4.LAN"/>
          </instance_attributes>
        </primitive>
      </clone>
    </resources>
    <constraints/>
  </configuration>
</cib>
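
As a quick sanity check of the configuration above, here is a minimal Python sketch that parses a CIB dump and reports whether stonith-enabled is set and which stonith primitives (with their hostlist) are defined. It is not part of Pacemaker: the function name summarize_fencing is hypothetical, and the embedded XML is a trimmed copy of the CIB above that keeps only the elements the sketch reads.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the CIB above; only the parts this sketch inspects.
CIB_XML = """<cib>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-stonith-enabled"
                name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <resources>
      <clone id="Fencing">
        <primitive class="stonith" id="meatware-fence" type="meatware">
          <instance_attributes id="meatware-fence-instance_attributes">
            <nvpair id="meatware-fence-instance_attributes-hostlist"
                    name="hostlist"
                    value="JeffDesk.LAN Server2.LAN Server4.LAN"/>
          </instance_attributes>
        </primitive>
      </clone>
    </resources>
  </configuration>
</cib>"""


def summarize_fencing(cib_text):
    """Return (stonith-enabled value, {primitive id: {type, hosts}})."""
    root = ET.fromstring(cib_text)

    # Look for the stonith-enabled cluster property anywhere in the CIB.
    enabled = None
    for nv in root.iter("nvpair"):
        if nv.get("name") == "stonith-enabled":
            enabled = nv.get("value")

    # Collect every primitive of class "stonith" and its hostlist, if any.
    devices = {}
    for prim in root.iter("primitive"):
        if prim.get("class") != "stonith":
            continue
        hosts = []
        for nv in prim.iter("nvpair"):
            if nv.get("name") == "hostlist":
                hosts = nv.get("value", "").split()
        devices[prim.get("id")] = {"type": prim.get("type"), "hosts": hosts}
    return enabled, devices


enabled, devices = summarize_fencing(CIB_XML)
print("stonith-enabled:", enabled)
for dev_id, info in devices.items():
    print(dev_id, info["type"], info["hosts"])
```

To run it against a live cluster, the same function could be fed the text of a saved CIB dump instead of the embedded string.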

When I shut down a node, I see this in the logs:

[...]

Jun  6 09:53:00 Server2 crmd: [2362]: info: handle_shutdown_request: Creating 
shutdown request for Server4.LAN (state=S_IDLE)
Jun  6 09:53:00 Server2 crmd: [2362]: info: abort_transition_graph: 
te_update_diff:149 - Triggered transition abort (complete=1, tag=nvpair, 
id=status-Server4.LAN-shutdown, magic=NA, cib=0.17.208) : Transient attribute: 
update
Jun  6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
Jun  6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: All 3 cluster 
nodes are eligible to run resources.
Jun  6 09:53:00 Server2 crmd: [2362]: info: do_pe_invoke: Query 84: Requesting 
the current CIB: S_POLICY_ENGINE
Jun  6 09:53:00 Server2 pengine: [2361]: notice: native_print: 
ClusterIP#011(ocf::heartbeat:IPaddr2):#011Started Server2.LAN
Jun  6 09:53:00 Server2 crmd: [2362]: info: do_pe_invoke_callback: Invoking 
the PE: query=84, ref=pe_calc-dc-1307368380-52, seq=252, quorate=1
Jun  6 09:53:00 Server2 pengine: [2361]: notice: clone_print:  Clone Set: 
Fencing [meatware-fence]
Jun  6 09:53:00 Server2 pengine: [2361]: notice: short_print:      Started: [ 
Server2.LAN JeffDesk.LAN Server4.LAN ]
Jun  6 09:53:00 Server2 pengine: [2361]: notice: stage6: Scheduling Node 
Server4.LAN for shutdown
Jun  6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Leave   
ClusterIP#011(Started Server2.LAN)
Jun  6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Leave   
meatware-fence:0#011(Started Server2.LAN)
Jun  6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Leave   
meatware-fence:1#011(Started JeffDesk.LAN)
Jun  6 09:53:00 Server2 pengine: [2361]: notice: LogActions: Stop    
meatware-fence:2#011(Server4.LAN)
Jun  6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Jun  6 09:53:00 Server2 crmd: [2362]: info: unpack_graph: Unpacked transition 
4: 4 actions in 4 synapses
Jun  6 09:53:00 Server2 crmd: [2362]: info: do_te_invoke: Processing graph 4 
(ref=pe_calc-dc-1307368380-52) derived from /var/lib/pengine/pe-input-67.bz2
Jun  6 09:53:00 Server2 crmd: [2362]: info: te_pseudo_action: Pseudo action 16 
fired and confirmed
Jun  6 09:53:00 Server2 crmd: [2362]: info: te_rsc_command: Initiating action 
13: stop meatware-fence:2_stop_0 on Server4.LAN
Jun  6 09:53:00 Server2 crmd: [2362]: info: match_graph_event: Action 
meatware-fence:2_stop_0 (13) confirmed on Server4.LAN (rc=0)
Jun  6 09:53:00 Server2 crmd: [2362]: info: te_pseudo_action: Pseudo action 17 
fired and confirmed
Jun  6 09:53:00 Server2 crmd: [2362]: info: te_crm_command: Executing 
crm-event (20): do_shutdown on Server4.LAN
Jun  6 09:53:00 Server2 crmd: [2362]: info: run_graph: 
====================================================
Jun  6 09:53:00 Server2 crmd: [2362]: notice: run_graph: Transition 4 
(Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pengine/pe-input-67.bz2): Complete
Jun  6 09:53:00 Server2 crmd: [2362]: info: te_graph_trigger: Transition 4 is 
now complete
Jun  6 09:53:00 Server2 crmd: [2362]: info: notify_crmd: Transition 4 status: 
done - <null>
Jun  6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun  6 09:53:00 Server2 crmd: [2362]: info: do_state_transition: Starting 
PEngine Recheck Timer
Jun  6 09:53:00 Server2 pacemakerd: [2353]: info: update_node_processes: Node 
Server4.LAN now has process list: 00000000000000000000000000111112 (was 
00000000000000000000000000111312)
Jun  6 09:53:00 Server2 stonith-ng: [2357]: info: crm_update_peer: Node 
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000111112 (new)
Jun  6 09:53:00 Server2 attrd: [2360]: info: crm_update_peer: Node 
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000111112 (new)
Jun  6 09:53:00 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000111112 (new)
Jun  6 09:53:00 Server2 crmd: [2362]: notice: crmd_peer_update: Status update: 
Client Server4.LAN/crmd now has status [offline] (DC=true)
Jun  6 09:53:00 Server2 crmd: [2362]: info: erase_node_from_join: Removed node 
Server4.LAN from join calculations: welcomed=0 itegrated=0 finalized=0 
confirmed=1
Jun  6 09:53:00 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000111112 (new)
Jun  6 09:53:00 Server2 pacemakerd: [2353]: info: update_node_processes: Node 
Server4.LAN now has process list: 00000000000000000000000000101112 (was 
00000000000000000000000000111112)
Jun  6 09:53:00 Server2 stonith-ng: [2357]: info: crm_update_peer: Node 
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000101112 (new)
Jun  6 09:53:00 Server2 attrd: [2360]: info: crm_update_peer: Node 
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000101112 (new)
Jun  6 09:53:00 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000101112 (new)
Jun  6 09:53:00 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000101112 (new)
Jun  6 09:53:01 Server2 pacemakerd: [2353]: info: update_node_processes: Node 
Server4.LAN now has process list: 00000000000000000000000000100112 (was 
00000000000000000000000000101112)
Jun  6 09:53:01 Server2 attrd: [2360]: info: crm_update_peer: Node 
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000100112 (new)
Jun  6 09:53:01 Server2 stonith-ng: [2357]: info: crm_update_peer: Node 
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000100112 (new)
Jun  6 09:53:01 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000100112 (new)
Jun  6 09:53:01 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000100112 (new)
Jun  6 09:53:01 Server2 pacemakerd: [2353]: info: update_node_processes: Node 
Server4.LAN now has process list: 00000000000000000000000000100102 (was 
00000000000000000000000000100112)
Jun  6 09:53:01 Server2 attrd: [2360]: info: crm_update_peer: Node 
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000100102 (new)
Jun  6 09:53:01 Server2 stonith-ng: [2357]: info: crm_update_peer: Node 
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000100102 (new)
Jun  6 09:53:01 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000100102 (new)
Jun  6 09:53:01 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000100102 (new)
Jun  6 09:53:01 Server2 cib: [2358]: info: cib_process_shutdown_req: Shutdown 
REQ from Server4.LAN
Jun  6 09:53:01 Server2 cib: [2358]: info: cib_process_request: Operation 
complete: op cib_shutdown_req for section 'all' 
(origin=Server4.LAN/Server4.LAN/(null), version=0.17.210): ok (rc=0)
Jun  6 09:53:06 Server2 pacemakerd: [2353]: info: update_node_processes: Node 
Server4.LAN now has process list: 00000000000000000000000000100002 (was 
00000000000000000000000000100102)
Jun  6 09:53:06 Server2 stonith-ng: [2357]: info: crm_update_peer: Node 
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000100002 (new)
Jun  6 09:53:06 Server2 attrd: [2360]: info: crm_update_peer: Node 
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000100002 (new)
Jun  6 09:53:06 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000100002 (new)
Jun  6 09:53:06 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000100002 (new)
Jun  6 09:53:06 Server2 pacemakerd: [2353]: info: update_node_processes: Node 
Server4.LAN now has process list: 00000000000000000000000000000002 (was 
00000000000000000000000000100002)
Jun  6 09:53:06 Server2 attrd: [2360]: info: crm_update_peer: Node 
Server4.LAN: id=67152064 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000000002 (new)
Jun  6 09:53:06 Server2 stonith-ng: [2357]: info: crm_update_peer: Node 
Server4.LAN: id=0 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00000000000000000000000000000002 (new)
Jun  6 09:53:06 Server2 crmd: [2362]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000000002 (new)
Jun  6 09:53:06 Server2 cib: [2358]: info: crm_update_peer: Node Server4.LAN: 
id=67152064 state=member addr=r(0) ip(192.168.0.4)  votes=1 born=244 seen=252 
proc=00000000000000000000000000000002 (new)

[...]
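
To pick out what the policy engine actually decided from a log like the one above, a small Python sketch along these lines might help. It assumes the mail-wrapped lines have already been rejoined into one entry per line, uses a hypothetical helper named pengine_decisions, and treats "#011" as syslog's octal escape for a tab character. The sample lines are copied from the excerpt above.

```python
import re

# Sample entries from the excerpt above, with mail line-wrapping undone.
LOG_LINES = [
    "Jun  6 09:53:00 Server2 pengine: [2361]: notice: stage6: "
    "Scheduling Node Server4.LAN for shutdown",
    "Jun  6 09:53:00 Server2 pengine: [2361]: notice: LogActions: "
    "Leave   ClusterIP#011(Started Server2.LAN)",
    "Jun  6 09:53:00 Server2 pengine: [2361]: notice: LogActions: "
    "Stop    meatware-fence:2#011(Server4.LAN)",
    "Jun  6 09:53:00 Server2 crmd: [2362]: info: te_crm_command: "
    "Executing crm-event (20): do_shutdown on Server4.LAN",
]

# timestamp, host, daemon, [pid], level, function, message
LINE_RE = re.compile(
    r"^\w{3}\s+\d+ [\d:]+ \S+ (?P<daemon>[\w-]+): \[\d+\]: "
    r"(?P<level>\w+): (?P<func>\w+): (?P<msg>.*)$"
)


def pengine_decisions(lines):
    """Return (function, message) pairs for entries logged by pengine."""
    decisions = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("daemon") != "pengine":
            continue
        # Turn the escaped tab back into whitespace, then collapse runs.
        msg = m.group("msg").replace("#011", "\t")
        decisions.append((m.group("func"), " ".join(msg.split())))
    return decisions


decisions = pengine_decisions(LOG_LINES)
for func, msg in decisions:
    print(func + ":", msg)
```

Filtering this way keeps only the pengine entries (the scheduling decision and the LogActions lines) and drops the crmd and peer-update noise.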

The reference to pseudo actions seems suspicious.

Jeff



