[Pacemaker] Question on ILO stonith resource config and restarting

Aaron Bush abush at microcenter.com
Wed Oct 29 21:08:13 EDT 2008


>> The next issue I have is that when I disconnect the LAN cable on a
>> single node that connects it to the rest of the network, the clone
>> stonith monitor fails since it can't connect to the other node's ILO
>> for status.  After some time (minutes, let's say) I reconnect the LAN
>> cable but never see the clone stonith come back to life; it just stays
>> failed.  What should I be looking at to make sure that the clone
>> stonith restarts properly?
>>
>There is something I don't understand: You cut the network connection
>to the ILO only?
>If yes, then the monitor action of the stonith plugin gets a failure
>and you can/have to react on that as usual. In your case: ignore the
>failure and keep trying. You can't do anything better, besides raising
>redundancy, IMHO.

Sorry, should have been more clear on the network cabling and what is
being removed...

The systems each have 3 NICs connected as follows:
NIC1) Cross over cable between both nodes.  Using this for Heartbeats.
This stays connected.
NIC2) Connects system to rest of network.  Also using this for
Heartbeats.  This is the NIC that the cable was removed from, so this
system was no longer able to connect to the other node's ILO port to get
status.  Pingd is monitoring upstream IPs via this NIC.
NIC3) ILO port.  This stays connected.  Connections are only made into
the ILO since it is not available as a network interface to the OS.
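
To make the failure case concrete, here is a rough Python sketch of the
cabling above.  The dictionary layout and the function names are made up
for illustration only; nothing here is part of Heartbeat or Pacemaker.
It just records which traffic each NIC carries and shows what is left
once the NIC2 cable is pulled.

```python
# Hypothetical model of the three NICs described above (illustrative names only).
LINKS = {
    # Cross-over cable between the two nodes; carries heartbeats only.
    "NIC1": {"carries": {"heartbeat"}, "connected": True},
    # LAN link; carries heartbeats, pingd checks, and the stonith
    # monitor's status queries to the other node's ILO.
    "NIC2": {"carries": {"heartbeat", "pingd", "ilo_status"}, "connected": True},
    # ILO port; inbound only and not visible to the OS, so the node
    # itself sends nothing over it.
    "NIC3": {"carries": set(), "connected": True},
}


def available(links):
    """Return the functions that still have at least one connected path."""
    up = set()
    for link in links.values():
        if link["connected"]:
            up |= link["carries"]
    return up


def unplug(links, name):
    """Return a copy of the link table with one cable removed."""
    copy = {key: dict(value) for key, value in links.items()}
    copy[name]["connected"] = False
    return copy


after = unplug(LINKS, "NIC2")
print(sorted(available(after)))  # prints ['heartbeat']
```

With NIC2 unplugged only heartbeating survives (over the cross-over
cable), which matches what I saw: the node was not shot, pingd lost its
upstream targets, and the stonith monitor could no longer reach the
other node's ILO.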

Since I still had heartbeating across the cross-over cable between each
node I am fine with the fact that the node with less connectivity did
not get shot.
The pingd scoring appeared to work well and my LVS/IP resource group
relocated to the other, better-connected node.
I am mostly concerned that I ended up with a node that had no associated
stonith resource available to shoot it if it truly went down, since the
resource did not restart as I expected once the network cable was
reconnected.

Thanks,
-ab



