[ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

Digimer lists at alteeve.ca
Sun Aug 7 01:59:31 UTC 2016


On 06/08/16 08:22 PM, Dan Swartzendruber wrote:
> On 2016-08-06 19:46, Digimer wrote:
>> On 06/08/16 07:33 PM, Dan Swartzendruber wrote:
>>>
>>> Okay, I almost have this all working.  fence_ipmilan for the supermicro
>>> host.  Had to specify lanplus for it to work.  fence_drac5 for the R905.
>>>  That was failing to complete due to timeout.  Found a couple of helpful
>>> posts that recommended increasing the retry count to 3 and the timeout to
>>> 60.  That worked also.  The only problem now, is that it takes well over
>>> a minute to complete the fencing operation.  In that interim, the fenced
>>> host shows as UNCLEAN (offline), and because the fencing operation
>>> hasn't completed, the other node has to wait to import the pool and
>>> share out the filesystem.  This causes the vsphere hosts to declare the
>>> NFS datastore down.  I hadn't gotten exact timing, but I think the
>>> fencing operation took a little over a minute.  I'm wondering if I could
>>> change the timeout to a smaller value, but increase the retries?  Like
>>> back to the default 20 second timeout, but change retries from 1 to 5?
>>
>> Did you try fence_ipmilan against the DRAC? It *should* work. It would
>> be interesting to see if it had the same issue. Can you check the power
>> state of the DRAC's host using ipmitool directly, to see if it responds
>> without delay?
> 
> Yes, I did try fence_ipmilan, but it got the timeout waiting for power
> off (or whatever).  I have to admit, I switched to fence_drac and had
> the same issue, but after increasing the timeout and retries, got it to
> work, so it is possible that fence_ipmilan is okay.  They both seemed
> to take more than 60 seconds to complete the operation.  I have to say
> that when I do a power cycle through the drac web interface, it takes
> a while, so that might be normal.  I think I will try again with 20
> seconds and 5 retries and see how that goes...

What about using ipmitool directly? I can't imagine that such a long
time is normal. Maybe there is a firmware update for the DRAC and/or
BIOS? (I know with Fujitsu, they recommend updating the IPMI BMC and
BIOS together).
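
To put a number on it, something like the rough sketch below could time a
single power-status query against the BMC, bypassing the fence agent. It
assumes ipmitool is installed and that the DRAC accepts the lanplus
interface (as the Supermicro did); the helper names are made up for
illustration, and the address and credentials shown in the usage note are
placeholders.

```python
import subprocess
import time


def build_power_status_cmd(host, user, password, interface="lanplus"):
    """Build the argv for an 'ipmitool chassis power status' query.

    Note: passing the password with -P makes it visible in the process
    list, so this is only suitable for a quick manual test.
    """
    return [
        "ipmitool",
        "-I", interface,
        "-H", host,
        "-U", user,
        "-P", password,
        "chassis", "power", "status",
    ]


def time_command(argv, timeout=120):
    """Run argv and return (return code, elapsed seconds, combined output)."""
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    elapsed = time.monotonic() - start
    return proc.returncode, elapsed, (proc.stdout + proc.stderr).strip()
```

For example, time_command(build_power_status_cmd("192.0.2.10", "root",
"secret")) returns the exit code, the elapsed seconds and whatever ipmitool
printed. If a bare status query already takes many seconds, the delay is
likely in the BMC itself rather than in the fence agent's settings.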

Over a minute to fence is, strictly speaking, OK. However, that's a
significant delay in time to recover.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



