[ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
Digimer
lists at alteeve.ca
Sun Aug 7 01:59:31 UTC 2016
On 06/08/16 08:22 PM, Dan Swartzendruber wrote:
> On 2016-08-06 19:46, Digimer wrote:
>> On 06/08/16 07:33 PM, Dan Swartzendruber wrote:
>>>
>>> Okay, I almost have this all working. fence_ipmilan for the supermicro
>>> host. Had to specify lanplus for it to work. fence_drac5 for the R905.
>>> That was failing to complete due to timeout. Found a couple of helpful
>>> posts that recommended increasing the retry count to 3 and the timeout to
>>> 60. That worked also. The only problem now is that it takes well over
>>> a minute to complete the fencing operation. In that interim, the fenced
>>> host shows as UNCLEAN (offline), and because the fencing operation
>>> hasn't completed, the other node has to wait to import the pool and
>>> share out the filesystem. This causes the vsphere hosts to declare the
>>> NFS datastore down. I hadn't gotten exact timing, but I think the
>>> fencing operation took a little over a minute. I'm wondering if I could
>>> change the timeout to a smaller value, but increase the retries? Like
>>> back to the default 20 second timeout, but change retries from 1 to 5?
>>
>> Did you try the fence_ipmilan against the DRAC? It *should* work. Would
>> be interesting to see if it had the same issue. Can you check the DRAC's
>> host's power state using ipmitool directly without delay?
>
> Yes, I did try fence_ipmilan, but it got the timeout waiting for power
> off (or whatever). I have to admit, I switched to fence_drac and had
> the same issue, but after increasing the timeout and retries, got it to
> work, so it is possible that fence_ipmilan is okay. They both seemed
> to take more than 60 seconds to complete the operation. I have to say
> that when I do a power cycle through the drac web interface, it takes
> a while, so that might be normal. I think I will try again with 20
> seconds and 5 retries and see how that goes...
What about using ipmitool directly? I can't imagine that such a long
time is normal. Maybe there is a firmware update for the DRAC and/or
BIOS? (I know with Fujitsu, they recommend updating the IPMI BMC and
BIOS together).
Over a minute to fence is, strictly speaking, OK. However, that's a
significant delay in time to recover.
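
To separate the BMC's own response time from whatever the fence agent adds, it can help to time a plain power-status query. Below is a rough sketch of how that could be done, not a tested procedure for your hardware. It assumes ipmitool is installed on the node doing the check; the helper names (build_power_status_cmd, time_command) and the host and credentials in the usage comment are made up for illustration. It uses lanplus because that is what the Supermicro needed; whether the DRAC wants the same interface is an assumption.

```python
import subprocess
import time


def build_power_status_cmd(host, user, password, interface="lanplus"):
    """Build the ipmitool argv for a chassis power status query.

    host, user and password are placeholders supplied by the caller.
    """
    return [
        "ipmitool",
        "-I", interface,
        "-H", host,
        "-U", user,
        "-P", password,
        "chassis", "power", "status",
    ]


def time_command(argv, timeout=60):
    """Run argv and return (elapsed_seconds, returncode, stdout).

    Raises subprocess.TimeoutExpired if the command runs longer
    than `timeout` seconds.
    """
    start = time.monotonic()
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout
    )
    elapsed = time.monotonic() - start
    return elapsed, result.returncode, result.stdout.strip()


# Hypothetical usage (address and credentials are examples only):
#   cmd = build_power_status_cmd("192.0.2.10", "root", "secret")
#   elapsed, rc, out = time_command(cmd)
#   print(f"{elapsed:.1f}s rc={rc} {out}")
```

If the status query alone takes many seconds to return, the delay is most likely in the BMC rather than in the fence agent's timeout and retry settings.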
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?