[ClusterLabs] Fencing in a 3 node cluster

Digimer lists at alteeve.ca
Fri Jul 10 12:31:43 EDT 2015


On 10/07/15 12:39 AM, Nicolas S. wrote:
> On 10 July 2015 08:46, "Digimer" <lists at alteeve.ca> wrote:
> 
>> On 09/07/15 11:37 PM, Nicolas S. wrote:
>>
>>> Hello,
>>>
>>> I'm working on a 3 node cluster project.
>>> I didn't want to go to a 2-node cluster; I'd rather have 3 nodes for the
>>> resources, and to have quorum. My 3 machines are identical.
>>>
>>> Each machine exports a disk to the cluster via iSCSI (to simulate a
>>> SAN on my test platform).
>>> For the moment everything is running: dlm + clvm (+ cmirror) + gfs2
>>>
>>> I now want to add fencing to my cluster (for the moment it's
>>> disabled in the cluster config). My problem is choosing the right
>>> fencing/STONITH agent.
>>>
>>> My questions:
>>>
>>> - As I have quorum, do I need fencing?
>>
>> Yes.
>>
>> Quorum is a tool useful when the cluster nodes are behaving predictably.
>> Fencing is a tool for when a node has entered an unknown state. They
>> solve different problems.
>>
> OK, thank you for explaining the difference between the two concepts.
> 
>>> - I can't use an IPMI/DRAC or similar fencing agent; my cluster is on a very
>>> low-cost test platform that has no remote reboot/power-off access.
>>
>> Then you will need to use an external device, like a switched PDU.
>>
> Actually, the problem is that this test (and very low-cost) platform is hosted elsewhere.
> I just found that there is a REST API that may be able to reboot the nodes, but for that I'll have to write a fence agent.
> 
> I'm not familiar with Python or Perl; I'm only experienced in bash programming.
> Is it possible to run a simple shell script (calling curl) as a fence agent?

Yes. What matters is that you can read from STDIN the values passed as
'variable=value', one per line. Then you need to make sure the agent
exits with the appropriate return code. Finally, you need to print the
agent's meta-data when asked (call 'fence_ipmilan -o metadata' for an
example).
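To illustrate those three requirements, here is a minimal sketch of such an agent. Python is used instead of bash. Everything about the hosting provider is an assumption: `API_BASE`, the `/{on,off,reboot}` and `/power` URL paths, the plain-text `on`/`off` power-state reply, and the agent name `fence_rest_example` are hypothetical placeholders, not a real API. The exit-code convention used here is 0 for success and non-zero for failure, with `status` returning 0 for on and 2 for off. That is my reading of the FenceAgentAPI page, so check it against that page and against the output of `fence_ipmilan -o metadata`.

```python
#!/usr/bin/env python3
"""Sketch of a fence agent driven by key=value lines on STDIN.

HYPOTHETICAL: API_BASE, its URL layout and its plain-text "on"/"off"
power-state reply stand in for your hosting provider's real REST API.
"""
import sys
import urllib.request

API_BASE = "https://api.example.invalid/v1/servers"  # HYPOTHETICAL endpoint

METADATA = """<?xml version="1.0" ?>
<resource-agent name="fence_rest_example" shortdesc="Fence agent sketch for a REST power API">
<parameters>
  <parameter name="action" unique="0" required="1">
    <content type="string" default="reboot"/>
    <shortdesc lang="en">Fencing action</shortdesc>
  </parameter>
  <parameter name="port" unique="0" required="1">
    <content type="string"/>
    <shortdesc lang="en">Provider's ID for the server to fence</shortdesc>
  </parameter>
</parameters>
<actions>
  <action name="on"/>
  <action name="off"/>
  <action name="reboot"/>
  <action name="status"/>
  <action name="metadata"/>
</actions>
</resource-agent>
"""


def parse_args(text):
    """Turn 'key=value' lines into a dict, skipping blank lines and '#' comments."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once: values may contain '='
        opts[key.strip()] = value.strip()
    # Older callers send 'option=' rather than 'action='; accept both.
    if "action" not in opts and "option" in opts:
        opts["action"] = opts["option"]
    return opts


def call_api(server_id, verb):
    """POST to the HYPOTHETICAL action URL; True only on an HTTP 2xx reply."""
    req = urllib.request.Request(f"{API_BASE}/{server_id}/{verb}", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


def power_state(server_id):
    """GET the HYPOTHETICAL power URL; return 'on', 'off', or None on error."""
    try:
        with urllib.request.urlopen(f"{API_BASE}/{server_id}/power", timeout=30) as resp:
            return resp.read().decode().strip().lower()
    except OSError:
        return None


def run(opts):
    """Perform the requested action and return the exit code to report."""
    action = opts.get("action", "reboot")
    if action == "metadata":
        sys.stdout.write(METADATA)
        return 0
    server_id = opts.get("port")
    if not server_id:
        return 1  # cannot fence without knowing which server to hit
    if action == "status":
        # 0 = powered on, 2 = powered off, 1 = could not tell
        return {"on": 0, "off": 2}.get(power_state(server_id), 1)
    if action in ("on", "off", "reboot"):
        return 0 if call_api(server_id, action) else 1
    return 1  # unknown action


def main():
    sys.exit(run(parse_args(sys.stdin.read())))

# When installed as an agent, end the file with:
#   if __name__ == "__main__":
#       main()
```

The same structure works in bash: a `while IFS='=' read -r key value` loop over STDIN, a `curl` call, and an `exit` with the matching code. This sketch trusts a 2xx reply as success. A real agent should confirm with a `status` check that the node actually went off before reporting a successful fence.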

The API is here: https://fedorahosted.org/cluster/wiki/FenceAgentAPI

>>> - Maybe I could use fence_scsi, but how would I use it on 3 nodes?
>>> Where should I place the fencing disk? If the node exporting the disk
>>> fails, there is no more fencing.
>>
>> This would disconnect the failed VM from the shared storage, but leave
>> it otherwise alone. Personally, I don't like this because though the
>> storage is safe, it's possible for the node to still cause trouble. I
>> much prefer, and always recommend, power fencing.
>>
> If this demo cluster works, I'll be able to get our own machines with iDRAC/IPMI, so it will be easier.
> Setting up the demo cluster is harder than the production one :)

OK. Please also push for a pair of switched PDUs for your production
cluster. IPMI devices (RMC, DRAC, iLO, etc.) die with their host in
certain cases, leaving the lost node unfenceable. When IPMI fencing
works, it is ideal because a confirmed fence is 100% confirmed, whereas
PDU fencing is "the outlets are off, hope you had them plugged into the
right place". So I use PDU fencing as a *backup* to IPMI fencing, for
those cases where IPMI fails (dead host, locked-up BMC, failed NIC/cable,
etc.).

You'll want dual PDUs so that the PDU itself (or the UPS behind it)
isn't a single point of failure (SPoF).
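The IPMI-first, PDU-as-backup arrangement described above can be modelled as ordered fencing levels. This is a simplified Python sketch, similar in spirit to Pacemaker's fencing topology, where every device in a level must succeed. The device functions are hypothetical stand-ins that only return True or False; they are not real fence agents.

```python
from typing import Callable, List

# HYPOTHETICAL device type: takes a node name, returns True only when the
# device confirms the node is off (or was power-cycled).
FenceDevice = Callable[[str], bool]


def fence_node(node: str, levels: List[List[FenceDevice]]) -> bool:
    """Try each fencing level in order.

    A level counts as successful only if every device in it succeeds, so a
    level made of two PDUs fails unless both outlets report success.
    """
    for level in levels:
        if level and all(device(node) for device in level):
            return True
    return False  # every level failed: the node must be treated as unfenced


# Simulated devices, for illustration only.
def ipmi_with_dead_host(node: str) -> bool:
    return False  # the BMC died with its host, so IPMI cannot confirm anything


def pdu_a(node: str) -> bool:
    return True


def pdu_b(node: str) -> bool:
    return True


levels = [[ipmi_with_dead_host], [pdu_a, pdu_b]]
print(fence_node("node2", levels))  # → True (IPMI failed, both PDUs succeeded)
```

Both PDUs sit in the same level because, with redundant power supplies, the node only loses power when both outlets are off. A single PDU reporting success is therefore not a confirmed fence. For a reboot on dual-PSU nodes, both outlets need to be off at the same time before either is switched back on.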

>>> - Any other advice will be helpful :)
>>
>> APC-brand AP7900 switched PDUs are excellent and fast fence devices.
>> They can often be found used for under $200 USD. If this is still too
>> much, and if this is a learning platform, something I did when I was
>> first learning was to build an Arduino-based fence device
>> (http://nodeassassin.org).
>>
>>> Regards,
>>>
>>> Nicolas.


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



