[Pacemaker] clusters on virtualised platforms

Digimer lists at alteeve.ca
Thu Jul 17 01:54:29 EDT 2014


On 17/07/14 02:39 PM, Alex Samad - Yieldbroker wrote:
>> -----Original Message-----
>> From: Digimer [mailto:lists at alteeve.ca]
>> Sent: Thursday, 17 July 2014 3:00 PM
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] clusters on virtualised platforms
>>
>> On 17/07/14 01:41 PM, Alex Samad - Yieldbroker wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Digimer [mailto:lists at alteeve.ca]
>>>> Sent: Thursday, 17 July 2014 2:02 PM
>>>> To: The Pacemaker cluster resource manager
>>>> Subject: Re: [Pacemaker] clusters on virtualised platforms
>>>>
>>>> Don't confuse quorum and fencing (stonith); they serve different purposes.
>>>> Basically, quorum is useful when things are working, fencing is
>>>> required when things go wrong. So regardless of quorum disk, you
>>>> still need to be able to fence. This requires that each VM be able to
>>>> call the hypervisor and force a power off.
>>>
>>> Ta, yep, got that; I was thinking while writing...
>>>
>>>>
>>>> Generally speaking, VM-based cluster nodes are good for learning, but
>>>> not production. It adds a layer that isn't needed, and in HA, simple
>>>> should trump all else.
>>>
>>> Yeah, well, it's not really going to change for us; we are virtualised and I can't really see that changing. In fact, I would presume you will see more of it.
>>>
>>> Thanks
>>
>> Then ensure that each VM is on a different host, otherwise the host itself
>> becomes a single point of failure. Further, you will want to add a backup
>> fence method that can take out the host if it stops responding.
>> Otherwise, a failure in the host would leave the target node's fence method
>> (the hypervisor) inaccessible, and a failed fence method can only be handled
>> safely by effectively hanging the cluster. A lack of response must never be
>> treated as confirmation of node death; otherwise, split-brains become
>> inevitable.
>
> Yes, but I think I can live with the assumption that ESX won't crash; plus, my hosts are in a cluster with automatic restart of VMs.
>
> I am happy to presume the host will not fail, and I think I have to extend that assumption to VC as well. I do realise that VC is much less reliable than ESXi, but I have constraints I have to live with.

That's quite the assumption to make in an HA environment, but as you 
said, you need to choose the failure scenarios that you will accept 
taking the cluster out.

It is not the choice I would make, however.

> My main thrust with the question was more along the lines of:
> How are people handling fencing with virtualisation? Is everyone installing the VMware SDK and creating users that can shut down just those hosts, or is there another acceptable (albeit not perfect) fencing method?

For fencing to work, it must be able to power off the target regardless 
of the state the target may be in. That means that fencing must exist 
outside the target, and with VMs, that means going through the hypervisor. In my 
case, I use KVM VMs, so I use fence_xvm or fence_virsh. I don't use 
VMware, so I can't speak to that.
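To make the earlier advice concrete (hypervisor fence first, a backup fence that can take out the host, and no response never counting as success), here is a minimal Python sketch. It is a simplified model, not Pacemaker's code: FenceLevel, fence_node and recover are hypothetical names, and each method callable stands in for a real agent such as fence_virsh/fence_xvm (hypervisor level) or a host-level power fence. It is roughly the idea behind Pacemaker's fencing levels (fencing topology), but only as an illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical fence method: returns True only on a *confirmed* power-off,
# False on an explicit failure, or None when the device did not answer.
FenceMethod = Callable[[str], Optional[bool]]


@dataclass
class FenceLevel:
    name: str            # e.g. "hypervisor" or "host" (illustrative labels)
    method: FenceMethod  # stand-in for a real fence agent call


def fence_node(node: str, levels: List[FenceLevel]) -> bool:
    """Try each fence level in order; succeed only on explicit confirmation."""
    for level in levels:
        result = level.method(node)
        if result is True:
            return True  # confirmed off: safe to recover its resources
        # False (failed) or None (no response) is never treated as success;
        # fall through and try the next, more drastic level.
    return False  # every level failed: the caller must block, not guess


def recover(node: str, levels: List[FenceLevel]) -> str:
    """Only move services once the lost node is confirmed dead."""
    if fence_node(node, levels):
        return f"recovering services from {node}"
    # This is the "hang the cluster" case: waiting is safe, guessing is not.
    return "blocked: waiting for fence confirmation"
```

If the host dies, the hypervisor-level method returns None, and only the host-level method can produce a confirmed kill. With no backup level configured, the sketch blocks instead of recovering, which is the trade-off Alex is accepting by trusting ESX.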

> The other question was: in the calculation of quorum, can I factor in whether the default gateway (dgw) is reachable?

The only ways to get quorum are a third node or a quorum 
disk. With a quorum disk, you can use heuristics. That said, I don't 
know what level of support pacemaker has for qdisk (it may be 
perfect; I really don't know).
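As a rough illustration of why a lone surviving node can't be quorate, and how a quorum disk with a heuristic (for example, "can I reach the default gateway?") could change that, here is a simplified Python model of majority voting. It is an assumption-laden sketch, not how corosync or qdiskd are implemented: quorum_threshold, has_quorum and qdisk_vote are hypothetical helpers, and it assumes one vote per node plus one vote for the quorum disk. Note that quorum only decides which partition may act; fencing is still required.

```python
from typing import Dict


def quorum_threshold(expected_votes: int) -> int:
    """Simple majority: strictly more than half of all expected votes."""
    return expected_votes // 2 + 1


def has_quorum(votes_present: Dict[str, int], expected_votes: int) -> bool:
    """A partition is quorate if the votes it can see reach the majority."""
    return sum(votes_present.values()) >= quorum_threshold(expected_votes)


def qdisk_vote(heuristic_ok: bool, qdisk_votes: int = 1) -> int:
    """Hypothetical quorum-disk vote, granted only if the heuristic passes.

    The heuristic might be "this node can ping the default gateway"; a node
    that fails it gets no extra vote from the quorum disk.
    """
    return qdisk_votes if heuristic_ok else 0
```

In this model, two nodes with one vote each need 2 votes, so a lone survivor (1 vote) loses quorum. Adding a one-vote quorum disk raises the expectation to 3 votes but keeps the threshold at 2, so the survivor stays quorate only if it also passes the gateway heuristic.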

>> I do see people try to do this more often, and I will continue to discourage
>> it... "An HA cluster is beautiful not when there is nothing left to add, but
>> when there is nothing left to take away." Every piece in the cluster is
>> another potential point of failure.
>
> In a life many moons ago, I used to build MS clusters and Oracle (Linux) RAC clusters, all on physical boxes.
>
> But I think virtualisation is here to stay, and for us it's right, so I am trying to shoehorn in a cluster solution that works as well, without paying for VMware HA. There are some compromises we will have to live with; I think trusting ESX and VC are valid ones.



-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?
