[ClusterLabs] fence_virt architecture? (was: Re: Still Beginner STONITH Problem)

Mon Jul 20 08:49:57 EDT 2020

On 7/20/20 11:09 AM, Andrei Borzenkov wrote:
>
>
> On Mon, Jul 20, 2020 at 11:45 AM Klaus Wenninger
> <kwenning at redhat.com> wrote:
>
>     On 7/20/20 10:34 AM, Andrei Borzenkov wrote:
>>
>>
>>      
>>
>>         The cpg-configuration sounds interesting as well. Haven't used
>>         it or looked into the details. Would be interested to hear about
>>         how that works.
>>
>>
>>     It maintains a registry of VM locations (each fence_virtd polls
>>     the local hypervisor at regular intervals) and forwards fencing
>>     requests to the appropriate host via the corosync interconnect.
>>     It is also the only backend that can handle host failure: if it
>>     is known that a host left the cluster, any VM on this host is
>>     considered fenced by definition.
>>
>>     It requires that the hosts are configured in a pacemaker cluster
>>     themselves (to handle a host outage, the host must be properly
>>     fenced).
>     That sounds definitely interesting.
>     Are you saying that the hosts have to be pacemaker-nodes as well?
>
>
> The cpg backend requires a working corosync cluster (it uses it as
> transport). It responds to "node joined" and "node left" events. So
> the question is when "node left" is generated. My understanding so far
> was that for an unavailable node to be considered "left cluster", the
> node must be fenced. If I am wrong, pacemaker is not needed. If fencing
> is required, I am not aware of how it can be implemented without
> pacemaker.
Node joined / left events from corosync's perspective don't require any
fencing. cpg just sits on top of corosync and doesn't know about
pacemaker, fencing, ... and you can run multiple applications using
multiple cpg-protocols on top of a single corosync instance. Whether
this is useful of course depends on your scenario, the number of
tenants involved, ...
Anyway, I probably got it wrong. I thought it would also use the
cpg-protocol as transport between the fence-agents and the service
instances. That seems not to be the case, but it would have caused that
issue with multiple corosync instances running on a single host.
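
To make the cpg backend behaviour quoted above a bit more concrete, here
is a rough sketch of the registry logic as Andrei describes it: each
host reports the VMs it sees locally, a fencing request is routed to the
host that owns the VM, and a VM whose host has left the cluster is
treated as already fenced. This is not fence_virtd code. The class,
method names and return values are made up for illustration, and the
membership events are simply assumed to arrive from corosync.

```python
from dataclasses import dataclass, field


@dataclass
class VmRegistry:
    """Toy model of the VM-location registry (hypothetical, not fence_virtd)."""

    local_host: str
    # host name -> VM names last reported by that host's hypervisor poll
    vms_by_host: dict = field(default_factory=dict)
    # hosts currently seen as members (assumed to come from corosync events)
    members: set = field(default_factory=set)

    def node_joined(self, host):
        self.members.add(host)

    def node_left(self, host):
        # Keep the last known VM list so its VMs can be reported as fenced.
        self.members.discard(host)

    def report(self, host, vms):
        """Record the result of one hypervisor poll on `host`."""
        self.vms_by_host[host] = set(vms)

    def fence(self, vm):
        """Return (action, host) for a fencing request targeting `vm`."""
        owners = [h for h, vms in self.vms_by_host.items() if vm in vms]
        live = [h for h in owners if h in self.members]
        if live:
            host = live[0]
            if host == self.local_host:
                return ("fence-locally", host)
            return ("forward", host)
        if owners:
            # Last seen only on a host that left the cluster.
            return ("already-fenced", owners[0])
        return ("unknown-vm", None)
```

With hostA and hostB both joined, a request for a VM on hostB received
on hostA would be forwarded. After node_left("hostB") the same request
is answered as already fenced without contacting any hypervisor. When
exactly corosync delivers that "node left" event is the open question
discussed above, and the sketch takes no position on it.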
>
> In any case, it is a completely separate cluster, so "as well" is not
> applicable.
>
>  
>
>     Otherwise we might be able to just add them to corosync and configure
>     them not to vote on quorum ...
>
>
> These clusters are on different levels. Consider a multi-tenant
> deployment. Each VM cluster has a separate owner and is managed
> independently. There is no reason to integrate it into the underlying
> host cluster. The host cluster is managed by the provider, not by the
> tenant.
>
>  
>
>     ... the same knet might then even be used to connect the bridges
>     on the hosts with each other on layer-2 ...
>
>
> With the distributed backend you do not need any network connectivity
> between host and guest at all - just contact the local hypervisor via
> a local channel. That is actually more secure.
That was referring to a simple setup where you have a bridge on each
host that the VM interfaces are enslaved to. You don't need any
connectivity from the nodes to the outside world via these interfaces,
but you would like to see the VMs behind interfaces enslaved to bridges
on other hosts.
So you could use a single knet for the corosync traffic, for tunneling
traffic between hosts and potentially for fencing requests. All the
infrastructure below sees is this one knet.
Probably not suitable for a scenario with a lot of tenants, but easy
if you want to quickly spin up a test cluster on a couple of hosts
without needing any administration outside these hosts.
>
>  
>
>
