[ClusterLabs] Fencing libvirt

Jason Gauthier jagauthier at gmail.com
Mon Jun 18 11:12:11 EDT 2018


On Mon, Jun 18, 2018 at 10:58 AM Ken Gaillot <kgaillot at redhat.com> wrote:
>
> On Mon, 2018-06-18 at 10:10 -0400, Jason Gauthier wrote:
> > On Mon, Jun 18, 2018 at 9:55 AM Ken Gaillot <kgaillot at redhat.com>
> > wrote:
> > >
> > > On Fri, 2018-06-15 at 21:39 -0400, Jason Gauthier wrote:
> > > > Greetings,
> > > >
> > > >    Previously, I was using fibre channel with block devices.  I
> > > > used sbd to fence the disks by creating a small block device and
> > > > then using stonith to fence the physical disk block.
> > > >
> > > > However, I had some reliability issues with that (I believe it was
> > > > the fibre channel interfacing, not clustering).  So, I've moved
> > > > everything to NFS.
> > >
> > > Another possibility would be to use qdevice with a third host, so
> > > that the cluster has true quorum, and then you can use sbd with
> > > hardware watchdog only (no need for a shared disk).
> >
> > This is the first time I've heard of qdevice.  The man page
> > indicates it's meant for 2-node or even-numbered clusters:
> >
> > "It is recommended for clusters with an even number of nodes and
> > highly recommended for 2 node clusters."
> >
> > That said, I have two nodes and shared storage.  I could make the
> > shared storage device a node, which seems a little odd, but that
> > would make a third.  I do have another system, but the electricity
> > costs are already high for this project :)
>
> A qdevice host isn't a cluster node -- qdevice is a lightweight process
> that simply provides a quorum vote.
>
> I don't know what your shared storage device is, but if it's capable of
> running qdevice, I don't know of any problems with that (if you lose
> the storage, you can't do anything anyway, so it wouldn't be a big deal
> to lose quorum). It does sound like the shared storage device is a
> single point of failure.

Yes, it is a single point of failure, for sure. I'm using NFS as the
shared storage platform.
I'll check out qdevice for all 3.
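
From what I can tell, that means running corosync-qnetd on the third
box and corosync-qdevice on the two cluster nodes, with a quorum
section in corosync.conf along these lines (a sketch only -- "gamma"
stands in for whichever host ends up running qnetd):

quorum {
    provider: corosync_votequorum
    device {
        model: net
        votes: 1
        net {
            host: gamma
            algorithm: ffsplit
        }
    }
}

Combined with the watchdog-only sbd idea, I gather the rest is running
sbd with no device configured and setting the stonith-watchdog-timeout
cluster property.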

> > > > My only resources are virtual machines running with KVM.  So, I am
> > >
> > > Are the VMs resources, or nodes?  The libvirt fence agents are for
> > > fencing VMs used as full cluster nodes.
> > >
> >
> > The VMs are resources.  So, yeah, libvirt won't work.  That
> > explanation helps.
> > It seems it would be simplest to protect each VM, so that if one node
> > went "missing" (but still running), the other node would not boot the
> > VMs and corrupt all the disks.
> >
> > That's what I thought libvirt did.  So, I still have to figure out
> > some form of fencing for this.  (Which you suggested above, but I
> > have not completely processed yet.)
>
> Power fencing is almost always best. If that's not possible, you could
> try getting sbd working again. If that's not possible, and your shared
> storage device supports SCSI-3 persistent reservations, you could look
> into fence_scsi, which cuts off a node's access to the shared disk.

The toss-up was between iSCSI and NFS, and I decided NFS was less hassle.
If I went iSCSI, then I could create block devices and go back to an
sbd device as well.
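
(Noting for later: if I do go that route, my understanding is the sbd
side would work the same as it did over FC -- initialize a small LUN
and point the daemon at it.  The device path below is only a
placeholder:

    sbd -d /dev/disk/by-id/<iscsi-lun> create

    # /etc/sysconfig/sbd on both nodes
    SBD_DEVICE="/dev/disk/by-id/<iscsi-lun>"
    SBD_WATCHDOG_DEV=/dev/watchdog

plus the usual sbd stonith resource on top.)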

I just had a ton of problems with block devices not showing up at boot
when using fibre channel.
Perhaps the iSCSI layer is completely different and that's a non-issue.
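
If I understand the fence_scsi suggestion right, with iSCSI it would
look something like this in crm syntax (untested; the device path is a
placeholder, and fence_scsi needs unfencing so a rebooted node gets its
registration back):

primitive st_scsi stonith:fence_scsi \
        params devices="/dev/disk/by-id/<iscsi-lun>" \
               pcmk_host_list="alpha beta" \
        meta provides=unfencing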

> > > > trying to figure out what I should fence.  I saw stonith has a
> > > > module, external/libvirt, and that seems like it might work.  But
> > > > I can't seem to figure out how to use it with my crm config.
> > > >
> > > > I've attempted this:
> > > >
> > > > primitive st_libvirt stonith:external/libvirt \
> > > >         params hypervisor_uri="qemu:///system" hostlist="alpha beta" \
> > > >         op monitor interval=2h \
> > > >         meta target-role=Stopped
> > > >
> > > > But I am not sure this is the correct syntax.  The nodes are
> > > > alpha and beta.
> > > >
> > > > Any pointers appreciated.
> --
> Ken Gaillot <kgaillot at redhat.com>
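
P.S. For the archives, in case someone lands here with VMs that really
are cluster nodes: as I understand it, external/libvirt's hostlist
takes entries of the form "node[:domain]", mapping a cluster node name
to its libvirt domain, so a config along these lines should be closer
(the hypervisor URI and domain names are made up for illustration):

primitive st_libvirt stonith:external/libvirt \
        params hypervisor_uri="qemu+ssh://vmhost/system" \
               hostlist="alpha:alpha-vm beta:beta-vm" \
        op monitor interval=2h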


