[Pacemaker] Seeking suggestions for cluster configuration of HA iSCSI target and initiators

Maurits van de Lande M.vandeLande at VDL-Fittings.com
Mon Jul 16 14:53:08 EDT 2012


Hello,

The last couple of months I've been busy setting up highly available iSCSI targets and using the iSCSI LUNs in a KVM virtualisation cluster.

The iSCSI targets are set up using Pacemaker, DRBD, stgt and CentOS 6. Unfortunately I have not had the time to compile and test LIO (linux-iscsi.org), so I haven't been able to use SCSI-3 persistent reservation fencing on the virtualisation hosts: http://linux.die.net/man/8/fence_scsi (LIO does support this).

But the iSCSI target clusters built with tgt will do for now.
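Roughly, the target side of such a configuration looks like this in crm
shell syntax (the resource names, IQN, IP address and device path below
are illustrative placeholders, not our production values):

    primitive p_drbd_iscsi ocf:linbit:drbd \
        params drbd_resource="iscsi" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
    ms ms_drbd_iscsi p_drbd_iscsi \
        meta master-max="1" clone-max="2" notify="true"
    primitive p_target ocf:heartbeat:iSCSITarget \
        params implementation="tgt" tid="1" \
               iqn="iqn.2012-07.com.example:storage"
    primitive p_lun1 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn="iqn.2012-07.com.example:storage" lun="1" \
               path="/dev/drbd0"
    primitive p_ip_iscsi ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" cidr_netmask="24"
    group g_iscsi p_target p_lun1 p_ip_iscsi
    colocation c_iscsi_on_drbd inf: g_iscsi ms_drbd_iscsi:Master
    order o_drbd_before_iscsi inf: ms_drbd_iscsi:promote g_iscsi:start
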
I have detailed notes on all the configuration steps needed to set up a cluster. They contain some information about our internal network and are therefore not suited for public use. (I'm planning to publish a public version of the document some time this year.)

I can send you my configuration document if you would like to see it.

As for the virtualisation cluster: at first I set up a Pacemaker cluster, but it lacked support for a cluster filesystem to host the virtual machine config files. All VirtualDomain resources must be "defined" using virsh, which means a local copy is stored at /etc/libvirt/qemu on each cluster node. (I just want a single location to store the configuration files.)
I therefore switched to a cman/rgmanager cluster for virtualisation.
The VMs use iSCSI LUNs for storage, and I have set up a GFS2 filesystem to host the VM config files. One drawback of using rgmanager is that when a cluster node shuts down, the VMs running on that node are shut down and restarted on a remaining cluster node.
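For reference, the rgmanager side amounts to a cluster.conf excerpt
along these lines (the VM name and mount point are placeholders; the
GFS2 filesystem itself is mounted on every node):

    <rm>
      <!-- each VM's libvirt XML lives on the shared GFS2 mount, so
           there is a single copy instead of one per node -->
      <vm name="vm1" path="/vmconfig" migrate="live" recovery="restart"/>
    </rm>
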
Instead, I have written a bash script to live migrate the VMs to the remaining cluster nodes first.
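The script boils down to something like this (a simplified sketch; the
real script picks the target node more carefully and handles errors):

    #!/bin/bash
    # Live-migrate every rgmanager-managed VM off this node before it
    # goes down for maintenance.
    ME=$(hostname)
    TARGET="node2"    # illustrative: any surviving cluster member
    # clustat lists vm: services with their current owner in column 2
    clustat | awk -v me="$ME" '$1 ~ /^vm:/ && $2 == me {print $1}' |
    while read -r vm; do
        clusvcadm -M "$vm" -m "$TARGET"    # -M = live migrate
    done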

The notes on setting up the virtualisation cluster are not completely finished, but again, if you would like to see them I can send them to you.

Best regards,

Maurits van de Lande


| Van de Lande BV. | T +31 (0) 162 516000 | F +31 (0) 162 521417 | www.vdl-fittings.com |

________________________________________
From: Phil Frost [phil at macprofessionals.com]
Sent: Monday, 16 July 2012 19:34
To: Digimer
CC: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Seeking suggestions for cluster configuration of HA iSCSI target and initiators

On 07/16/2012 01:14 PM, Digimer wrote:
> I've only tested this a little, so please take it as a general
> suggestion rather than strong advice.
>
> I created a two-node cluster, using Red Hat's high-availability add-on,
> using DRBD to keep the data replicated between the two "SAN" nodes and
> tgtd to export the LUNs. I had a virtual IP on the cluster to act as the
> target IP and I had DRBD in dual-primary mode with clustered LVM (so I
> had DRBD as the PV and exported the space from the LVs).
>
> Then I built a second cluster of five nodes to host KVM VMs. The
> underlying nodes used clustered LVM as well, but this time the LUNs were
> the PVs. I carved this space up into an LV per VM and made the VMs the HA
> service. Again using RH HA-Addon.
>
> In this setup, I was able to fail over the SAN without losing any VMs. I
> even messed up the fencing on the SAN cluster once, which meant it took
> 30s to fail over, and I didn't lose the VMs. So to the minimal extent I
> tested it, it worked excellently.
>
> I have some very rough notes on this setup. They're not fit for public
> consumption at all, but if you'd like I'll send them to you directly.
> They include the configurations which might help as a template or similar.
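The storage layering described there is roughly (device, VG and LV
names are illustrative):

    pvcreate /dev/drbd0                        # dual-primary DRBD device as PV
    vgcreate --clustered y vg_san /dev/drbd0   # clustered VG, needs clvmd
    lvcreate -L 20G -n lun_vm1 vg_san          # one LV per exported LUN
    tgtadm --lld iscsi --mode target --op new --tid 1 \
        -T iqn.2012-07.com.example:storage     # create the target
    tgtadm --lld iscsi --mode logicalunit --op new \
        --tid 1 --lun 1 -b /dev/vg_san/lun_vm1 # export one LV as a LUN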

This sounds similar to what I have, except I'm doing it with only one
cluster. The reason I'm using one cluster is twofold:

1) the storage is replicated between only two nodes, and I wish to avoid
a two-node cluster so I can have a useful quorum.

2) my IO load is not high and my budget is low, so the storage nodes
could also run VMs and not be overloaded. Having this capability in the
event that too many VM nodes have failed is a robustness win.

As I have things configured, *usually* I can initiate a failover of the
target, and everything is fine. The problem is when I am unlucky and the
initiator's monitor action occurs while the target failover is in
progress. It's easy to get unlucky if something is horribly wrong and
the target is down longer than a normal failover. It's also possible,
though harder, to get unlucky by simply issuing "crm resource migrate
iscsitarget" at the right instant. My availability requirements aren't
so high that I couldn't deal with the occasional long-term target
failure as a special case, but it's pretty horrible that simply
performing a planned migration of the target has the potential to
uncleanly reboot all the VMs on one node.

I've been doing some study of the iscsi RA since my first post, and it
seems to me now that the "failure" in the monitor action isn't actually
in the monitor action at all. Rather, it appears that for *all* actions,
the RA does a "discovery" step, and that's what is failing. I'm not
really sure what this is, or why I need it. Is it simply to find an
unspecified portal for a given IQN? Is it therefore useless in my case,
since I've explicitly specified the portal in the resource parameters?
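For context, as far as I can tell the discovery step amounts to
something like the first command below, while the login itself goes
straight to the node database (the portal and IQN are placeholders):

    # the RA's "discovery" step, roughly: ask the portal which targets exist
    iscsiadm -m discovery -t sendtargets -p 192.168.1.100:3260
    # the login, by contrast, uses the already-known portal and IQN
    iscsiadm -m node -T iqn.2012-07.com.example:storage \
        -p 192.168.1.100:3260 --login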

If I were to disable the "discovery" step, what are people's thoughts on
the case where the target is operational, but the initiator for some
reason (network failure) can't reach it? In this case, assume Pacemaker
knows the target is up; is there a way to encourage it to decide to
attempt migrating the initiator to another node?
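One pattern I'm aware of is a ping clone plus a location rule, so that
nodes which can't reach the target's portal IP become ineligible to run
the initiator (a sketch; "p_initiator" stands in for the actual
initiator resource):

    primitive p_ping ocf:pacemaker:ping \
        params host_list="192.168.1.100" multiplier="1000" \
        op monitor interval="15s"
    clone cl_ping p_ping
    # keep the initiator off nodes with no connectivity to the portal
    location l_initiator_connectivity p_initiator \
        rule -inf: not_defined pingd or pingd lte 0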


_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



