[ClusterLabs] Preventing multiple resources from moving at the same time.
Matthew Schumacher
matt.s at aptalaska.net
Fri Apr 30 10:04:34 EDT 2021
On 4/21/21 11:04 AM, Matthew Schumacher wrote:
> On 4/21/21 10:21 AM, Andrei Borzenkov wrote:
>>> If I set the stickiness to 100 then it's a race condition; many times we
>>> get the storage layer migrated without VirtualDomain noticing. But if
>>> the stickiness is not set, then moving a resource causes the cluster to
>>> re-balance and will cause the VM to fail every time, because validation
>>> is one of the first things we do when we migrate the VM, and it happens
>>> at the same time as an IP-ZFS-iSCSI move, so the config file goes away
>>> for 5 seconds.
>>>
>>> I'm not sure how to fix this. The nodes don't have local storage that
>> Your nodes must have operating system and pacemaker stack loaded from
>> somewhere before they can import zfs pool.
>
> Yup, and they do. There are plenty of ways to do this: internal SD
> card, USB boot, PXE boot, etc. I prefer this because I don't need
> to maintain a boot drive, the nodes boot from the exact same image,
> and I have gobs of memory so the running system can run in a ramdisk.
> This also makes it possible to boot my nodes with failed
> disks/controllers, which makes troubleshooting easier. I basically
> made a live CD distro that has everything I need.
>
>>> I suppose the next step is to see if NFS has some sort of retry mode so
>> That is what "hard" mount option is for.
>>
> Thanks, I'll take a look.
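For anyone curious, "hard" is a client-side mount option; it looks roughly
like this (the server name and paths here are just placeholders, not my
actual setup):

    # NFS hard mount: the client retries indefinitely instead of returning
    # an I/O error to the application while the server is unreachable.
    mount -t nfs -o hard,vers=4.1 nfs-server.example.com:/datastore /mnt/datastore

    # or the equivalent /etc/fstab line:
    # nfs-server.example.com:/datastore  /mnt/datastore  nfs  hard,vers=4.1  0  0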
For others searching the list, I did figure this out. The problem was
the order in which I was loading the resources.

The following order doesn't work, because the failover IP comes up before
ZFS, and ZFS is what starts the NFS share. That leaves a split second
where the IP is reachable but the NFS server isn't running yet, so the IP
stack answers NFS requests with a RST. The NFS client reports that to the
OS as a hard failure, the VirtualDomain resource sees an invalid config,
and things break.
* Resource Group: IP-ZFS-iSCSI:
* fence-datastore (stonith:fence_scsi): Started node1
* failover-ip (ocf::heartbeat:IPaddr): Started node1
* zfs-datastore (ocf::heartbeat:ZFS): Started node1
* ZFSiSCSI (ocf::heartbeat:ZFSiSCSI): Started node1
If I change it to this, NFS requests simply go unanswered until the share
is up, and the client keeps retrying until its connection is answered.
* Resource Group: IP-ZFS-iSCSI:
* fence-datastore (stonith:fence_scsi): Started node1
* zfs-datastore (ocf::heartbeat:ZFS): Started node1
* ZFSiSCSI (ocf::heartbeat:ZFSiSCSI): Started node1
* failover-ip (ocf::heartbeat:IPaddr): Started node1
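If anyone needs to make the same change, reordering a group with pcs looks
roughly like this (written from memory, so check it against your pcs
version before running it):

    # Resources in a group start in the listed order and stop in reverse,
    # so the IP must be listed last to come up after ZFS/NFS/iSCSI.
    pcs resource group remove IP-ZFS-iSCSI failover-ip
    pcs resource group add IP-ZFS-iSCSI failover-ip --after ZFSiSCSI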
Originally I didn't do it this way because my iSCSI and NFS stack bind
to the failover IP and I was worried they wouldn't start until the IP
was configured, but that doesn't seem to be a problem.
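As an aside, if something really did need to bind() to the VIP before
IPaddr brings it up, Linux has a knob for that; I didn't end up needing
it, so treat this as a pointer rather than a recommendation:

    # Allow services to bind to an IP address that isn't configured on any
    # interface yet (useful for daemons that bind the VIP explicitly).
    sysctl -w net.ipv4.ip_nonlocal_bind=1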
Matt