[ClusterLabs] Lost access to the volume while ZFS and other resources migrate to the other node (VM reset)

Vladislav Bogdanov bubble at hoster-ok.com
Tue Apr 4 16:09:34 EDT 2023


I know that iSCSI initiators are very sensitive to connection drops. That's 
why in all my setups with iSCSI I use a special master/slave resource agent 
which, in slave mode, drops all packets to/from the portals. That prevents 
initiators from receiving FIN packets from the target when it migrates, and 
they usually behave much better. I can share that RA and the setup 
instructions if anyone is interested.
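
To give the general idea (this is only a sketch, not the actual agent, and 
the exact mechanism may differ): in slave mode the agent installs firewall 
rules that silently drop portal traffic, and on promote it removes them. 
With iptables and the default iSCSI port 3260 that boils down to:

    # Slave/demote: drop portal traffic silently, so initiators see a dead
    # link and keep retrying instead of receiving a FIN/RST from the target.
    iptables -I INPUT  -p tcp --dport 3260 -j DROP
    iptables -I OUTPUT -p tcp --sport 3260 -j DROP

    # Master/promote: remove the rules so the portal becomes reachable again.
    iptables -D INPUT  -p tcp --dport 3260 -j DROP
    iptables -D OUTPUT -p tcp --sport 3260 -j DROP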

Reid Wahl <nwahl at redhat.com> wrote on 4 April 2023 at 20:20:52:

> On Tue, Apr 4, 2023 at 7:08 AM Ken Gaillot <kgaillot at redhat.com> wrote:
>>
>> On Mon, 2023-04-03 at 02:47 +0300, Александр via Users wrote:
>> > A Pacemaker + Corosync cluster is assembled from 2 virtual machines
>> > (Ubuntu 22.04, 16 GB RAM, 8 CPUs each); an HBA is passed through to
>> > each of them to connect to a disk shelf, following the instructions at
>> > https://netbergtw.com/top-support/articles/zfs-cib/. A
>>
>> That looks like a well-thought-out guide. One minor correction: since
>> Corosync 3, no-quorum-policy=ignore is no longer needed. Instead, set
>> "two_node: 1" in corosync.conf (which may be automatic depending on
>> what tools you're using).
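>>
>> For reference, that just means a quorum section along these lines in
>> corosync.conf (the provider line will already be there; two_node: 1 also
>> enables wait_for_all by default):
>>
>>     quorum {
>>         provider: corosync_votequorum
>>         two_node: 1
>>     }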
>>
>> That's unlikely to be causing any issues, though.
>>
>> > ZFS pool was assembled from 4 disks in draid1, and the resources were
>> > configured: a virtual IP, an iSCSITarget, and an iSCSILun. The LUN is
>> > connected in VMware. During an abnormal shutdown of a node, the
>> > resources move, but
>>
>> How are you testing abnormal shutdown? For something like a power
>> interruption, I'd expect the node to be fenced, but in your logs it
>> looks like recovery is taking place between clean nodes.
>
> See also discussion starting at this comment:
> https://github.com/ClusterLabs/resource-agents/issues/1852#issuecomment-1479119045
>
> Happy to see this on the mailing list :)
>
>>
>> > at the moment this happens, VMware loses contact with the LUN, which
>> > should not happen. The journalctl log from the time of the move is
>> > here: https://pastebin.com/eLj8DdtY. I also tried to build shared
>> > storage on DRBD with cloned VIP and Target resources, but that does
>> > not work either; besides, every time resources move there are always
>> > some problems starting them. Any ideas what can be done about this?
>> > Losing communication with the LUN even for a couple of seconds is
>> > already critical.
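>> >
>> > For reference, the resources are grouped roughly like this (the IQN,
>> > IP address and zvol path below are placeholders, and the agent names
>> > are the standard ones from resource-agents):
>> >
>> >     pcs resource create vip ocf:heartbeat:IPaddr2 \
>> >         ip=192.0.2.10 cidr_netmask=24 --group iscsi
>> >     pcs resource create target ocf:heartbeat:iSCSITarget \
>> >         iqn=iqn.2023-04.example:storage --group iscsi
>> >     pcs resource create lun0 ocf:heartbeat:iSCSILogicalUnit \
>> >         target_iqn=iqn.2023-04.example:storage lun=0 \
>> >         path=/dev/zvol/tank/vol0 --group iscsi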
>> >
>> > corosync-qdevice/jammy,now 3.0.1-1 amd64 [installed]
>> > corosync-qnetd/jammy,now 3.0.1-1 amd64 [installed]
>> > corosync/jammy,now 3.1.6-1ubuntu1 amd64 [installed]
>> > pacemaker-cli-utils/jammy,now 2.1.2-1ubuntu3 amd64
>> > [installed,automatic]
>> > pacemaker-common/jammy,now 2.1.2-1ubuntu3 all [installed,automatic]
>> > pacemaker-resource-agents/jammy,now 2.1.2-1ubuntu3 all
>> > [installed,automatic]
>> > pacemaker/jammy,now 2.1.2-1ubuntu3 amd64 [installed]
>> > pcs/jammy,now 0.10.11-2ubuntu3 all [installed]
>> --
>> Ken Gaillot <kgaillot at redhat.com>
>>
>
>
>
> --
> Regards,
>
> Reid Wahl (He/Him)
> Senior Software Engineer, Red Hat
> RHEL High Availability - Pacemaker
>
