[ClusterLabs] Lost access with the volume while ZFS and other resources migrate to other node (reset VM)

Reid Wahl nwahl at redhat.com
Tue Apr 4 14:20:32 EDT 2023


On Tue, Apr 4, 2023 at 7:08 AM Ken Gaillot <kgaillot at redhat.com> wrote:
>
> On Mon, 2023-04-03 at 02:47 +0300, Александр via Users wrote:
> > A Pacemaker + Corosync cluster is built from 2 virtual machines (Ubuntu
> > 22.04, 16 GB RAM, 8 CPUs each). An HBA is passed through to each VM to
> > connect to a disk shelf, following the instructions at
> > https://netbergtw.com/top-support/articles/zfs-cib/. A
>
> That looks like a well-thought-out guide. One minor correction: since
> Corosync 3, no-quorum-policy=ignore is no longer needed. Instead, set
> "two_node: 1" in corosync.conf (which may be automatic depending on
> what tools you're using).
>
> That's unlikely to be causing any issues, though.
>
> > ZFS pool was built from 4 disks in draid1, and resources were
> > configured: virtual IP, iSCSITarget, iSCSILun. The LUN is connected in
> > VMware. During an abnormal shutdown of a node, the resources move, but
>
> How are you testing abnormal shutdown? For something like a power
> interruption, I'd expect that the node would be fenced, but in your
> logs it looks like recovery is taking place between clean nodes.

See also discussion starting at this comment:
https://github.com/ClusterLabs/resource-agents/issues/1852#issuecomment-1479119045

Happy to see this on the mailing list :)
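
For anyone following along, the two_node setting Ken mentions lives in the
quorum section of corosync.conf. A minimal sketch, assuming the
corosync_votequorum provider and a cluster where pcs or another tool has not
already set this; the rest of the file is omitted:

```
quorum {
    provider: corosync_votequorum
    # Two-node mode, used instead of Pacemaker's no-quorum-policy=ignore
    two_node: 1
}
```

Per votequorum(5), two_node: 1 also enables wait_for_all by default.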

>
> > at the moment this happens, VMware loses contact with the LUN, which
> > should not happen. The journalctl log at the time of the move is
> > here: https://pastebin.com/eLj8DdtY. I also tried building shared
> > storage on DRBD with cloned VIP and Target resources, but that does
> > not work either; besides, every time the resources move, there are
> > problems starting them. Any ideas what can be done about this? Losing
> > communication with the LUN even for a couple of seconds is already
> > critical.
> >
> > corosync-qdevice/jammy,now 3.0.1-1 amd64 [installed]
> > corosync-qnetd/jammy,now 3.0.1-1 amd64 [installed]
> > corosync/jammy,now 3.1.6-1ubuntu1 amd64 [installed]
> > pacemaker-cli-utils/jammy,now 2.1.2-1ubuntu3 amd64
> > [installed,automatic]
> > pacemaker-common/jammy,now 2.1.2-1ubuntu3 all [installed,automatic]
> > pacemaker-resource-agents/jammy,now 2.1.2-1ubuntu3 all
> > [installed,automatic]
> > pacemaker/jammy,now 2.1.2-1ubuntu3 amd64 [installed]
> > pcs/jammy,now 0.10.11-2ubuntu3 all [installed]
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> --
> Ken Gaillot <kgaillot at redhat.com>



-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker


