[ClusterLabs] PostgreSQL HA on EL9

Thu Sep 14 11:43:38 EDT 2023

I found my issue with reboots - and it wasn't pacemaker-related at all.  My EL9 test system differed from the EL7 system in that it hosted the DB on an iSCSI-attached array.  During OS shutdown, the array was being unmounted concurrently with the pacemaker shutdown, so pacemaker was not able to cleanly shut down the pgsql resource.  I added a systemd override to make corosync dependent upon, and require, "remote-fs.target".  Everything shuts down cleanly now, as expected.
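
A drop-in along the following lines expresses that dependency.  This is only a sketch: the path is the default one that "systemctl edit corosync" creates, and the directives are assumed from the description above rather than copied from the system.

```ini
# /etc/systemd/system/corosync.service.d/override.conf  (assumed path)
[Unit]
# Start corosync only after remote filesystems are mounted.  systemd stops
# units in the reverse of their start order, so corosync is stopped before
# the remote filesystems are unmounted.
After=remote-fs.target
# Pull in remote-fs.target as a hard dependency of corosync.
Requires=remote-fs.target
```

After a "systemctl daemon-reload", running "systemctl show -p After -p Requires corosync" should list remote-fs.target.  Pacemaker's unit is normally ordered after corosync, so it should be stopped first and the pgsql resource should still have its storage available when it shuts down.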

Thanks for the suggestions,

Larry

> -----Original Message-----
> From: Users <users-bounces at clusterlabs.org> On Behalf Of Oyvind Albrigtsen
> Sent: Thursday, September 14, 2023 5:43 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] PostgreSQL HA on EL9
> 
> If you're using network filesystems with the Filesystem agent this
> patch might solve your issue:
> https://github.com/ClusterLabs/resource-agents/pull/1869
> 
> 
> Oyvind
> 
> On 13/09/23 17:56 +0000, Larry G. Mills via Users wrote:
> >>
> >> On my RHEL 9 test cluster, both "reboot" and "systemctl reboot" wait
> >> for the cluster to stop everything.
> >>
> >> I think in some environments "reboot" is equivalent to "systemctl
> >> reboot --force" (kill all processes immediately), so maybe see if
> >> "systemctl reboot" is better.
> >>
> >> >
> >> > On EL7, this scenario caused the cluster to shut itself down on the
> >> > node before the OS shutdown completed, and the DB resource was
> >> > stopped/shut down before the OS stopped.  On EL9, this is not the
> >> > case: the DB resource is not stopped before the OS shutdown
> >> > completes.  This leads to errors being thrown when the cluster is
> >> > started back up on the rebooted node similar to the following:
> >> >
> >
> >Ken,
> >
> >Thanks for the reply - and that's interesting that RHEL9 behaves as expected
> >and AL9 seemingly doesn't.  I did try shutting down via "systemctl reboot",
> >but the cluster and resources were still not stopped cleanly before the OS
> >stopped.  In fact, the commands "shutdown" and "reboot" are just symlinks
> >to systemctl on AL9.2, so it makes sense that the behavior is the same.
> >
> >Just as a point of reference, my systemd version is: systemd.x86_64 252-14.el9_2.3
> >
> >Larry
> >_______________________________________________
> >Manage your subscription:
> >https://lists.clusterlabs.org/mailman/listinfo/users
> >
> >ClusterLabs home: https://www.clusterlabs.org/
> >