[ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
renayama19661014 at ybb.ne.jp
Fri Apr 23 18:00:32 EDT 2021
Hi Ken,
Hi Klaus,
Thanks for your comment.
>We did not have time to get it into the RHEL 8.4 GA (general
>availability) release, which means for example it will not be in 8.4
>install images, but we did get a 0-day fix, which means that it will be
>available via "yum update" the same day that 8.4 is released.
>
>Thanks for testing the 8.4 build and finding the issue!
Okay!
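Until the update is applied, an agent that calls crm_mon in its stop or
demote action could at least record the failure instead of parsing
unusable output. A minimal sketch, assuming a hypothetical helper named
query_cluster_status (it is not part of the pgsql or Dummy agents); the
only details taken from this thread are the "crm_mon -1" call and the
exit status 102 seen in the log:

```shell
#!/bin/sh
# Hypothetical helper, not part of any shipped resource agent.
# Runs a status command (default: crm_mon -1) and keeps both its output
# and its exit status so the caller can decide what to do on failure.
# Sets: MON_OUT (captured stdout), MON_RC (exit status).
# Returns 0 only when the status command itself succeeded.
query_cluster_status() {
    # With no arguments, fall back to the command used in the reproduction.
    if [ $# -eq 0 ]; then
        set -- crm_mon -1
    fi
    MON_OUT=$("$@" 2>/dev/null)
    MON_RC=$?
    [ "$MON_RC" -eq 0 ]
}

# Assumed usage inside a stop action (ocf_log comes from ocf-shellfuncs):
#   if query_cluster_status; then
#       : # parse "$MON_OUT" here
#   else
#       # In the log from this thread, MON_RC was 102.
#       ocf_log warn "crm_mon failed (rc=${MON_RC}): ${MON_OUT}"
#   fi
```

This only makes the failure visible; it does not replace the fix in
pacemaker itself.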
Best Regards,
Hideo Yamauchi.
----- Original Message -----
>From: Ken Gaillot <kgaillot at redhat.com>
>To: renayama19661014 at ybb.ne.jp
>Cc: kwenning <kwenning at redhat.com>
>Date: 2021/4/24, Sat 01:25
>Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
>
>Hi Hideo,
>
>A private reply to follow up:
>
>The fix will be in the 2.1.0 upstream release.
>
>We did not have time to get it into the RHEL 8.4 GA (general
>availability) release, which means for example it will not be in 8.4
>install images, but we did get a 0-day fix, which means that it will be
>available via "yum update" the same day that 8.4 is released.
>
>Thanks for testing the 8.4 build and finding the issue!
>
>On Thu, 2021-04-15 at 11:45 +0900, renayama19661014 at ybb.ne.jp wrote:
>> Hi Klaus,
>> Hi Ken,
>>
>> We have confirmed that the operation is improved by the test.
>> Thank you for your prompt response.
>>
>> We look forward to including this fix in the release version of RHEL
>> 8.4.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> ----- Original Message -----
>> > From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>> > To: "kwenning at redhat.com" <kwenning at redhat.com>; Cluster Labs - All
>> > topics related to open-source clustering welcomed <
>> > users at clusterlabs.org>; Cluster Labs - All topics related to open-
>> > source clustering welcomed <users at clusterlabs.org>
>> > Cc:
>> > Date: 2021/4/13, Tue 07:08
>> > Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource
>> > control fails.
>> >
>> > Hi Klaus,
>> > Hi Ken,
>> >
>> > > I've opened https://github.com/ClusterLabs/pacemaker/pull/2342
>> > > with
>> > > I guess the simplest possible solution to the immediate issue so
>> > > that we can discuss it.
>> >
>> >
>> > Thank you for the fix.
>> >
>> >
>> > I have confirmed that the fixes have been merged.
>> >
>> > I'll test this fix today just in case.
>> >
>> > Many thanks,
>> > Hideo Yamauchi.
>> >
>> >
>> > ----- Original Message -----
>> > > From: Klaus Wenninger <kwenning at redhat.com>
>> > > To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics
>> > > related to open-source clustering welcomed <users at clusterlabs.org>
>> > > Cc:
>> > > Date: 2021/4/12, Mon 22:22
>> > > Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql
>> > > resource control fails.
>> > >
>> > > On 4/9/21 5:13 PM, Klaus Wenninger wrote:
>> > > > On 4/9/21 4:04 PM, Klaus Wenninger wrote:
>> > > > > On 4/9/21 3:45 PM, Klaus Wenninger wrote:
>> > > > > > On 4/9/21 3:36 PM, Klaus Wenninger wrote:
>> > > > > > > On 4/9/21 2:37 PM, renayama19661014 at ybb.ne.jp wrote:
>> > > > > > > > Hi Klaus,
>> > > > > > > >
>> > > > > > > > Thanks for your comment.
>> > > > > > > >
>> > > > > > > > > Hmm ... is that with selinux enabled?
>> > > > > > > > > Respectively do you see any related avc messages?
>> > > > > > > >
>> > > > > > > > Selinux is not enabled.
>> > > > > > > > Isn't crm_mon caused by not returning a response when
>> > > > > > > > pacemakerd prepares to stop?
>> > > > > >
>> > > > > > yep ... that doesn't look good.
>> > > > > > While in pcmk_shutdown_worker ipc isn't handled.
>> > > > >
>> > > > > Stop ... that should actually work as pcmk_shutdown_worker
>> > > > > should exit quite quickly and proceed after mainloop
>> > > > > dispatching when called again.
>> > > > > Don't see anything atm that might be blocking for longer
>> > > > > ...
>> > > > > but let me dig into it further ...
>> > > >
>> > > > What happens is clear (thanks Ken for the hint ;-) ).
>> > > > When pacemakerd is shutting down - already when it
>> > > > shuts down the resources and not just when it starts to
>> > > > reap the subdaemons - crm_mon reads that state and
>> > > > doesn't try to connect to the cib anymore.
>> > >
>> > > I've opened https://github.com/ClusterLabs/pacemaker/pull/2342
>> > > with
>> > > I guess the simplest possible solution to the immediate issue so
>> > > that we can discuss it.
>> > > > > > Question is why that didn't create issue earlier.
>> > > > > > Probably I didn't test with resources that had crm_mon in
>> > > > > > their stop/monitor-actions but sbd should have run into
>> > > > > > issues.
>> > > > > >
>> > > > > > Klaus
>> > > > > > > But when shutting down a node the resources should be
>> > > > > > > shutdown before pacemakerd goes down.
>> > > > > > > But let me have a look if it can happen that pacemakerd
>> > > > > > > doesn't react to the ipc-pings before. That btw. might be
>> > > > > > > lethal for sbd-scenarios (if the phase is too long and it
>> > > > > > > might actually not be defined).
>> > > > > > >
>> > > > > > > My idea with selinux would have been that it might block
>> > > > > > > the ipc if crm_mon is issued by execd. But well forget
>> > > > > > > about it as it is not enabled ;-)
>> > > > > > >
>> > > > > > >
>> > > > > > > Klaus
>> > > > > > > >
>> > > > > > > > pgsql needs the result of crm_mon in demote processing and
>> > > > > > > > stop processing.
>> > > > > > > > crm_mon should return a response even after pacemakerd goes
>> > > > > > > > into a stop operation.
>> > > > > > > >
>> > > > > > > > Best Regards,
>> > > > > > > > Hideo Yamauchi.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > ----- Original Message -----
>> > > > > > > > > From: Klaus Wenninger <kwenning at redhat.com>
>> > > > > > > > > To: renayama19661014 at ybb.ne.jp; Cluster Labs - All
>> > > > > > > > > topics related to open-source clustering welcomed
>> > > > > > > > > <users at clusterlabs.org>
>> > > > > > > > > Cc:
>> > > > > > > > > Date: 2021/4/9, Fri 21:12
>> > > > > > > > > Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta,
>> > > > > > > > > pgsql resource control fails.
>> > > > > > > > >
>> > > > > > > > > On 4/8/21 11:21 PM, renayama19661014 at ybb.ne.jp wrote:
>> > > > > > > > > > Hi Ken,
>> > > > > > > > > > Hi All,
>> > > > > > > > > >
>> > > > > > > > > > In the pgsql resource, crm_mon is executed in the
>> > > > > > > > > > process of demote and stop, and the result is processed.
>> > > > > > > > > > However, pacemaker included in RHEL8.4beta fails to
>> > > > > > > > > > execute this crm_mon.
>> > > > > > > > > > - The problem also occurs on github
>> > > > > > > > > > master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).
>> > > > > > > > > > The problem can be easily reproduced in the following ways.
>> > > > > > > > > >
>> > > > > > > > > > Step1. Modify to execute crm_mon in the stop process of
>> > > > > > > > > > the Dummy resource.
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > dummy_stop() {
>> > > > > > > > > >     mon=$(crm_mon -1)
>> > > > > > > > > >     ret=$?
>> > > > > > > > > >     ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
>> > > > > > > > > >     dummy_monitor
>> > > > > > > > > >     if [ $? = $OCF_SUCCESS ]; then
>> > > > > > > > > >         rm ${OCF_RESKEY_state}
>> > > > > > > > > >     fi
>> > > > > > > > > >     return $OCF_SUCCESS
>> > > > > > > > > > }
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > Step2. Configure a cluster with two nodes.
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > [root at rh84-beta01 ~]# crm_mon -rfA1
>> > > > > > > > > > Cluster Summary:
>> > > > > > > > > > * Stack: corosync
>> > > > > > > > > > * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>> > > > > > > > > > * Last updated: Thu Apr 8 18:00:52 2021
>> > > > > > > > > > * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01
>> > > > > > > > > > * 2 nodes configured
>> > > > > > > > > > * 1 resource instance configured
>> > > > > > > > > >
>> > > > > > > > > > Node List:
>> > > > > > > > > > * Online: [ rh84-beta01 rh84-beta02 ]
>> > > > > > > > > >
>> > > > > > > > > > Full List of Resources:
>> > > > > > > > > > * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01
>> > > > > > > > > >
>> > > > > > > > > > Migration Summary:
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > Step3. Stop the node where the Dummy resource is running.
>> > > > > > > > > > The resource will fail over.
>> > > > > > > > > > ----
>> > > > > > > > > > [root at rh84-beta02 ~]# crm_mon -rfA1
>> > > > > > > > > > Cluster Summary:
>> > > > > > > > > > * Stack: corosync
>> > > > > > > > > > * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>> > > > > > > > > > * Last updated: Thu Apr 8 18:08:56 2021
>> > > > > > > > > > * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01
>> > > > > > > > > > * 2 nodes configured
>> > > > > > > > > > * 1 resource instance configured
>> > > > > > > > > >
>> > > > > > > > > > Node List:
>> > > > > > > > > > * Online: [ rh84-beta02 ]
>> > > > > > > > > > * OFFLINE: [ rh84-beta01 ]
>> > > > > > > > > >
>> > > > > > > > > > Full List of Resources:
>> > > > > > > > > > * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > However, if you look at the log, you can see that the
>> > > > > > > > > > execution of crm_mon in the stop processing of the Dummy
>> > > > > > > > > > resource has failed.
>> > > > > > > > > > ----
>> > > > > > > > > > Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
>> > > > > > > > > > Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
>> > > > > > > > > Hmm ... is that with selinux enabled?
>> > > > > > > > > Respectively do you see any related avc messages?
>> > > > > > > > >
>> > > > > > > > > Klaus
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > Similarly, pgsql also executes crm_mon with demote or
>> > > > > > > > > > stop, so control fails.
>> > > > > > > > > > The problem seems to be related to the next fix.
>> > > > > > > > > > * Report pacemakerd in state waiting for sbd
>> > > > > > > > > > - https://github.com/ClusterLabs/pacemaker/pull/2278
>> > > > > > > > > >
>> > > > > > > > > > The problem does not occur with the release version of
>> > > > > > > > > > Pacemaker 2.0.5 or the Pacemaker included with RHEL8.3.
>> > > > > > > > > > This issue has a huge impact on the user.
>> > > > > > > > > >
>> > > > > > > > > > Perhaps it also affects the control of other resources
>> > > > > > > > > > that utilize crm_mon.
>> > > > > > > > > > Please improve the release version of RHEL8.4 so that it
>> > > > > > > > > > includes Pacemaker which does not cause this problem.
>> > > > > > > > > > * Distributions other than RHEL may also be affected in
>> > > > > > > > > > future releases.
>> > > > > > > > > >
>> > > > > > > > > > ----
>> > > > > > > > > > This content is the same as the following Bugzilla.
>> > > > > > > > > > - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
>> > > > > > > > > > ----
>> > > > > > > > > >
>> > > > > > > > > > Best Regards,
>> > > > > > > > > > Hideo Yamauchi.
>> > > > > > > > > >
>> > > > > > > > > >
>> > _______________________________________________
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/
>> >
>>
>--
>Ken Gaillot <kgaillot at redhat.com>
>
>
>
>