[ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
renayama19661014 at yahoo.co.jp
Thu Apr 15 17:45:08 EDT 2021
Hi All,
Sorry, I sent the same email multiple times by mistake.
Best Regards,
Hideo Yamauchi.
----- Original Message -----
> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Cc:
> Date: 2021/4/15, Thu 11:45
> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
>
> Hi Klaus,
> Hi Ken,
>
> Our tests confirm that the fix resolves the problem.
> Thank you for your prompt response.
>
> We look forward to this fix being included in the release version of RHEL 8.4.
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> ----- Original Message -----
>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>> To: "kwenning at redhat.com" <kwenning at redhat.com>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Cc:
>> Date: 2021/4/13, Tue 07:08
>> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
>>
>> Hi Klaus,
>> Hi Ken,
>>
>>> I've opened https://github.com/ClusterLabs/pacemaker/pull/2342 with
>>> I guess the simplest possible solution to the immediate issue so
>>> that we can discuss it.
>>
>>
>> Thank you for the fix.
>>
>>
>> I have confirmed that the fixes have been merged.
>>
>> I'll test this fix today just in case.
>>
>> Many thanks,
>> Hideo Yamauchi.
>>
>>
>> ----- Original Message -----
>>> From: Klaus Wenninger <kwenning at redhat.com>
>>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>> Cc:
>>> Date: 2021/4/12, Mon 22:22
>>> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
>>>
>>> On 4/9/21 5:13 PM, Klaus Wenninger wrote:
>>>> On 4/9/21 4:04 PM, Klaus Wenninger wrote:
>>>>> On 4/9/21 3:45 PM, Klaus Wenninger wrote:
>>>>>> On 4/9/21 3:36 PM, Klaus Wenninger wrote:
>>>>>>> On 4/9/21 2:37 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>>>> Hi Klaus,
>>>>>>>>
>>>>>>>> Thanks for your comment.
>>>>>>>>
>>>>>>>>> Hmm ... is that with selinux enabled?
>>>>>>>>> Respectively do you see any related avc messages?
>>>>>>>>
>>>>>>>> Selinux is not enabled.
>>>>>>>> Isn't this caused by crm_mon not getting a response while
>>>>>>>> pacemakerd prepares to stop?
>>>>>> yep ... that doesn't look good.
>>>>>> While in pcmk_shutdown_worker ipc isn't handled.
>>>>> Stop ... that should actually work as pcmk_shutdown_worker
>>>>> should exit quite quickly and proceed after mainloop
>>>>> dispatching when called again.
>>>>> Don't see anything atm that might be blocking for longer ...
>>>>> but let me dig into it further ...
>>>> What happens is clear (thanks Ken for the hint ;-) ).
>>>> When pacemakerd is shutting down - already when it
>>>> shuts down the resources and not just when it starts to
>>>> reap the subdaemons - crm_mon reads that state and
>>>> doesn't try to connect to the cib anymore.
>>> I've opened https://github.com/ClusterLabs/pacemaker/pull/2342 with
>>> I guess the simplest possible solution to the immediate issue so
>>> that we can discuss it.
>>>>>> Question is why that didn't create issue earlier.
>>>>>> Probably I didn't test with resources that had crm_mon in
>>>>>> their stop/monitor-actions but sbd should have run into
>>>>>> issues.
>>>>>>
>>>>>> Klaus
>>>>>>> But when shutting down a node the resources should be
>>>>>>> shutdown before pacemakerd goes down.
>>>>>>> But let me have a look if it can happen that pacemakerd
>>>>>>> doesn't react to the ipc-pings before. That btw. might be
>>>>>>> lethal for sbd-scenarios (if the phase is too long and it
>>>>>>> might actually not be defined).
>>>>>>>
>>>>>>> My idea with selinux would have been that it might block
>>>>>>> the ipc if crm_mon is issued by execd. But well forget
>>>>>>> about it as it is not enabled ;-)
>>>>>>>
>>>>>>>
>>>>>>> Klaus
>>>>>>>>
>>>>>>>> pgsql needs the result of crm_mon in demote processing and stop
>>>>>>>> processing.
>>>>>>>> crm_mon should return a response even after pacemakerd goes into a
>>>>>>>> stop operation.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Hideo Yamauchi.
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: Klaus Wenninger <kwenning at redhat.com>
>>>>>>>>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>>>>>>>>> Cc:
>>>>>>>>> Date: 2021/4/9, Fri 21:12
>>>>>>>>> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
>>>>>>>>>
>>>>>>>>> On 4/8/21 11:21 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>>>>>> Hi Ken,
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> In the pgsql resource, crm_mon is executed during demote and stop, and its result is processed.
>>>>>>>>>> However, the pacemaker included in RHEL8.4beta fails to execute this crm_mon.
>>>>>>>>>> - The problem also occurs on github master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).
>>>>>>>>>> The problem can be easily reproduced as follows.
>>>>>>>>>>
>>>>>>>>>> Step1. Modify the stop process of the Dummy resource so that it executes crm_mon.
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> dummy_stop() {
>>>>>>>>>>     # Run crm_mon once and log its exit status and output.
>>>>>>>>>>     mon=$(crm_mon -1)
>>>>>>>>>>     ret=$?
>>>>>>>>>>     ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
>>>>>>>>>>     dummy_monitor
>>>>>>>>>>     if [ $? = $OCF_SUCCESS ]; then
>>>>>>>>>>         rm ${OCF_RESKEY_state}
>>>>>>>>>>     fi
>>>>>>>>>>     return $OCF_SUCCESS
>>>>>>>>>> }
>>>>>>>>>> ----
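The stop action above logs the crm_mon exit status but otherwise ignores it. As a rough sketch only, an agent could check that status explicitly before using the output. The helper name `crm_mon_snapshot` and the `CRM_MON_CMD` override variable below are hypothetical: they are not part of the Dummy or pgsql agents and exist only so the function can be exercised without a running cluster.

```shell
# Hypothetical helper, not part of any shipped resource agent.
# Runs "crm_mon -1" (or the command named in CRM_MON_CMD, an assumed
# override used only for testing) and returns its exit status.
crm_mon_snapshot() {
    cmd=${CRM_MON_CMD:-crm_mon}
    # Capture stdout and stderr together so an error message is not lost.
    mon=$($cmd -1 2>&1)
    ret=$?
    if [ "$ret" -ne 0 ]; then
        # Report the failure instead of silently using partial output.
        echo "crm_mon failed with status ${ret}: ${mon}" >&2
        return "$ret"
    fi
    printf '%s\n' "$mon"
}
```

A caller could then write `out=$(crm_mon_snapshot) || ocf_log err "crm_mon unavailable"` and decide how to proceed. What the right reaction is during shutdown is a design decision for each agent and is not settled by this sketch.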
>>>>>>>>>>
>>>>>>>>>> Step2. Configure a cluster with two nodes.
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> [root at rh84-beta01 ~]# crm_mon -rfA1
>>>>>>>>>> Cluster Summary:
>>>>>>>>>>   * Stack: corosync
>>>>>>>>>>   * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>>>>>>>>>>   * Last updated: Thu Apr 8 18:00:52 2021
>>>>>>>>>>   * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01
>>>>>>>>>>   * 2 nodes configured
>>>>>>>>>>   * 1 resource instance configured
>>>>>>>>>>
>>>>>>>>>> Node List:
>>>>>>>>>>   * Online: [ rh84-beta01 rh84-beta02 ]
>>>>>>>>>>
>>>>>>>>>> Full List of Resources:
>>>>>>>>>>   * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01
>>>>>>>>>>
>>>>>>>>>> Migration Summary:
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> Step3. Stop the node where the Dummy resource is running. The resource will fail over.
>>>>>>>>>> ----
>>>>>>>>>> [root at rh84-beta02 ~]# crm_mon -rfA1
>>>>>>>>>> Cluster Summary:
>>>>>>>>>>   * Stack: corosync
>>>>>>>>>>   * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>>>>>>>>>>   * Last updated: Thu Apr 8 18:08:56 2021
>>>>>>>>>>   * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01
>>>>>>>>>>   * 2 nodes configured
>>>>>>>>>>   * 1 resource instance configured
>>>>>>>>>>
>>>>>>>>>> Node List:
>>>>>>>>>>   * Online: [ rh84-beta02 ]
>>>>>>>>>>   * OFFLINE: [ rh84-beta01 ]
>>>>>>>>>>
>>>>>>>>>> Full List of Resources:
>>>>>>>>>>   * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> However, if you look at the log, you can see that the execution of crm_mon
>>>>>>>>>> in the stop processing of the Dummy resource has failed.
>>>>>>>>>> ----
>>>>>>>>>> Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
>>>>>>>>>> Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
>>>>>>>>> Hmm ... is that with selinux enabled?
>>>>>>>>> Respectively do you see any related avc messages?
>>>>>>>>>
>>>>>>>>> Klaus
>>>>>>>>>> ----
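When checking many log lines of this kind, it can help to pull out just the crm_mon exit status. The following is a minimal sketch, assuming the log line keeps the `crm_mon[<status>] : <output>` shape written by the modified dummy_stop in Step1. The function name `extract_crm_mon_rc` is hypothetical.

```shell
# Hypothetical helper: print the exit status embedded in a log line of the
# form "... crm_mon[<status>] : <output>", as written by the modified
# dummy_stop. Prints nothing if the line does not match.
extract_crm_mon_rc() {
    printf '%s\n' "$1" | sed -n 's/.*crm_mon\[\([0-9][0-9]*\)\].*/\1/p'
}

line='Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...'
extract_crm_mon_rc "$line"   # prints 102
```

A status of 0 would indicate that crm_mon ran successfully in the stop action. The 102 seen here is the failing status from the log above.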
>>>>>>>>>>
>>>>>>>>>> Similarly, pgsql also executes crm_mon in demote and stop, so its control fails.
>>>>>>>>>> The problem seems to be related to the following fix.
>>>>>>>>>> * Report pacemakerd in state waiting for sbd
>>>>>>>>>> - https://github.com/ClusterLabs/pacemaker/pull/2278
>>>>>>>>>>
>>>>>>>>>> The problem does not occur with the release version of Pacemaker 2.0.5 or
>>>>>>>>>> the Pacemaker included with RHEL8.3.
>>>>>>>>>> This issue has a huge impact on users.
>>>>>>>>>>
>>>>>>>>>> Perhaps it also affects the control of other resources that utilize crm_mon.
>>>>>>>>>> Please make sure that the release version of RHEL8.4 includes a Pacemaker
>>>>>>>>>> that does not have this problem.
>>>>>>>>>> * Distributions other than RHEL may also be affected in future releases.
>>>>>>>>>>
>>>>>>>>>> ----
>>>>>>>>>> This content is the same as the following Bugzilla.
>>>>>>>>>> - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Hideo Yamauchi.
>>>>>>>>>>
>>>>>>>>>>
>> _______________________________________________
>>>>
>>>
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>