[ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

Fri Apr 9 10:04:33 EDT 2021

On 4/9/21 3:45 PM, Klaus Wenninger wrote:
> On 4/9/21 3:36 PM, Klaus Wenninger wrote:
>> On 4/9/21 2:37 PM, renayama19661014 at ybb.ne.jp wrote:
>>> Hi Klaus,
>>>
>>> Thanks for your comment.
>>>
>>>> Hmm ... is that with selinux enabled?
>>>> Respectively do you see any related avc messages?
>>>
>>> Selinux is not enabled.
>>> Isn't the crm_mon failure caused by pacemakerd not returning a
>>> response while it prepares to stop?
> yep ... that doesn't look good.
> While in pcmk_shutdown_worker ipc isn't handled.
Stop ... that should actually work as pcmk_shutdown_worker
should exit quite quickly and proceed after mainloop
dispatching when called again.
Don't see anything atm that might be blocking for longer ...
but let me dig into it further ...
> Question is why that didn't create issue earlier.
> Probably I didn't test with resources that had crm_mon in
> their stop/monitor-actions but sbd should have run into
> issues.
>
> Klaus
>> But when shutting down a node the resources should be
>> shutdown before pacemakerd goes down.
>> But let me have a look if it can happen that pacemakerd
>> doesn't react to the ipc-pings before. That btw. might be
>> lethal for sbd-scenarios (if the phase is too long and it
>> might actually not be defined).
>>
>> My idea with selinux would have been that it might block
>> the ipc if crm_mon is issued by execd. But well forget
>> about it as it is not enabled ;-)
>>
>>
>> Klaus
>>>
>>> pgsql needs the result of crm_mon in demote processing and stop 
>>> processing.
>>> crm_mon should return a response even after pacemakerd goes into a 
>>> stop operation.
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>> ----- Original Message -----
>>>> From: Klaus Wenninger <kwenning at redhat.com>
>>>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related 
>>>> to open-source clustering welcomed <users at clusterlabs.org>
>>>> Cc:
>>>> Date: 2021/4/9, Fri 21:12
>>>> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource 
>>>> control fails.
>>>>
>>>> On 4/8/21 11:21 PM, renayama19661014 at ybb.ne.jp wrote:
>>>>>   Hi Ken,
>>>>>   Hi All,
>>>>>
>>>>>   In the pgsql resource, crm_mon is executed during demote and stop,
>>>>>   and the result is processed.
>>>>>   However, the Pacemaker included in RHEL8.4beta fails to execute this
>>>>>   crm_mon.
>>>>>     - The problem also occurs on github
>>>>>       master (c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).
>>>>>   The problem can be easily reproduced as follows.
>>>>>
>>>>>   Step1. Modify the Dummy resource to execute crm_mon in its stop
>>>>>   process.
>>>>>   ----
>>>>>
>>>>>   dummy_stop() {
>>>>>        mon=$(crm_mon -1)
>>>>>        ret=$?
>>>>>        ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
>>>>>        dummy_monitor
>>>>>        if [ $? =  $OCF_SUCCESS ]; then
>>>>>            rm ${OCF_RESKEY_state}
>>>>>        fi
>>>>>        return $OCF_SUCCESS
>>>>>   }
>>>>>   ----
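The modified stop function above captures the crm_mon exit status separately from its output. For agents that depend on that output during stop or demote, one defensive pattern is to retry a few times before giving up. This is only a minimal sketch: `run_status_cmd`, `STATUS_RETRIES`, and `STATUS_DELAY` are hypothetical names that do not exist in the Dummy or pgsql agents, and retrying cannot help if pacemakerd never answers for the rest of the shutdown.

```shell
#!/bin/sh
# Hypothetical helper (not part of any shipped resource agent):
# run a status command such as "crm_mon -1", retrying on failure.
# Prints the command's stdout and returns its last exit status.
STATUS_RETRIES=${STATUS_RETRIES:-3}   # assumed default number of attempts
STATUS_DELAY=${STATUS_DELAY:-1}       # assumed pause in seconds between attempts

run_status_cmd() {
    attempt=1
    while :; do
        # Keep stdout for the caller, discard stderr, remember the status.
        out=$("$@" 2>/dev/null)
        rc=$?
        if [ "$rc" -eq 0 ] || [ "$attempt" -ge "$STATUS_RETRIES" ]; then
            printf '%s\n' "$out"
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "$STATUS_DELAY"
    done
}
```

Inside a stop function it could be used as `mon=$(run_status_cmd crm_mon -1); ret=$?`, which keeps the same `mon`/`ret` pair that the reproduction logs.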
>>>>>
>>>>>   Step2. Configure a cluster with two nodes.
>>>>>   ----
>>>>>
>>>>>   [root at rh84-beta01 ~]# crm_mon -rfA1
>>>>>   Cluster Summary:
>>>>>      * Stack: corosync
>>>>>      * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>>>>>      * Last updated: Thu Apr  8 18:00:52 2021
>>>>>      * Last change:  Thu Apr  8 18:00:38 2021 by root via cibadmin on rh84-beta01
>>>>>      * 2 nodes configured
>>>>>      * 1 resource instance configured
>>>>>
>>>>>   Node List:
>>>>>      * Online: [ rh84-beta01 rh84-beta02 ]
>>>>>
>>>>>   Full List of Resources:
>>>>>      * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta01
>>>>>
>>>>>   Migration Summary:
>>>>>   ----
>>>>>
>>>>>   Step3. Stop the node where the Dummy resource is running. The
>>>>>   resource will fail over.
>>>>>   ----
>>>>>   [root at rh84-beta02 ~]# crm_mon -rfA1
>>>>>   Cluster Summary:
>>>>>      * Stack: corosync
>>>>>      * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>>>>>      * Last updated: Thu Apr  8 18:08:56 2021
>>>>>      * Last change:  Thu Apr  8 18:05:08 2021 by root via cibadmin on rh84-beta01
>>>>>      * 2 nodes configured
>>>>>      * 1 resource instance configured
>>>>>
>>>>>   Node List:
>>>>>      * Online: [ rh84-beta02 ]
>>>>>      * OFFLINE: [ rh84-beta01 ]
>>>>>
>>>>>   Full List of Resources:
>>>>>      * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta02
>>>>>   ----
>>>>>
>>>>>   However, if you look at the log, you can see that the execution of
>>>>>   crm_mon in the stop processing of the Dummy resource has failed.
>>>>>   ----
>>>>>   Apr 08 18:05:17  Dummy(dummy-1)[2631]:    INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
>>>>>   Apr 08 18:05:17 rh84-beta01 pacemaker-execd     [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
>>>> Hmm ... is that with selinux enabled?
>>>> Respectively do you see any related avc messages?
>>>>
>>>> Klaus
>>>>>   ----
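The instrumented agent writes the crm_mon exit status into the log as `crm_mon[102]`. When checking several nodes or repeated runs, it can help to pull that number out of the log lines. The following is a sketch only: `extract_crm_mon_rc` is a hypothetical helper, and it assumes log lines shaped exactly like the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print the exit code
# embedded in markers of the form "crm_mon[N]", one code per matching line.
# Lines without such a marker produce no output.
extract_crm_mon_rc() {
    sed -n 's/.*crm_mon\[\([0-9][0-9]*\)\].*/\1/p'
}
```

For example, `grep 'YAMAUCHI' "$LOGFILE" | extract_crm_mon_rc | sort | uniq -c` would count how often each exit status was seen, where `LOGFILE` is a placeholder for wherever the agent's log messages end up on the node.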
>>>>>
>>>>>   Similarly, pgsql also executes crm_mon during demote or stop, so
>>>>>   control fails.
>>>>>   The problem seems to be related to the following fix.
>>>>>     * Report pacemakerd in state waiting for sbd
>>>>>      - https://github.com/ClusterLabs/pacemaker/pull/2278
>>>>>
>>>>>   The problem does not occur with the release version of Pacemaker
>>>>>   2.0.5 or the Pacemaker included with RHEL8.3.
>>>>>   This issue has a huge impact on the user.
>>>>>
>>>>>   Perhaps it also affects the control of other resources that utilize
>>>>>   crm_mon.
>>>>>   Please improve the release version of RHEL8.4 so that it includes
>>>>>   Pacemaker which does not cause this problem.
>>>>>     * Distributions other than RHEL may also be affected in future
>>>>>       releases.
>>>>>
>>>>>   ----
>>>>>   This content is the same as the following Bugzilla.
>>>>>     - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
>>>>>   ----
>>>>>
>>>>>   Best Regards,
>>>>>   Hideo Yamauchi.
>>>>>
>>>>>   _______________________________________________
>>>>>   Manage your subscription:
>>>>>   https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>
>>>>>   ClusterLabs home: https://www.clusterlabs.org/