[ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
Klaus Wenninger
kwenning at redhat.com
Fri Apr 9 09:36:23 EDT 2021
On 4/9/21 2:37 PM, renayama19661014 at ybb.ne.jp wrote:
> Hi Klaus,
>
> Thanks for your comment.
>
>> Hmm ... is that with selinux enabled?
>> Respectively do you see any related avc messages?
>
> Selinux is not enabled.
> Isn't the crm_mon failure caused by pacemakerd not returning a response while it prepares to stop?
But when shutting down a node, the resources should be
shut down before pacemakerd goes down.
But let me have a look at whether it can happen that pacemakerd
doesn't react to the IPC pings before that. That, by the way, might be
lethal for sbd scenarios (if the phase is too long, and its
length might actually not be defined).
My idea with SELinux was that it might block
the IPC if crm_mon is issued by execd. But forget
about it, as it is not enabled ;-)
Klaus
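
For agents that call crm_mon during stop or demote, a minimal sketch of capturing its output and exit code so that a failure is at least visible to the caller. This does not address the pacemakerd behaviour discussed above. `run_status`, `status_out` and `status_rc` are hypothetical names used only for illustration; the only facts taken from the report are the `crm_mon -1` call and the exit code 102 seen in its log.

```shell
# Sketch only: run_status is a hypothetical helper, not part of any agent.
run_status() {
    # Run the given command, capturing stdout+stderr and its exit code.
    status_out=$("$@" 2>&1)
    status_rc=$?
    if [ "$status_rc" -ne 0 ]; then
        # In the report, crm_mon exited with 102 while the daemons
        # were shutting down.
        echo "status command failed (rc=${status_rc}): ${status_out}" >&2
    fi
    return "$status_rc"
}

# Intended use inside a stop action (needs a running cluster):
#   run_status crm_mon -1 || echo "no cluster status available"
```

The command is passed as arguments so the helper can be exercised without a cluster; what an agent should do when the status is unavailable is left open here.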
>
> pgsql needs the result of crm_mon in demote processing and stop processing.
> crm_mon should return a response even after pacemakerd goes into a stop operation.
>
> Best Regards,
> Hideo Yamauchi.
>
>
> ----- Original Message -----
>> From: Klaus Wenninger <kwenning at redhat.com>
>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
>> Cc:
>> Date: 2021/4/9, Fri 21:12
>> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
>>
>> On 4/8/21 11:21 PM, renayama19661014 at ybb.ne.jp wrote:
>>> Hi Ken,
>>> Hi All,
>>>
>>> In the pgsql resource, crm_mon is executed during demote and stop, and the result is processed.
>>> However, the Pacemaker included in RHEL8.4beta fails to execute this crm_mon.
>>> - The problem also occurs on github master(c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).
>>> The problem can be easily reproduced in the following ways.
>>>
>>> Step1. Modify the Dummy resource agent to execute crm_mon in its stop processing.
>>> ----
>>>
>>> dummy_stop() {
>>>     mon=$(crm_mon -1)
>>>     ret=$?
>>>     ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
>>>     dummy_monitor
>>>     if [ $? = $OCF_SUCCESS ]; then
>>>         rm ${OCF_RESKEY_state}
>>>     fi
>>>     return $OCF_SUCCESS
>>> }
>>> ----
>>>
>>> Step2. Configure a cluster with two nodes.
>>> ----
>>>
>>> [root@rh84-beta01 ~]# crm_mon -rfA1
>>> Cluster Summary:
>>> * Stack: corosync
>>> * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>>> * Last updated: Thu Apr 8 18:00:52 2021
>>> * Last change: Thu Apr 8 18:00:38 2021 by root via cibadmin on rh84-beta01
>>> * 2 nodes configured
>>> * 1 resource instance configured
>>>
>>> Node List:
>>> * Online: [ rh84-beta01 rh84-beta02 ]
>>>
>>> Full List of Resources:
>>> * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta01
>>>
>>> Migration Summary:
>>> ----
>>>
>>> Step3. Stop the node where the Dummy resource is running. The resource will fail over.
>>> ----
>>> [root@rh84-beta02 ~]# crm_mon -rfA1
>>> Cluster Summary:
>>> * Stack: corosync
>>> * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>>> * Last updated: Thu Apr 8 18:08:56 2021
>>> * Last change: Thu Apr 8 18:05:08 2021 by root via cibadmin on rh84-beta01
>>> * 2 nodes configured
>>> * 1 resource instance configured
>>>
>>> Node List:
>>> * Online: [ rh84-beta02 ]
>>> * OFFLINE: [ rh84-beta01 ]
>>>
>>> Full List of Resources:
>>> * dummy-1 (ocf::heartbeat:Dummy): Started rh84-beta02
>>> ----
>>>
>>> However, if you look at the log, you can see that the execution of crm_mon in the stop processing of the Dummy resource has failed.
>>> ----
>>> Apr 08 18:05:17 Dummy(dummy-1)[2631]: INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
>>> Apr 08 18:05:17 rh84-beta01 pacemaker-execd [2219] (log_op_output) notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
>> Hmm ... is that with selinux enabled?
>> Respectively do you see any related avc messages?
>>
>> Klaus
>>> ----
>>>
>>> Similarly, pgsql also executes crm_mon during demote or stop, so its control fails.
>>> The problem seems to be related to the following fix.
>>> * Report pacemakerd in state waiting for sbd
>>> - https://github.com/ClusterLabs/pacemaker/pull/2278
>>>
>>> The problem does not occur with the release version of Pacemaker 2.0.5 or the Pacemaker included with RHEL8.3.
>>> This issue has a huge impact on the user.
>>>
>>> Perhaps it also affects the control of other resources that utilize crm_mon.
>>> Please make sure that the release version of RHEL8.4 includes a Pacemaker that does not cause this problem.
>>> * Distributions other than RHEL may also be affected in future releases.
>>>
>>> ----
>>> This content is the same as the following Bugzilla.
>>> - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
>>> ----
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
--
Klaus Wenninger
Senior Software Engineer, EMEA ENG Base Operating Systems
Red Hat
kwenning at redhat.com