[ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

Fri Apr 9 08:37:49 EDT 2021

Hi Klaus,

Thanks for your comment.

> Hmm ... is that with selinux enabled?

> Respectively do you see any related avc messages?

SELinux is not enabled.
Isn't this caused by pacemakerd not returning a response to crm_mon while it is preparing to stop?

pgsql needs the result of crm_mon during demote and stop processing.
crm_mon should still return a response after pacemakerd has started its stop operation.
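To illustrate why this matters for agents like pgsql, here is a minimal sketch, assuming a POSIX shell resource agent, of a wrapper that keeps crm_mon's output and error message apart and lets the agent act on the exit status. The function name crm_mon_status and the variables CRM_MON_OUT and CRM_MON_ERR are hypothetical. It does not work around the problem; it only makes the failure explicit to the agent.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Dummy or pgsql agents.
# Runs "crm_mon -1", stores stdout in CRM_MON_OUT and stderr in
# CRM_MON_ERR, and returns crm_mon's own exit status.
crm_mon_status() {
    _cms_errfile=$(mktemp) || return 1
    # The exit status of this assignment is crm_mon's exit status.
    CRM_MON_OUT=$(crm_mon -1 2>"$_cms_errfile")
    _cms_rc=$?
    CRM_MON_ERR=$(cat "$_cms_errfile")
    rm -f "$_cms_errfile"
    if [ "$_cms_rc" -ne 0 ]; then
        # Report the failure on stderr instead of silently parsing
        # whatever text crm_mon printed on stdout.
        echo "crm_mon failed (rc=${_cms_rc}): ${CRM_MON_ERR}" >&2
    fi
    return "$_cms_rc"
}
```

With the behavior described in this report, a stop or demote action calling crm_mon_status would get a non-zero status (102 in the quoted log), and retrying is unlikely to help while pacemakerd remains in its shutdown state.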

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: Klaus Wenninger <kwenning at redhat.com>
> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
> Cc: 
> Date: 2021/4/9, Fri 21:12
> Subject: Re: [ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.
> 
> On 4/8/21 11:21 PM, renayama19661014 at ybb.ne.jp wrote:
>>  Hi Ken,
>>  Hi All,
>> 
>>  In the pgsql resource, crm_mon is executed during demote and stop processing, and its result is used.
>> 
>>  However, with the Pacemaker included in RHEL8.4beta, this crm_mon call fails.
>>    - The problem also occurs on github master (c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).
>> 
>>  The problem can be easily reproduced in the following ways.
>> 
>>  Step1. Modify the Dummy resource agent so that it executes crm_mon in its stop action.
>>  ----
>> 
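>>  dummy_stop() {
>>       # Added for reproduction: run crm_mon and log its exit status and output.
>>       mon=$(crm_mon -1)
>>       ret=$?
>>       ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
>>       # Stop logic: remove the state file if dummy_monitor reports the resource as running.
>>       dummy_monitor
>>       if [ $? -eq $OCF_SUCCESS ]; then
>>           rm "${OCF_RESKEY_state}"
>>       fi
>>       return $OCF_SUCCESS
>>  }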
>>  ----
>> 
>>  Step2. Configure a cluster with two nodes.
>>  ----
>> 
>>  [root at rh84-beta01 ~]# crm_mon -rfA1
>>  Cluster Summary:
>>     * Stack: corosync
>>     * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>>     * Last updated: Thu Apr  8 18:00:52 2021
>>     * Last change:  Thu Apr  8 18:00:38 2021 by root via cibadmin on rh84-beta01
>>     * 2 nodes configured
>>     * 1 resource instance configured
>> 
>>  Node List:
>>     * Online: [ rh84-beta01 rh84-beta02 ]
>> 
>>  Full List of Resources:
>>     * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta01
>> 
>>  Migration Summary:
>>  ----
>> 
>>  Step3. Stop the node where the Dummy resource is running. The resource will fail over.
>>  ----
>>  [root at rh84-beta02 ~]# crm_mon -rfA1
>>  Cluster Summary:
>>     * Stack: corosync
>>     * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>>     * Last updated: Thu Apr  8 18:08:56 2021
>>     * Last change:  Thu Apr  8 18:05:08 2021 by root via cibadmin on rh84-beta01
>>     * 2 nodes configured
>>     * 1 resource instance configured
>> 
>>  Node List:
>>     * Online: [ rh84-beta02 ]
>>     * OFFLINE: [ rh84-beta01 ]
>> 
>>  Full List of Resources:
>>     * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta02
>>  ----
>> 
>>  However, the log shows that the crm_mon executed during the stop processing of the Dummy resource failed.
>> 
>>  ----
>>  Apr 08 18:05:17  Dummy(dummy-1)[2631]:    INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
>>  Apr 08 18:05:17 rh84-beta01 pacemaker-execd     [2219] (log_op_output)  notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
> Hmm ... is that with selinux enabled?
> Respectively do you see any related avc messages?
> 
> Klaus
>>  ----
>> 
>>  Similarly, pgsql executes crm_mon during demote and stop, so its control fails.
>> 
>>  The problem seems to be related to the following change:
>>    * Report pacemakerd in state waiting for sbd
>>     - https://github.com/ClusterLabs/pacemaker/pull/2278 
>> 
>>  The problem does not occur with the release version of Pacemaker 2.0.5 or with the Pacemaker included in RHEL8.3.
>> 
>>  This issue has a large impact on users.
>> 
>>  It probably also affects the control of other resources that use crm_mon.
>> 
>>  Please make sure that the RHEL8.4 release includes a Pacemaker that does not have this problem.
>>    * Distributions other than RHEL may also be affected in future releases.
>> 
>>  ----
>>  This report is also filed in the following Bugzilla:
>>    - https://bugs.clusterlabs.org/show_bug.cgi?id=5471 
>>  ----
>> 
>>  Best Regards,
>>  Hideo Yamauchi.
>> 
>