[ClusterLabs] resource agent OCF_HEARTBEAT_GALERA issue/broken - ?

lejeczek peljasz at yahoo.co.uk
Fri Jul 29 08:57:15 EDT 2022


On 28/07/2022 00:33, Reid Wahl wrote:
> On Wed, Jul 27, 2022 at 2:08 AM lejeczek via Users
> <users at clusterlabs.org> wrote:
>>
>>
>> On 26/07/2022 20:56, Reid Wahl wrote:
>>> On Tue, Jul 26, 2022 at 4:21 AM lejeczek via Users
>>> <users at clusterlabs.org> wrote:
>>>> Hi guys
>>>>
>>>> I set up a clone of a new instance of mariadb galera - which otherwise,
>>>> outside of pcs works - but I see something weird.
>>>>
>>>> Firstly cluster claims it's all good:
>>>>
>>>> -> $ pcs status --full
>>>> ...
>>>>
>>>>      * Clone Set: mariadb-apps-clone [mariadb-apps] (promotable):
>>>>        * mariadb-apps    (ocf::heartbeat:galera):     Master sucker.internal.ccn
>>>>        * mariadb-apps    (ocf::heartbeat:galera):     Master drunk.internal.ccn
>>> Clearly the problem is that your server is drunk.
>>>
>>>> but that mariadb is _not_ started actually.
>>>>
>>>> In clone's attr I set:
>>>>
>>>> config=/apps/etc/mariadb-server.cnf
>>>>
>>>> I also for peace of mind set:
>>>>
>>>> datadir=/apps/mysql/data
>>>>
>>>> even though '/apps/etc/mariadb-server.cnf' declares that & other bits -
>>>> again, works outside of pcs.
>>>>
>>>> Then I see in pacemaker logs:
>>>>
>>>> notice: mariadb-apps_start_0 at drunk.internal.ccn output [ 220726 11:56:13
>>>> mysqld_safe Logging to '/tmp/tmp.On5VnzOyaF'.\n220726 11:56:13
>>>> mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql\n ]
>>>>
>>>> .. and I think what the F?
>>>>
>>>> resource-agents-4.9.0-22.el8.x86_64
>>>>
>>>> All thoughts shared are much appreciated.
>>>>
>>>> many thanks, L.
>>>>
>>> How do you start mariadb outside of pacemaker's control?
>> simply by:
>> -> $ sudo -u mysql /usr/libexec/mysqld
>> --defaults-file=/apps/etc/mariadb-server.cnf
>> --wsrep-new-clusterrep
> Got it. FWIW, `--wsrep-new-clusterrep` doesn't appear to be a valid
> option (no search results and nothing in the help output). Maybe this
> is a typo in the email; `--wsrep-new-cluster` is valid.
>
>>> It seems that *something* is running, based on the "Starting" message
>>> and the fact that the resources are still in Started state...
>> Yes, a "regular" instance of galera (which outside of
>> pacemaker works with "regular" systemd) is ofc already
>> running as one resource.
>>
>>> The logic to start and promote the galera resource is contained within
>>> /usr/lib/ocf/resource.d/heartbeat/galera and
>>> /usr/lib/ocf/lib/heartbeat/mysql-common.sh. I encourage you to inspect
>>> those for any relevant differences between your own startup method and
>>> that of the resource agent.
>>>
>>> As one example, note that the resource agent uses mysqld_safe by
>>> default. This is configurable via the `binary` option for the
>>> resource. Be sure that you've looked at all the available options
>>> (`pcs resource describe ocf:heartbeat:galera`) and configured any of
>>> them that you need. You've definitely at least started that process
>>> with config and datadir.
>> log snippet I pasted was not to emphasize 'mysqld_safe' but
>> the fact that resource does:
>> ...
>> Starting mysqld daemon with databases from /var/lib/mysql
>> ...
>>
>> Yes, I did check man pages for options and if you suggest
>> checking source code then I say that's a case for a bug report.
> For something like pacemaker, I wouldn't point a user to the code.
> galera and mysql-common.sh are shell scripts, so they're easier to
> interpret. The idea was "maybe you can find a discrepancy between how
> the scripts are starting mysql, and how you're manually starting
> mysql."
>
> Filing a bug might be reasonable. I don't have much hands-on
> experience with mysql or with this resource agent.
>
>> I tried my best to make it clear, doubt I can do that better
>> but I'll re-try:
>> a) bits needed to run a "new/second" instance are all in
>> 'mariadb-server.cnf' and galera works outside of pacemaker
>> (cmd above)
>> b) if I use attrs: 'datadir' & 'config' yet resource tells
>> me "..daemon with databases from /var/lib/mysql..." then..
>> people looking into the code should perhaps be you (as
>> Redhat employee) and/or other authors/developers - if I
>> filed it as BZ - no?
>>
>> It should be easily reproducible, have:
>> 1) a galera (as a 2nd/3rd/etc. instance, outside & with
>> different settings from "standard" OS's rpm installation)
>> work & tested
>> 2) set up a resource:
>> -> $ pcs resource create mariadb-apps ocf:heartbeat:galera
>> cluster_host_map="drunk.internal.ccn:10.0.1.6;sucker.internal.ccn:10.0.1.7"
>> config="/apps/etc/mariadb-server.cnf"
>> wsrep_cluster_address="gcomm://10.0.1.6:5567,10.0.1.7:5567"
>> user=mysql group=mysql check_user="pacemaker"
>> check_passwd="pacemaker#9897" op monitor OCF_CHECK_LEVEL="0"
>> timeout="30s" interval="20s" op monitor role="Master"
>> OCF_CHECK_LEVEL="0" timeout="30s" interval="10s" op monitor
>> role="Slave" OCF_CHECK_LEVEL="0" timeout="30s"
>> interval="30s" promotable promoted-max=2 meta
>> failure-timeout=30s
> The datadir option isn't specified in the resource create command
> above. If datadir is not explicitly set, then the resource agent uses
> /var/lib/mysql by default:
> ```
> mysql_common_start()
> {
>      local mysql_extra_params="$1"
>      local pid
>
>      $SU - $OCF_RESKEY_user -s /bin/sh -c \
>      "${OCF_RESKEY_binary} --defaults-file=$OCF_RESKEY_config \
>      --pid-file=$OCF_RESKEY_pid \
>      --socket=$OCF_RESKEY_socket \
>      --datadir=$OCF_RESKEY_datadir \
>      --log-error=$OCF_RESKEY_log \
>      $OCF_RESKEY_additional_parameters \
>      $mysql_extra_params >/dev/null 2>&1" &
>      pid=$!
> ```
>
> where OCF_RESKEY_datadir is set to /var/lib/mysql if you don't set it yourself.
>
> As mentioned above, I'm not very familiar with mysql, but I would
> expect the CLI --datadir=/var/lib/mysql option to override anything in
> the defaults-file.
>
Wrong copy&paste on my part, I ran quite a few variations of resource "create". The current config is:

-> $ pcs resource config mariadb-apps-clone
Clone: mariadb-apps-clone
   Meta Attributes: mariadb-apps-clone-meta_attributes
     failure-timeout=30s
     promotable=true
     promoted-max=2
   Resource: mariadb-apps (class=ocf provider=heartbeat type=galera)
     Attributes: mariadb-apps-instance_attributes
       check_passwd=pacemaker#98
       check_user=pacemaker
       cluster_host_map=drunk.internal.ccn:10.0.1.6;sucker.internal.ccn:10.0.1.7
       config=/apps/etc/mariadb-server.cnf
       datadir=/apps/mysql/data
       group=mysql
       log=/var/log/mariadb/maria-apps.log
       pid=/run/mariadb/maria-apps.pid
       socket=/var/lib/mysql/maria-apps.sock
       user=mysql
       wsrep_cluster_address=gcomm://10.0.1.6:5567,10.0.1.7:5567

But even if "datadir" were not set, "config" should suffice - otherwise,
as anybody would ask, what is "config" for?

As I said, I have those bits defined in the mariadb config, "datadir"
included, yet it seems that the "config" attribute is ignored as well.
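
To illustrate the point under discussion, here is a minimal sketch in plain shell (no mysql needed) of how a wrapper that always passes --datadir on the command line would end up shadowing a "datadir" from the config file, given that command-line options take precedence over option files. The names DEFAULT_DATADIR and build_cmd are hypothetical and not taken from the resource agent; only the /var/lib/mysql default and the paths come from this thread.

```shell
#!/bin/sh
# Sketch only: mimics how a wrapper script might assemble the mysqld
# command line. DEFAULT_DATADIR and build_cmd are hypothetical names,
# they do not come from the galera resource agent.

DEFAULT_DATADIR="/var/lib/mysql"

# Print the command line that would be run for a given config file and an
# optional datadir parameter (missing or empty means "not set").
build_cmd() {
    config="$1"
    datadir="${2:-$DEFAULT_DATADIR}"   # fall back to the default when unset
    printf '%s\n' "mysqld_safe --defaults-file=$config --datadir=$datadir"
}

# datadir not set: the default is still passed explicitly on the command
# line, so a "datadir" inside the config file would be shadowed by it.
cmd_default=$(build_cmd /apps/etc/mariadb-server.cnf)

# datadir set: that value is passed instead of the default.
cmd_custom=$(build_cmd /apps/etc/mariadb-server.cnf /apps/mysql/data)

echo "$cmd_default"
echo "$cmd_custom"
```

This only models the fallback and does not explain why /var/lib/mysql would still show up in the log when "datadir" is set on the resource, which is the part that looks wrong here.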

To me it seems that 'ocf_heartbeat_galera' is, if not completely, then
at least in some bits seriously fucked up.

Again, it should be easy to reproduce - if only the developers/authors
wanted to - before I file a BZ.

Just remember to create and use another 'mariadb' instance, different
from the standard OS rpm installation (while still using the regular
binaries from the rpm) - a resource with the "standard" galera works
fine for me.

many thanks, L.


