[ClusterLabs] resource agent OCF_HEARTBEAT_GALERA issue/broken - ?
Reid Wahl
nwahl at redhat.com
Fri Jul 29 15:58:38 EDT 2022
On Fri, Jul 29, 2022 at 5:57 AM lejeczek via Users
<users at clusterlabs.org> wrote:
>
> On 28/07/2022 00:33, Reid Wahl wrote:
> > On Wed, Jul 27, 2022 at 2:08 AM lejeczek via Users
> > <users at clusterlabs.org> wrote:
> >>
> >>
> >> On 26/07/2022 20:56, Reid Wahl wrote:
> >>> On Tue, Jul 26, 2022 at 4:21 AM lejeczek via Users
> >>> <users at clusterlabs.org> wrote:
> >>>> Hi guys
> >>>>
> >>>> I set up a clone of a new instance of mariadb galera (which otherwise
> >>>> works outside of pcs), but I see something weird.
> >>>>
> >>>> Firstly, the cluster claims it's all good:
> >>>>
> >>>> -> $ pcs status --full
> >>>> ...
> >>>>
> >>>> * Clone Set: mariadb-apps-clone [mariadb-apps] (promotable):
> >>>> * mariadb-apps (ocf::heartbeat:galera): Master
> >>>> sucker.internal.ccn
> >>>> * mariadb-apps (ocf::heartbeat:galera): Master
> >>>> drunk.internal.ccn
> >>> Clearly the problem is that your server is drunk.
> >>>
> >>>> but that mariadb is _not_ actually started.
> >>>>
> >>>> In the clone's attrs I set:
> >>>>
> >>>> config=/apps/etc/mariadb-server.cnf
> >>>>
> >>>> For peace of mind, I also set:
> >>>>
> >>>> datadir=/apps/mysql/data
> >>>>
> >>>> even though '/apps/etc/mariadb-server.cnf' declares that and other bits;
> >>>> again, it works outside of pcs.
> >>>>
> >>>> Then I see in the pacemaker logs:
> >>>>
> >>>> notice: mariadb-apps_start_0 at drunk.internal.ccn output [ 220726 11:56:13
> >>>> mysqld_safe Logging to '/tmp/tmp.On5VnzOyaF'.\n220726 11:56:13
> >>>> mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql\n ]
> >>>>
> >>>> .. and I think what the F?
> >>>>
> >>>> resource-agents-4.9.0-22.el8.x86_64
> >>>>
> >>>> All thoughts shared are much appreciated.
> >>>>
> >>>> many thanks, L.
> >>>>
> >>> How do you start mariadb outside of pacemaker's control?
> >> simply by:
> >> -> $ sudo -u mysql /usr/libexec/mysqld
> >> --defaults-file=/apps/etc/mariadb-server.cnf
> >> --wsrep-new-clusterrep
> > Got it. FWIW, `--wsrep-new-clusterrep` doesn't appear to be a valid
> > option (no search results and nothing in the help output). Maybe this
> > is a typo in the email; `--wsrep-new-cluster` is valid.
> >
> >>> It seems that *something* is running, based on the "Starting" message
> >>> and the fact that the resources are still in Started state...
> >> Yes, a "regular" instance of galera (which, outside of
> >> pacemaker, works with "regular" systemd) is of course
> >> already running as one resource.
> >>
> >>> The logic to start and promote the galera resource is contained within
> >>> /usr/lib/ocf/resource.d/heartbeat/galera and
> >>> /usr/lib/ocf/lib/heartbeat/mysql-common.sh. I encourage you to inspect
> >>> those for any relevant differences between your own startup method and
> >>> that of the resource agent.
> >>>
> >>> As one example, note that the resource agent uses mysqld_safe by
> >>> default. This is configurable via the `binary` option for the
> >>> resource. Be sure that you've looked at all the available options
> >>> (`pcs resource describe ocf:heartbeat:galera`) and configured any of
> >>> them that you need. You've definitely at least started that process
> >>> with config and datadir.
> >> The log snippet I pasted was not meant to emphasize 'mysqld_safe' but
> >> the fact that the resource does:
> >> ...
> >> Starting mysqld daemon with databases from /var/lib/mysql
> >> ...
> >>
> >> Yes, I did check the man pages for options, and if you suggest
> >> checking the source code, then I'd say that's a case for a bug report.
> > For something like pacemaker, I wouldn't point a user to the code.
> > galera and mysql-common.sh are shell scripts, so they're easier to
> > interpret. The idea was "maybe you can find a discrepancy between how
> > the scripts are starting mysql, and how you're manually starting
> > mysql."
> >
> > Filing a bug might be reasonable. I don't have much hands-on
> > experience with mysql or with this resource agent.
> >
> >> I tried my best to make it clear and doubt I can do much better,
> >> but I'll retry:
> >> a) the bits needed to run a "new/second" instance are all in
> >> 'mariadb-server.cnf', and galera works outside of pacemaker
> >> (cmd above)
> >> b) if I set the 'datadir' and 'config' attrs, yet the resource tells
> >> me "..daemon with databases from /var/lib/mysql..", then
> >> the people looking into the code should perhaps be you (as a
> >> Red Hat employee) and/or the other authors/developers, if I
> >> filed it as a BZ, no?
> >>
> >> It should be easily reproducible:
> >> 1) have a galera instance (a 2nd/3rd/etc. instance, separate from
> >> and with different settings than the "standard" OS rpm installation)
> >> working and tested
> >> 2) set up a resource:
> >> -> $ pcs resource create mariadb-apps ocf:heartbeat:galera
> >> cluster_host_map="drunk.internal.ccn:10.0.1.6;sucker.internal.ccn:10.0.1.7"
> >> config="/apps/etc/mariadb-server.cnf"
> >> wsrep_cluster_address="gcomm://10.0.1.6:5567,10.0.1.7:5567"
> >> user=mysql group=mysql check_user="pacemaker"
> >> check_passwd="pacemaker#9897" op monitor OCF_CHECK_LEVEL="0"
> >> timeout="30s" interval="20s" op monitor role="Master"
> >> OCF_CHECK_LEVEL="0" timeout="30s" interval="10s" op monitor
> >> role="Slave" OCF_CHECK_LEVEL="0" timeout="30s"
> >> interval="30s" promotable promoted-max=2 meta
> >> failure-timeout=30s
> > The datadir option isn't specified in the resource create command
> > above. If datadir is not explicitly set, then the resource agent uses
> > /var/lib/mysql by default:
> > ```
> > mysql_common_start()
> > {
> > local mysql_extra_params="$1"
> > local pid
> >
> > $SU - $OCF_RESKEY_user -s /bin/sh -c \
> > "${OCF_RESKEY_binary} --defaults-file=$OCF_RESKEY_config \
> > --pid-file=$OCF_RESKEY_pid \
> > --socket=$OCF_RESKEY_socket \
> > --datadir=$OCF_RESKEY_datadir \
> > --log-error=$OCF_RESKEY_log \
> > $OCF_RESKEY_additional_parameters \
> > $mysql_extra_params >/dev/null 2>&1" &
> > pid=$!
> > ```
> >
> > where OCF_RESKEY_datadir is set to /var/lib/mysql if you don't set it yourself.
> >
> > As mentioned above, I'm not very familiar with mysql, but I would
> > expect the CLI --datadir=/var/lib/mysql option to override anything in
> > the defaults-file.
> >
> Wrong copy & paste; I ran quite a few resource "create" commands.
>
> -> $ pcs resource config mariadb-apps-clone
> Clone: mariadb-apps-clone
> Meta Attributes: mariadb-apps-clone-meta_attributes
> failure-timeout=30s
> promotable=true
> promoted-max=2
> Resource: mariadb-apps (class=ocf provider=heartbeat type=galera)
> Attributes: mariadb-apps-instance_attributes
> check_passwd=pacemaker#98
> check_user=pacemaker
> cluster_host_map=drunk.internal.ccn:10.0.1.6;sucker.internal.ccn:10.0.1.7
> config=/apps/etc/mariadb-server.cnf
> datadir=/apps/mysql/data
> group=mysql
> log=/var/log/mariadb/maria-apps.log
> pid=/run/mariadb/maria-apps.pid
> socket=/var/lib/mysql/maria-apps.sock
> user=mysql
> wsrep_cluster_address=gcomm://10.0.1.6:5567,10.0.1.7:5567
>
> But even if "datadir" was not used, "config" should do; otherwise,
> what is "config" for? Anybody would ask that.
>
> I said that I have those bits defined in the mariadb config, but it
> seems that the "config" attr is also ignored; I have "datadir" in the
> mariadb config.
The `config` option isn't ignored. Like I said in my previous email,
the resource agent explicitly specifies a `--datadir` argument no
matter what. If you don't specify the `datadir` option, then it uses
`--defaults-file=/apps/etc/mariadb-server.cnf
--datadir=/var/lib/mysql`. The resource agent explicitly passes both
the `--defaults-file` option and the `--datadir` option to the
`mysqld` start command. The `--datadir=/var/lib/mysql` option probably
overrides whatever may be in /apps/etc/mariadb-server.cnf.
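A simplified model of the two points above may help. This is a hedged sketch, not the agent's code: `build_start_cmd` and `effective_datadir` are hypothetical helpers, the `/usr/bin/mysqld_safe` and `/var/lib/mysql` defaults are assumptions taken from the trace and discussion in this thread, and the precedence emulation assumes the usual MariaDB rule that option-file settings are read before the command line and the last occurrence of a repeated option wins.

```bash
#!/bin/bash
# Simplified, hypothetical model of how the agent builds its start command.
# The defaults below are assumptions; this helper only echoes the command.
build_start_cmd() {
    # ${var=default} assigns only when the variable is unset.
    : "${OCF_RESKEY_binary=/usr/bin/mysqld_safe}"
    : "${OCF_RESKEY_datadir=/var/lib/mysql}"
    # --datadir is emitted unconditionally, after --defaults-file.
    echo "$OCF_RESKEY_binary --defaults-file=$OCF_RESKEY_config --datadir=$OCF_RESKEY_datadir"
}

# Hypothetical emulation of option precedence: pass the option-file
# settings first, then the command-line arguments; for a repeated
# option, the last occurrence wins.
effective_datadir() {
    local arg result=""
    for arg in "$@"; do
        case $arg in
            --datadir=*) result=${arg#--datadir=} ;;
        esac
    done
    echo "$result"
}

# Assuming OCF_RESKEY_binary and OCF_RESKEY_datadir are not already set:
OCF_RESKEY_config=/apps/etc/mariadb-server.cnf
build_start_cmd
# prints: /usr/bin/mysqld_safe --defaults-file=/apps/etc/mariadb-server.cnf --datadir=/var/lib/mysql

# datadir from the .cnf file first, then the agent's --datadir:
effective_datadir --datadir=/apps/mysql/data --datadir=/var/lib/mysql
# prints: /var/lib/mysql
```

Under these assumptions, leaving the `datadir` attribute unset still produces a `--datadir` argument, and that argument outranks the value in the defaults file.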
>
> To me it seems that 'ocf_heartbeat_galera', if not completely broken,
> then at least has some bits seriously fucked up.
It would be reasonable at least to add a note to the resource agent
metadata, to say something like
datadir: Directory containing databases. If this option is not
specified, then --datadir=/var/lib/mysql will be used when starting
the database.
instead of the current
datadir: Directory containing databases
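OCF shell agents typically print their parameter descriptions from a heredoc in the `meta-data` action. A hedged sketch of how the suggested wording might look there; `print_datadir_param` is a hypothetical name, and the exact element layout and shortdesc text of the real galera/mysql-common.sh metadata are assumptions:

```bash
#!/bin/bash
# Hypothetical excerpt of an agent's meta-data output for "datadir",
# carrying the clarified description suggested above.
OCF_RESKEY_datadir_default=/var/lib/mysql

print_datadir_param() {
    cat <<END
<parameter name="datadir" unique="0" required="0">
<longdesc lang="en">
Directory containing databases. If this option is not specified, then
--datadir=${OCF_RESKEY_datadir_default} will be used when starting the database.
</longdesc>
<shortdesc lang="en">MySQL datadir</shortdesc>
<content type="string" default="${OCF_RESKEY_datadir_default}" />
</parameter>
END
}

print_datadir_param
```

Expanding the default from a single variable keeps the longdesc and the advertised default from drifting apart.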
>
> Again, it should be easy to reproduce, if only the developers/authors
> wanted to, before I file a BZ.
To be clear, I do not personally maintain the resource agents.
However, if this is reproducible on our end, then the best way to get
action on it is to file a BZ, ensuring that it's seen and tracked.
I tested without a mariadb installation before I sent my previous
email -- just creating a resource and verifying that the options were
parsed correctly. The resource agent correctly ran the mysql start
command with the datadir I specified. Testing again now so that I can
share the debug output from `pcs resource debug-promote mariadb-apps
--full` (since the galera promote operation is what runs the mysql
start command):
+ 19:55:36: mysql_common_start:242: runuser - mysql -s /bin/sh -c
'/usr/bin/mysqld_safe --defaults-file=/apps/etc/mariadb-server.cnf
--pid-file=/var/run/mysql/mysqld.pid
--socket=/var/lib/mysql/mysql.sock --datadir=/apps/mysql/data
--log-error=/var/log/mysqld.log
--wsrep-cluster-address=gcomm://10.0.1.6:5567,10.0.1.7:5567 >/dev/null
2>&1'
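One way to look for the discrepancy between the agent's start command and the manual one is to split both into one option per line and diff them. A minimal bash sketch: `split_opts` is a hypothetical helper, the agent's command is copied from the trace above, and the manual command assumes `--wsrep-new-cluster` was the intended spelling.

```bash
#!/bin/bash
# Hypothetical helper: print the words of a command line one per line,
# sorted, so two invocations can be compared option by option.
split_opts() {
    tr -s ' ' '\n' <<<"$1" | sed '/^$/d' | sort
}

# Manual start command from this thread (assuming --wsrep-new-cluster).
manual='/usr/libexec/mysqld --defaults-file=/apps/etc/mariadb-server.cnf --wsrep-new-cluster'
# Command the agent ran, copied from the debug-promote trace.
agent='/usr/bin/mysqld_safe --defaults-file=/apps/etc/mariadb-server.cnf --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --datadir=/apps/mysql/data --log-error=/var/log/mysqld.log --wsrep-cluster-address=gcomm://10.0.1.6:5567,10.0.1.7:5567'

# "<" lines appear only in the manual command, ">" lines only in the agent's.
# diff exits with status 1 when the inputs differ, which is expected here.
diff <(split_opts "$manual") <(split_opts "$agent") || true
```

On a running node, `ps -ww -o args= -C mysqld` (procps) shows the arguments mysqld was actually started with, which could be fed to `split_opts` in the same way.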
Since the parsing appears to work and the resource agent appears to
use the given datadir option, it is odd that your mysql startup
message shows /var/lib/mysql being used. I don't have a mysql
installation currently to test further and don't have the time to set
one up at the moment.
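To see which datadir mysqld_safe actually reported on each start attempt, the "Starting mysqld daemon with databases from ..." message can be pulled out of the pacemaker log. A hedged sketch: `reported_datadirs` is a hypothetical helper, and the log path in the final comment is an assumption that varies by distribution.

```bash
#!/bin/bash
# Hypothetical helper: read log text on stdin and print every datadir that
# mysqld_safe reported in its "Starting mysqld daemon" message.
reported_datadirs() {
    grep -o 'Starting mysqld daemon with databases from [[:alnum:]/._-]*' |
        sed 's/^Starting mysqld daemon with databases from //'
}

# Example using the log line from the first message in this thread
# (the "\n" there is a literal backslash-n in pacemaker's log output).
printf '%s\n' 'mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql\n ]' |
    reported_datadirs
# prints: /var/lib/mysql

# On a node, something like this (log path is an assumption):
# reported_datadirs < /var/log/pacemaker/pacemaker.log
```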
>
> Just remember to create/use another 'mariadb' instance/installation
> (still using the regular binaries from the rpm), different from the
> standard OS rpm one; such a resource, with the "standard" galera,
> works fine for me.
>
> many thanks, L.
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
--
Regards,
Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker