[ClusterLabs] Custom RA for Multi-Tenant MySQL?

Andrei Borzenkov arvidjaar at gmail.com
Mon Apr 12 01:37:13 EDT 2021

On 11.04.2021 21:47, Eric Robinson wrote:
>> -----Original Message-----
>> From: Users <users-bounces at clusterlabs.org> On Behalf Of Andrei
>> Borzenkov
>> Sent: Sunday, April 11, 2021 1:20 PM
>> To: users at clusterlabs.org
>> Subject: Re: [ClusterLabs] Custom RA for Multi-Tenant MySQL?
>> On 11.04.2021 20:07, Eric Robinson wrote:
>>> We're writing a custom RA for a multi-tenant MySQL cluster that runs in
>> active/standby mode. I've read the RA documentation about what exit codes
>> should be returned for various outcomes, but something is still unclear to
>> me.
>>> We run multiple instances of MySQL from one filesystem, like this:
>>> /app_root
>>>                 /mysql1
>>>                 /mysql2
>>>                 /mysql3
>>>                 ...etc.
>>> The /app_root filesystem lives on a DRBD volume, which is only mounted
>> on the active node.
>>> When the RA performs a "start," "stop," or "monitor" action on the standby
>> node, the filesystem is not mounted so the mysql instances are not present.
>> You are not supposed to do it in the first place. You are supposed to have
>> an ordering constraint that starts the MySQL instances after the filesystem
>> is available.
> That is what we have. The colocation constraints require mysql -> filesystem -> drbd master. The ordering constraints promote drbd, then start the filesystem, then start mysql.
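
The constraint chain described above (mysql -> filesystem -> drbd master) could be expressed with `pcs` roughly as follows. This is a hedged sketch: the resource names `p_drbd_r0-master`, `p_fs_approot`, and `p_mysql1` are hypothetical stand-ins, and exact syntax varies between pcs versions.

```shell
# Colocation: filesystem only where DRBD is master; mysql only with the filesystem
pcs constraint colocation add p_fs_approot with master p_drbd_r0-master INFINITY
pcs constraint colocation add p_mysql1 with p_fs_approot INFINITY

# Ordering: promote DRBD, then mount the filesystem, then start mysql
pcs constraint order promote p_drbd_r0-master then start p_fs_approot
pcs constraint order start p_fs_approot then start p_mysql1
```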

So how is it possible for the agent to execute "start" or "stop" on the
wrong node?

>>> What should the return  codes for those actions be? Fail? Not installed?
>> Unknown error?
>> I believe that "not installed" is considered a hard error and bans the
>> resource from this node. As a missing filesystem is probably transient, that
>> does not look appropriate. There is no "fail" return code.
>> In any case, the return code depends on the action. For monitor you are
>> obviously expected to return "not running" in this case. "stop" should
>> probably return success (after all, the instance is not running, right?).
>> And "start" should return an error indication, though I am not sure which is
>> better: a generic error or not running.
> That's a big part of my question. I'm just trying to avoid a condition where the mysql resource is running on node A, and Pacemaker thinks there is a "problem" with it on node B.

I am not sure I understand the problem. By default nothing will run on
node B after the initial probe. If you have also configured monitoring in
the stopped state, your monitor obviously has to return the truth: that
the application is not running.
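
Monitoring in the stopped state, as mentioned above, means adding a monitor operation with `role=Stopped`. A possible pcs form, again using the hypothetical resource name `p_mysql1`:

```shell
# Periodically verify on the passive node that the instance is NOT running.
# The interval must differ from any active-role monitor interval.
pcs resource op add p_mysql1 monitor interval=61s role=Stopped
```

With this in place, the monitor on node B runs even while the resource is stopped there, so returning "not running" is the correct, non-error answer.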
