[ClusterLabs] Custom RA for Multi-Tenant MySQL?
arvidjaar at gmail.com
Mon Apr 12 01:37:13 EDT 2021
On 11.04.2021 21:47, Eric Robinson wrote:
>> -----Original Message-----
>> From: Users <users-bounces at clusterlabs.org> On Behalf Of Andrei
>> Sent: Sunday, April 11, 2021 1:20 PM
>> To: users at clusterlabs.org
>> Subject: Re: [ClusterLabs] Custom RA for Multi-Tenant MySQL?
>> On 11.04.2021 20:07, Eric Robinson wrote:
>>> We're writing a custom RA for a multi-tenant MySQL cluster that runs in
>> active/standby mode. I've read the RA documentation about what exit codes
>> should be returned for various outcomes, but something is still unclear to me.
>>> We run multiple instances of MySQL from one filesystem, with per-instance directories under /app_root.
>>> The /app_root filesystem lives on a DRBD volume, which is only mounted
>> on the active node.
>>> When the RA performs a "start," "stop," or "monitor" action on the standby
>> node, the filesystem is not mounted so the mysql instances are not present.
>> You are not supposed to do that in the first place. You are supposed to have an
>> ordering constraint that starts the MySQL instances after the filesystem is available.
> That is what we have. The colocation constraints require mysql -> filesystem -> drbd master. The ordering constraints promote drbd, then start the filesystem, then start mysql.
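For reference, the constraint chain described here might look like the following in pcs; the resource names (p_drbd-master, p_fs, p_mysql01) are hypothetical, not taken from the poster's actual configuration:

```shell
# Hypothetical names: p_drbd-master (DRBD master/slave), p_fs (Filesystem),
# p_mysql01 (one MySQL instance). Ordering: promote DRBD, mount, start MySQL.
pcs constraint order promote p_drbd-master then start p_fs
pcs constraint order start p_fs then start p_mysql01
# Colocation: mysql -> filesystem -> DRBD master, as described above.
pcs constraint colocation add p_fs with master p_drbd-master INFINITY
pcs constraint colocation add p_mysql01 with p_fs INFINITY
```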
So how is it possible for the agent to be asked to execute "start" or "stop" on the standby node at all?
>>> What should the return codes for those actions be? Fail? Not installed?
>> Unknown error?
>> I believe that "not installed" is considered a hard error and bans the resource
>> from this node. As the missing filesystem is probably transient, that does not
>> look appropriate. There is no "fail" return code.
>> In any case the return code depends on the action. For monitor you are obviously
>> expected to return "not running" in this case. "stop" should probably return
>> success (after all, the instance is not running, right?). And "start" should
>> return an error indication, but I am not sure which is better: a generic
>> error or not running.
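The advice above can be sketched as a fragment of such a custom RA. The OCF exit-code values are the standard ones; the filesystem path and pidfile location are hypothetical stand-ins for the poster's layout:

```shell
#!/bin/sh
# Standard OCF exit codes (see the OCF resource agent spec).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

FS_ROOT="/app_root"                      # DRBD-backed filesystem (assumption)
PIDFILE="$FS_ROOT/mysql01/mysqld.pid"    # hypothetical per-instance pidfile

instance_monitor() {
    # Filesystem not mounted (standby node): the instance is simply not running.
    if ! mountpoint -q "$FS_ROOT"; then
        return $OCF_NOT_RUNNING
    fi
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

instance_stop() {
    # "stop" must be idempotent: a resource that is already down is a success.
    instance_monitor
    [ $? -eq $OCF_NOT_RUNNING ] && return $OCF_SUCCESS
    # ... actual shutdown logic would go here ...
    return $OCF_SUCCESS
}

instance_start() {
    # Refusing to start without the filesystem is a generic (transient) error,
    # not "not installed", so the node is not permanently banned.
    mountpoint -q "$FS_ROOT" || return $OCF_ERR_GENERIC
    # ... actual startup logic would go here ...
    return $OCF_SUCCESS
}
```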
> That's a big part of my question. I'm just trying to avoid a condition where the mysql resource is running on node A, and Pacemaker thinks there is a "problem" with it on node B.
I am not sure I understand the problem. By default nothing will run on
node B after the initial probe. If you also configured monitoring in the
stopped state, your monitor obviously has to return the truth - that the
application is not running.
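Monitoring in the stopped state means a second, explicitly configured monitor operation with role=Stopped (its interval must differ from the regular monitor's). A sketch with a hypothetical resource name:

```shell
# Hypothetical resource p_mysql01: add a monitor that runs while the
# resource is believed stopped, so an unexpected instance on the
# standby node would be detected.
pcs resource update p_mysql01 \
    op monitor interval=30s timeout=30s \
    op monitor interval=60s timeout=30s role=Stopped
```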