[Pacemaker] crm_master triggering assert section != NULL

Yves Trudeau y.trudeau at videotron.ca
Wed Oct 12 18:10:37 EDT 2011


I found the answer about cluster-wide attributes; it's very easy and pretty
elegant.

On 11-10-12 05:09 PM, Yves Trudeau wrote:
> Hi Florian,
>
> On 11-10-12 04:09 PM, Florian Haas wrote:
>> On 2011-10-12 21:46, Yves Trudeau wrote:
>>> Hi Florian,
>>>    sure, let me state the requirements.  If those requirements can be
>>> met, Pacemaker will be used much more to manage MySQL replication.
>>> Right now, although at Percona I deal with many large MySQL
>>> deployments, none are using the current agent.  Another tool, MMM, is
>>> currently used, but it is orphaned and suffers from many fairly
>>> fundamental flaws (while implementing about the same logic as below).
>>>
>>> Consider a pool of N identical MySQL servers.  In that case we need:
>>> - N replication resources (it could be the MySQL RA)
>>> - N Reader_vip
>>> - 1 Writer_vip
>>>
>>> Reader vips are used by the application to run queries that do not
>>> modify data, usually accessed in round-robin fashion.  When the
>>> application needs to write something, it uses the writer_vip.  That's
>>> how read/write splitting is implemented in many, many places.
>>>
>>> So, for the agent, here are the requirements:
>>>
>>> - No need to manage MySQL itself
>>>
>>> The resource we are interested in is replication; MySQL itself is at
>>> another level.  If the RA is to manage MySQL, it must not interfere.
>>>
>>> - the writer_vip must be assigned only to the master, after it is 
>>> promoted
>>>
>>> This is easy with a colocation constraint.
>> Agreed.
>>
>>> - After the promotion of a new master, all slaves should be allowed to
>>> complete the application of their relay logs prior to any CHANGE MASTER
>>>
>>> The current RA does not do that but it should be fairly easy to 
>>> implement.
>> That's a use case for a pre-promote and post-promote notification. Like
>> the mysql RA currently does.
>>
>>> - After its promotion and before allowing writes to it, a master should
>>> publish its current master file and position.   I am using resource
>>> parameters in the CIB for these (I am wondering if transient attributes
>>> could be used instead)
>> They could, and you should. Like the mysql RA currently does.
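For the archive, here is a sketch of how a promoted master could publish its
coordinates as transient attributes.  The attribute names, the example
values, and the echo-instead-of-execute wrapper are all illustrative; none
of this is taken from an actual agent.

```shell
#!/bin/sh
# Illustrative sketch only: publish the binlog coordinates a newly
# promoted master would report (SHOW MASTER STATUS) as transient
# node attributes.  Attribute names are made up for this example.
master_file="mysql-bin.000042"
master_pos="107"

# Build the crm_attribute invocations; -l reboot makes the attributes
# transient (cleared when the node restarts).
publish_file="crm_attribute -l reboot -n replication_master_file -v $master_file"
publish_pos="crm_attribute -l reboot -n replication_master_pos -v $master_pos"

# Print instead of executing, since this sketch has no live cluster.
echo "$publish_file"
echo "$publish_pos"
```

Slaves would then read the same attributes back (crm_attribute with
--query) in their post-promote notification handler.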
>>
>
> The RA I downloaded following the instructions on the wiki, which state
> these are the latest sources:
>
> wget -O resource-agents.tar.bz2 
> http://hg.linux-ha.org/agents/archive/tip.tar.bz2
>
> has the following code to change the master:
>
>     ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
>         -e "CHANGE MASTER TO MASTER_HOST='$master_host', \
>             MASTER_USER='$OCF_RESKEY_replication_user', \
>             MASTER_PASSWORD='$OCF_RESKEY_replication_passwd'"
>
> which does not include file and position.
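For comparison, a CHANGE MASTER statement that does carry the published
coordinates could look like the sketch below.  The host, credentials, and
coordinate values are placeholders; this is not code from any released
agent.

```shell
#!/bin/sh
# Illustrative only: what the CHANGE MASTER call could look like once
# file and position are included.  All values are placeholders.
master_host="db1"
master_file="mysql-bin.000042"   # as published by the new master
master_pos="107"

sql="CHANGE MASTER TO MASTER_HOST='$master_host', \
MASTER_USER='repl', MASTER_PASSWORD='secret', \
MASTER_LOG_FILE='$master_file', MASTER_LOG_POS=$master_pos"

# In an RA this statement would be passed to mysql via ocf_run; here
# we just show the generated SQL.
echo "$sql"
```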
>
>
>>> - After the promotion of a new master, all slaves should be 
>>> reconfigured
>>> to point to the new master host with correct file and position as
>>> published by the master when it was promoted
>>>
>>> The current RA does not set file and position.
>> "The current RA" being ocf:heartbeat:mysql?
>>
>> A cursory grep for "CRM_ATTR" in ocf:heartbeat:mysql indicates that it
>> does set those.
>
> grep CRM_ATTR returned nothing.
>
> yves at yves-desktop:/opt/pacemaker/Cluster-Resource-Agents-7a11934b142d/heartbeat$ 
> grep -i CRM_ATTR mysql
> yves at yves-desktop:/opt/pacemaker/Cluster-Resource-Agents-7a11934b142d/heartbeat$ 
>
>
> and that is the latest from Mercurial...
>
>>> Under any non-trivial
>>> load this will fail.  The current RA is not designed to store the
>>> information.  The new RA uses the information stored in the CIB along
>>> with the post-promote notification.
>> Is this point moot considering my previous statement?
>>
>>> - each slave and the master may have one or more reader_vip provided
>>> that they are replicating correctly (no lag beyond a threshold,
>>> replication of course working).  If all slaves fail, all reader_vips
>>> should be located on the master.
>> Use a cloned IPaddr2 as a non-anonymous clone, thereby managing an IP
>> range. Add a location constraint restricting the clone instance to run
>> on only those nodes where a specific node attribute is set. Or
>> conversely, forbid them from running on nodes where said attribute is
>> not set. Manage that attribute from your RA.
>
> That's clever, never thought about it.
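If I read that suggestion right, a crm configuration along these lines
would do it.  This is a sketch under my own assumptions: the IP address,
clone counts, and the "readable" attribute name are all invented here, and
the RA would be responsible for setting that attribute.

```
primitive reader_vip ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" unique_clone_address="true"
clone cl_reader_vip reader_vip \
        meta globally-unique="true" clone-max="3" clone-node-max="3"
# Only run reader VIPs where the RA has set readable=1
location loc_reader_vip cl_reader_vip \
        rule -inf: not_defined readable or readable ne 1
```

With clone-node-max equal to clone-max, all three VIPs can collapse onto
the master if every slave drops its "readable" attribute.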
>
>>> The current RA either kills MySQL or does nothing; it doesn't care
>>> about reader_vips.  Killing MySQL on a busy server with 256GB of
>>> buffer pool is enough for someone to lose his job...  The new RA
>>> adjusts location scores for the reader_vip resources dynamically.
>> Like I said, that's managing one resource from another, which is a total
>> nightmare. It's also not necessary, I dare say, given the approach I
>> outlined above.
>>
> I'll explore the node attribute approach, I like it.
>
> Is it possible to create an attribute that does not belong to a node 
> but is cluster wide?
>>> - the RA should implement a protection against flapping in case a slave
>>> hovers around the replication lag threshold
>> You should get plenty of inspiration there from how the dampen parameter
>> is used in ocf:pacemaker:ping.
>>
> ok, I'll check
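For the archive: ocf:pacemaker:ping passes its dampen parameter to
attrd_updater, which delays the attribute write so short-lived flaps around
the threshold never reach the CIB.  The sketch below shows how a
lag-monitoring RA might do the same; the attribute name, lag values, and
the echo dry run are all made up for illustration.

```shell
#!/bin/sh
# Illustrative only: decide a node attribute value from replication
# lag, then hand the update to attrd_updater with a dampening delay
# so brief threshold crossings are absorbed before the CIB changes.
lag=5            # measured slave lag in seconds (made-up value)
threshold=10     # lag beyond this marks the node unreadable
dampen="30s"     # attrd_updater waits this long before committing

if [ "$lag" -le "$threshold" ]; then
    value=1
else
    value=0
fi

# Dry run: print the command such an RA might issue.
echo "attrd_updater -n readable -v $value -d $dampen"
```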
>>> The current RA does implement that, but it is not required given the
>>> context.  The new RA does implement flapping protection.
>>>
>>> - upon demote of a master, the RA _must_ attempt to kill all user
>>> (non-system) connections
>>>
>>> The current RA does not do that but it is easy to implement
>> Yeah, as I assume it would be in the other one.
>>
>>> - Slaves must be read-only
>>>
>>> That's fine, handled by the current RA.
>> Correct.
>>
>>> - Monitor should test MySQL and replication.  If either is bad, vips
>>> should be moved away.  Common errors should not trigger actions.
>> Like I said, should be feasible with the node attribute approach
>> outlined above. No reason to muck around with the resources directly.
>>
>>> That's handled by the current RA for most of it.  The error handling
>>> could be added.
>>>
>>> - Slaves should update their master score according to the state of
>>> their replication.
>>>
>>> Handled by both RAs
>> Right.
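For the archive, that master-score update boils down to a crm_master call
from the monitor operation.  The health checks and score values below are
arbitrary examples, not what either agent actually uses.

```shell
#!/bin/sh
# Illustrative sketch: a slave's monitor op could derive its promotion
# score from replication health.  Scores are arbitrary examples.
slave_running=1     # pretend SHOW SLAVE STATUS said both threads run
seconds_behind=3    # pretend measured lag in seconds
max_lag=10

if [ "$slave_running" -eq 1 ] && [ "$seconds_behind" -le "$max_lag" ]; then
    score=100       # healthy: good promotion candidate
else
    score=5         # degraded: only promote as a last resort
fi

# crm_master is shorthand for crm_attribute on the master-<resource>
# attribute; -l reboot keeps the score transient.  Dry run via echo.
echo "crm_master -l reboot -v $score"
```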
>>
>>> So, at the minimum, the RA needs to be able to store the master
>>> coordinate information, either in the resource parameters or in
>>> transient attributes, and must be able to modify resource location
>>> scores.  The script _was_ working before I got the CIB issue; maybe it
>>> was purely accidental, but it proves the concept.  I was actually
>>> implementing/testing the relay_log completion stuff.  I chose not to
>>> use the current agent because I didn't want to manage MySQL itself,
>>> just replication.
>>>
>>> I am wide open to argue any Pacemaker or RA architecture/design part,
>>> but I don't want to argue the replication requirements; they are
>>> fundamental in my mind.
>> Yup, and I still believe that ocf:heartbeat:mysql either already
>> addresses those, or they could be addressed in a much cleaner fashion
>> than writing a new RA.
>>
>> Now, if the only remaining point is "but I want to write an agent that
>> can do _less_ than an existing one" (namely, manage only replication,
>> not the underlying daemon), then I guess I can't argue with that, but
>> I'd still believe that would be a suboptimal approach.
> Ohh...  don't get me wrong, I am not the kind of guy who takes pride
> in having re-invented the flat tire.  I want an open-source _solution_
> I can offer to my customers.  I think part of the problem here is that
> we are not talking about the same ocf:heartbeat:mysql RA.  What is
> mainstream is what you get with "apt-get install pacemaker" on
> Ubuntu 10.04 LTS, for example.  That is 1.0.8.  I also tried 1.0.11,
> and it is still obviously not the same version.  I got my "latest"
> agent version as explained in the clusterlabs FAQ page from:
>
> wget -O resource-agents.tar.bz2 
> http://hg.linux-ha.org/agents/archive/tip.tar.bz2
>
> Where can I get the version you are using :)
>
> Regards,
>
> Yves
>
>> Cheers,
>> Florian
>>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>




