[ClusterLabs] unable to start mysql as a clustered service, OK stand-alone

Jan Pokorný jpokorny at redhat.com
Thu Aug 11 12:09:37 EDT 2016


On 09/08/16 16:20 -0400, bergman at merctech.com wrote:
> I've got a 3-node CentOS6 cluster and I'm trying to add mysql 5.1 as
> a new service. Other cluster services (IP addresses, Postgresql,
> applications) work fine.
> 
> The mysql config file and data files are located on shared,
> cluster-wide storage (GPFS), as are config/data files for other
> services which work correctly.
> 
> On each node, I can successfully start mysql locally via:
> 	service mysqld start
> and via:
> 	rg_test test /etc/cluster/cluster.conf start service mysql
> 
> (in each case, the corresponding command with the 'stop' option will
> also successfully shut down mysql).
> 
> However, attempting to start the mysql service with clusvcadm
> results in the service failing over from one node to the next, and
> being marked as "stopped" after the last node.
> 
> Each failover happens very quickly, in about 5 seconds. I suspect
> that rgmanager isn't waiting long enough for mysql to start before
> checking if it is running and I have added startup delays in
> cluster.conf, but they don't seem to be honored. Nothing is written
> into the mysql log file at this time -- no startup or failure
> messages, which implies that the mysqld never begins to run. The
> only log entries (/var/log/messages, /var/log/cluster/*, etc)
> reference rgmanager, not the mysql process itself.
> 
> Any suggestions?

see inline below...

> RHCS components:
> 	cman-3.0.12.1-78.el6.x86_64
> 	luci-0.26.0-78.el6.centos.x86_64
> 	rgmanager-3.0.12.1-26.el6_8.3.x86_64
> 	ricci-0.16.2-86.el6.x86_64
> 	corosync-1.4.7-5.el6.x86_64
> 
> 
> --------------------- /etc/cluster/cluster.conf (edited subset) -----------------
> <cluster config_version="63" name="example-rhcs">
>         <rm>
>                 <resources>
>                         <postgres-8 config_file="/var/lib/pgsql/data/postgresql.conf" name="PostgreSQL8" postmaster_user="postgres" startup_wait="25"/>
>                         <ip address="192.168.169.173" sleeptime="10"/>
>                         <mysql config_file="/cluster_shared/mysql_centos6/etc/my.cnf" listen_address="192.168.169.173" name="mysql" shutdown_wait="10" startup_wait="30"/>
>                 </resources>
>                 <service max_restarts="3" name="mysql" recovery="restart" restart_expire_time="180">
>                         <ip ref="192.168.169.173">
>                                 <mysql ref="mysql"/>
>                         </ip>
>                 </service>
>         </rm>
> </cluster>
> --------------------------------------------------------------------------
> 
> 
> --------------------- /var/log/cluster/rgmanager.log from attempt to start mysql with clusvcadm -----------------------
> Aug 08 11:58:16 rgmanager Recovering failed service service:mysql
> Aug 08 11:58:16 rgmanager [ip] Link for eth2: Detected
> Aug 08 11:58:16 rgmanager [ip] Adding IPv4 address 192.168.169.173/24 to eth2
> Aug 08 11:58:16 rgmanager [ip] Pinging addr 192.168.169.173 from dev eth2
> Aug 08 11:58:18 rgmanager [ip] Sending gratuitous ARP: 192.168.169.173 c8:1f:66:e8:bb:34 brd ff:ff:ff:ff:ff:ff
> Aug 08 11:58:19 rgmanager [mysql] Verifying Configuration Of mysql:mysql
> Aug 08 11:58:19 rgmanager [mysql] Verifying Configuration Of mysql:mysql > Succeed
> Aug 08 11:58:19 rgmanager [mysql] Monitoring Service mysql:mysql
> Aug 08 11:58:19 rgmanager [mysql] Checking Existence Of File /var/run/cluster/mysql/mysql:mysql.pid [mysql:mysql] > Failed
> Aug 08 11:58:19 rgmanager [mysql] Monitoring Service mysql:mysql > Service Is Not Running
> Aug 08 11:58:19 rgmanager [mysql] Starting Service mysql:mysql
> Aug 08 11:58:19 rgmanager [mysql] Looking For IP Address > Succeed -  IP Address Found
> Aug 08 11:58:20 rgmanager [mysql] Starting Service mysql:mysql > Succeed
> Aug 08 11:58:21 rgmanager [mysql] Monitoring Service mysql:mysql
> Aug 08 11:58:21 rgmanager 1 events processed
> Aug 08 11:58:21 rgmanager [mysql] Checking Existence Of File /var/run/cluster/mysql/mysql:mysql.pid [mysql:mysql] > Failed

The business of starting services used to be incredibly racy (and
often still is): the launching script finishes, which is taken to mean
the service is ready to serve, while in fact it is still just "warming
up" (perhaps the PID file has not even been created by then). So I can
imagine that this hackish workaround

> 127         # Sleep 1 sec before checking status so mysqld can start
> 128         sleep 1

may not be enough in your deployment (large DB, high load from other
services, clustered or not, unlike in the rg_test scenario...), so I'd
start by raising that value in /usr/share/cluster/mysql.sh to some
higher figure and see if it helps.
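
For illustration, a minimal sketch of such a tweak (exact line numbers
and surrounding code may differ between resource-agents versions, and
the 10 second value below is only an assumed starting point, not a
tested recommendation -- size it to cover your DB's recovery/warm-up
time under real load):

	# Sleep 10 sec before checking status so mysqld can start
	# (raised from the original 1 sec; pick a value that covers
	# the warm-up time observed in your environment)
	sleep 10

Keep in mind the script lives locally on each node (it is not on your
shared GPFS), so the same edit has to be applied on every node, and a
later package update may overwrite it.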

> Aug 08 11:58:21 rgmanager [mysql] Monitoring Service mysql:mysql > Service Is Not Running
> Aug 08 11:58:21 rgmanager start on mysql "mysql" returned 7 (unspecified)
> Aug 08 11:58:21 rgmanager #68: Failed to start service:mysql; return value: 1
> Aug 08 11:58:21 rgmanager Stopping service service:mysql
> Aug 08 11:58:21 rgmanager [mysql] Verifying Configuration Of mysql:mysql
> Aug 08 11:58:21 rgmanager [mysql] Verifying Configuration Of mysql:mysql > Succeed
> Aug 08 11:58:21 rgmanager [mysql] Stopping Service mysql:mysql
> Aug 08 11:58:21 rgmanager [mysql] Checking Existence Of File /var/run/cluster/mysql/mysql:mysql.pid [mysql:mysql] > Failed - File Doesn't Exist
> Aug 08 11:58:21 rgmanager [mysql] Stopping Service mysql:mysql > Succeed
> --------------------------------------------------------------------------------

-- 
Jan (Poki)