[Pacemaker] problem: mysql service duplicated due to existing pidfile

Darren.Mansell at opengi.co.uk Darren.Mansell at opengi.co.uk
Tue Feb 9 05:32:58 EST 2010


Hello Oscar.

We use the OCF RA for MySQL and have not had any problems. Our config is:

primitive MySQL ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" datadir="/data/mysql" user="mysql" pid="/var/lib/mysql/mysql.pid" socket="/var/lib/mysql/mysql.sock" test_passwd="password" test_table="ldirectord.connectioncheck" test_user="servicecheck" \
        op monitor interval="20s" timeout="10s" \
        meta migration-threshold="10" target-role="Started"

LSB RA's are always a last resort IMO.

Regards
Darren

-----Original Message-----
From: Oscar Remírez de Ganuza Satrústegui [mailto:oscarrdg at unav.es] 
Sent: 09 February 2010 10:29
To: pacemaker at clusterlabs.org
Subject: [Pacemaker] problem: mysql service duplicated due to existing pidfile

Hello everybody,

We have a pacemaker cluster, active-pasive, with a mysql service as a cluster resource. We are using pacemaker 1.0.6 + corosync 1.1.2 on RedHat 5.3.
Everything is working fine, except when we reset (power down and start) a node (through stonith) and want to run the mysql service again on the node.

Let me explain the problem:
Pacemaker manage the mysql service through the /etc/init.d service (lsb:mysql), and usually it can start and stop it ok. But when the active node is "stonithed", there is left on disk the pid file (node1.pid), as the service was not stopped properly.
So here comes the problem: when pacemaker tries to start the service again, it start it twice.

 From /var/log/messages:

* The order to start the service is done:
Feb  9 10:45:06 herculespre crmd: [2181]: info: do_lrm_rsc_op: 
Performing key=30:119:0:6ce08921-3e51-462c-949a-aa540c206e3c
op=mysql-unav-service_start_0 )
Feb  9 10:45:06 herculespre lrmd: [2176]: info: 
rsc:mysql-unav-service:13: start
Feb  9 10:45:06 herculespre lrmd: [7602]: WARN: For LSB init script, no additional parameters are needed.
Feb  9 10:45:06 herculespre lrmd: [2176]: info: RA output: 
(mysql-unav-service:start:stdout) Starting MySQL  eb  9 10:45:06 herculespre lrmd: [2176]: info: RA output: 
(mysql-unav-service:start:stdout) [  OK  ] Feb  9 10:45:06 herculespre crmd: [2181]: info: process_lrm_event: LRM operation mysql-unav-service_start_0 (call=13, rc=0, cib-update=19,
confirmed=true) ok

* But then a monitor operation is performed at the same time and it detects that it is not running:
Feb  9 10:45:06 herculespre crmd: [2181]: info: do_lrm_rsc_op: 
Performing key=31:119:0:6ce08921-3e51-462c-949a-aa540c206e3c
op=mysql-unav-service_monitor_20000 )
Feb  9 10:45:06 herculespre crmd: [2181]: info: process_lrm_event: LRM operation mysql-unav-service_monitor_20000 (call=14, rc=7, cib-update=20, confirmed=false) not running Feb  9 10:45:06 herculespre attrd: [2179]: info: attrd_ais_dispatch: 
Update relayed from jasonpre
Feb  9 10:45:06 herculespre attrd: [2179]: info: attrd_local_callback: 
Expanded fail-count-mysql-unav-service=value++ to 1 Feb  9 10:45:06 herculespre attrd: [2179]: info: attrd_trigger_update: 
Sending flush op to all hosts for: fail-count-mysql-unav-service (1) Feb  9 10:45:06 herculespre attrd: [2179]: info: attrd_perform_update: 
Sent update 26: fail-count-mysql-unav-service=1 Feb  9 10:45:06 herculespre attrd: [2179]: info: attrd_ais_dispatch: 
Update relayed from jasonpre
Feb  9 10:45:06 herculespre attrd: [2179]: info: attrd_trigger_update: 
Sending flush op to all hosts for: last-failure-mysql-unav-service
(1265708729)
Feb  9 10:45:06 herculespre attrd: [2179]: info: attrd_perform_update: 
Sent update 29: last-failure-mysql-unav-service=1265708729

* So it is decided to stop the resource and start it again!!!:
Feb  9 10:45:07 herculespre crmd: [2181]: info: do_lrm_rsc_op: 
Performing key=3:122:0:6ce08921-3e51-462c-949a-aa540c206e3c
op=mysql-unav-service_stop_0 )
Feb  9 10:45:07 herculespre lrmd: [2176]: info: 
rsc:mysql-unav-service:15: stop
Feb  9 10:45:07 herculespre lrmd: [7743]: WARN: For LSB init script, no additional parameters are needed.
Feb  9 10:45:07 herculespre crmd: [2181]: info: process_lrm_event: LRM operation mysql-unav-service_monitor_20000 (call=14, status=1, cib-update=0, confirmed=true) Cancelled Feb  9 10:45:07 herculespre lrmd: [2176]: info: RA output: 
(mysql-unav-service:stop:stdout) MySQL manager or server PID file could not be found![FAILED]

* But the stop operation failed..
Feb  9 10:45:07 herculespre crmd: [2181]: info: process_lrm_event: LRM operation mysql-unav-service_stop_0 (call=15, rc=0, cib-update=21,
confirmed=true) ok
Feb  9 10:45:07 herculespre crmd: [2181]: info: do_lrm_rsc_op: 
Performing key=30:122:0:6ce08921-3e51-462c-949a-aa540c206e3c
op=mysql-unav-service_start_0 )
Feb  9 10:45:07 herculespre lrmd: [2176]: info: 
rsc:mysql-unav-service:16: start
Feb  9 10:45:07 herculespre lrmd: [7762]: WARN: For LSB init script, no additional parameters are needed.
Feb  9 10:45:07 herculespre lrmd: [2176]: info: RA output: 
(mysql-unav-service:start:stdout) Starting MySQL Feb  9 10:45:07 herculespre lrmd: [2176]: info: RA output: 
(mysql-unav-service:start:stdout) .
Feb  9 10:45:08 herculespre lrmd: [2176]: info: RA output: 
(mysql-unav-service:start:stdout) [  OK  ] Feb  9 10:45:08 herculespre lrmd: [2176]: info: RA output: 
(mysql-unav-service:start:stdout)
Feb  9 10:45:08 herculespre crmd: [2181]: info: process_lrm_event: LRM operation mysql-unav-service_start_0 (call=16, rc=0, cib-update=22,
confirmed=true) ok
Feb  9 10:45:08 herculespre crmd: [2181]: info: do_lrm_rsc_op: 
Performing key=2:122:0:6ce08921-3e51-462c-949a-aa540c206e3c
op=mysql-unav-service_monitor_20000 )
Feb  9 10:45:08 herculespre crmd: [2181]: info: process_lrm_event: LRM operation mysql-unav-service_monitor_20000 (call=17, rc=0, cib-update=23, confirmed=false) ok
* And the start operation is performed ok

So we have now twice the same service running, and logically this is not working properly, as the two services try to access the same files and system ports.

 From /var/log/mysql
100209 10:45:07 mysqld_safe Starting mysqld daemon with databases from /usr/local/etc2/mysql-unav/var
100209 10:45:07 mysqld_safe Starting mysqld daemon with databases from /usr/local/etc2/mysql-unav/var
InnoDB: Unable to lock ./ibdata1, error: 11
InnoDB: Check that you do not already have another mysqld process
InnoDB: using the same InnoDB data or log files.
100209 10:45:07  InnoDB: Retrying to lock the first data file ...

If we start the service manually (/etc/init.d/mysql start) whith the pid file existing, there is no problem.
So, what it is doing pacemaker wrong?

I reckon that maybe we have some timeouts not properly configured.
Right now we have:
Operations timeout: default-action-timeout="120s"
Mysql Resource Monitor: op monitor interval="20s" timeout="20s"

Is there someone with a similar setup (with mysql or with another
service) and experiencing any problems?

Can you tell me some safe values for the timeouts?

Thank you very much for your help!

Regards,

PS: I did not want to spam the list with more logs or configuration files, but if they are needed i can attach them, of course.

---
Oscar Remírez de Ganuza
Servicios Informáticos
Universidad de Navarra
Ed. de Derecho, Campus Universitario
31080 Pamplona (Navarra), Spain
tfno: +34 948 425600 Ext. 3130
http://www.unav.es/SI





More information about the Pacemaker mailing list