[Pacemaker] Problems cleaning up resource with hard errors

Tue Jul 27 04:09:14 EDT 2010

Hello,

I have a mysql resource (ocf:heartbeat:mysql) that failed after I played a bit with the configuration. It is now stopped, and messages like these are logged every 15 minutes:

Jul 27 05:00:10 eol1 pengine: [16643]: notice: unpack_rsc_op: Hard error - p_mysqld_left_start_0 failed with rc=5: Preventing p_mysqld_left from re-starting on eol1
Jul 27 05:00:10 eol1 pengine: [16643]: notice: unpack_rsc_op: Hard error - p_mysqld_left_start_0 failed with rc=5: Preventing p_mysqld_left from re-starting on eol2
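
The relevant fields in these log lines can be pulled out with a small filter. The following is only a sketch: the function name summarize_hard_errors is made up for illustration, and the pattern is based solely on the two lines quoted above, so other Pacemaker versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): read syslog text on stdin and
# print "<resource> <node> rc=<code>" for every pengine "Hard error" line.
summarize_hard_errors() {
    sed -n 's/.*Hard error - .*failed with rc=\([0-9][0-9]*\): Preventing \([^ ]*\) from re-starting on \([^ ]*\).*/\2 \3 rc=\1/p'
}

# Usage with the two lines from this post:
summarize_hard_errors <<'EOF'
Jul 27 05:00:10 eol1 pengine: [16643]: notice: unpack_rsc_op: Hard error - p_mysqld_left_start_0 failed with rc=5: Preventing p_mysqld_left from re-starting on eol1
Jul 27 05:00:10 eol1 pengine: [16643]: notice: unpack_rsc_op: Hard error - p_mysqld_left_start_0 failed with rc=5: Preventing p_mysqld_left from re-starting on eol2
EOF
```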

If I try to clean up the resource, nothing happens:

[root@eol1 ~]# crm resource cleanup p_mysqld_left
Cleaning up p_mysqld_left on eol1
Cleaning up p_mysqld_left on eol2

[root@eol1 ~]# crm_mon -1
============
Last updated: Tue Jul 27 05:12:21 2010
Stack: openais
Current DC: eol1 - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ eol1 eol2 ]

 p_ip_stacked_left	(ocf::heartbeat:IPaddr2):	Started eol1
 Master/Slave Set: ms_drbd_left
     Masters: [ eol1 ]
     Slaves: [ eol2 ]
 Master/Slave Set: ms_drbd_stacked
     Masters: [ eol1 ]
 Resource Group: g_drbd_left
     p_fs_drbd_left	(ocf::heartbeat:Filesystem):	Started eol1
     p_node_activate_left	(heartbeat:node-activate):	Started eol1
     p_apache_left	(ocf::heartbeat:apache):	Started eol1
     p_mysqld_left	(ocf::heartbeat:mysql):	Stopped 

Failed actions:
    p_mysqld_left_start_0 (node=eol1, call=192, rc=5, status=complete): not installed
    p_mysqld_left_start_0 (node=eol2, call=79, rc=5, status=complete): not installed
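
For context, rc=5 is the OCF resource agent return code OCF_ERR_INSTALLED, which matches the "not installed" text printed by crm_mon. A minimal lookup sketch follows; the helper name ocf_rc_name is hypothetical and not part of any cluster tool.

```shell
#!/bin/sh
# Hypothetical helper: translate an OCF resource agent exit code into its
# symbolic name. The function name is illustrative only.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# The code reported for p_mysqld_left_start_0 on both nodes:
ocf_rc_name 5
```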

This is the log produced when I invoke the cleanup:

Jul 27 05:09:46 eol1 crm_resource: [18035]: info: Invoked: crm_resource -C -r p_mysqld_left -H eol2 
Jul 27 05:10:18 eol1 cibadmin: [18069]: info: Invoked: cibadmin -Ql -o nodes 
Jul 27 05:10:18 eol1 cibadmin: [18070]: info: Invoked: cibadmin -Ql -o resources 
Jul 27 05:10:18 eol1 cibadmin: [18071]: info: Invoked: cibadmin -Ql -o resources 
Jul 27 05:10:18 eol1 crm_resource: [18072]: info: Invoked: crm_resource -C -r p_mysqld_left -H eol1 
Jul 27 05:10:18 eol1 crmd: [16644]: notice: do_lrm_invoke: Not creating resource for a delete event: (null)
Jul 27 05:10:18 eol1 crmd: [16644]: info: send_direct_ack: ACK'ing resource op p_mysqld_left_delete_60000 from 0:0:crm-resource-18072: lrm_invoke-lrmd-1280200218-1033
Jul 27 05:10:19 eol1 cibadmin: [18074]: info: Invoked: cibadmin -Ql -o resources 
Jul 27 05:10:19 eol1 cibadmin: [18075]: info: Invoked: cibadmin -Ql -o resources 
Jul 27 05:10:19 eol1 crm_resource: [18076]: info: Invoked: crm_resource -C -r p_mysqld_left -H eol2 

I know this kind of problem must have been answered a thousand times, but all I have found by googling is either to clean up the resource to force a restart, which does not seem to work for me, or to "time out" the resource, which I'm afraid I don't know how to do.

Do you know how I could force the mysql resource to be restarted once I have fixed the problems that caused the failure?
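
One way to check whether the underlying problem is really fixed, independently of the cluster, is to run the resource agent by hand and look at its exit code. The sketch below is only an illustration under assumptions: the helper name run_ocf_action is made up, OCF_ROOT is assumed to default to /usr/lib/ocf, and the agent path in the commented usage is the usual location but may differ per distribution. OCF agents read their parameters from OCF_RESKEY_<name> environment variables.

```shell
#!/bin/sh
# Hypothetical helper: run one action of an OCF resource agent outside the
# cluster and print its exit code as "rc=<code>".
# Assumption: OCF_ROOT defaults to /usr/lib/ocf if not already set.
run_ocf_action() {
    agent="$1"    # path to the resource agent script
    action="$2"   # e.g. validate-all, monitor, start, stop
    shift 2       # remaining arguments are name=value agent parameters
    (
        # Subshell so the exported variables do not leak into the caller.
        OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}"
        export OCF_ROOT
        for kv in "$@"; do
            export "OCF_RESKEY_$kv"
        done
        "$agent" "$action"
    )
    rc=$?
    echo "rc=$rc"
    return "$rc"
}

# Example with parameters taken from the p_mysqld_left primitive
# (commented out; it needs the agent to be present on the node):
# run_ocf_action /usr/lib/ocf/resource.d/heartbeat/mysql validate-all \
#     binary=/usr/bin/mysqld_safe config=/etc/my.cnf
```

If the agent still returns 5 when run this way, the cluster would presumably hit the same error again after a cleanup.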

Configuration follows:

[root@eol1 ~]# crm configure show
node eol1 \
	attributes standby="off"
node eol2 \
	attributes standby="off"
primitive p_apache_left ocf:heartbeat:apache \
	params configfile="/etc/httpd/conf/httpd.conf" \
	op monitor interval="1min" \
	op start interval="0" timeout="40s" \
	op stop interval="0" timeout="60s"
primitive p_drbd_left ocf:linbit:drbd \
	params drbd_resource="left" \
	op monitor interval="15s" \
	op start interval="0" timeout="240s" \
	op stop interval="0" timeout="100s"
primitive p_drbd_stacked ocf:linbit:drbd \
	params drbd_resource="stacked" \
	op monitor interval="15s" \
	op start interval="0" timeout="240s" \
	op stop interval="0" timeout="100s" \
	meta is-managed="true"
primitive p_fs_drbd_left ocf:heartbeat:Filesystem \
	params device="/dev/drbd1" directory="/opt/manhattan" fstype="ext3" \
	op start interval="0" timeout="60s" \
	op stop interval="0" timeout="60s"
primitive p_ip_stacked_left ocf:heartbeat:IPaddr2 \
	params ip="192.168.6.225" nic="eth0" \
	meta target-role="Started" is-managed="true"
primitive p_mysqld_left ocf:heartbeat:mysql \
	params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" test_table="monitor.boolean" test_user="p_mon" test_passwd="p_mon" \
	op monitor interval="15s" timeout="30s" \
	op start interval="0" timeout="120s" \
	op stop interval="0" timeout="120s" \
	meta target-role="Started"
primitive p_node_activate_left heartbeat:node-activate \
	op monitor interval="15s"
group g_drbd_left p_fs_drbd_left p_node_activate_left p_apache_left p_mysqld_left \
	meta target-role="Started"
ms ms_drbd_left p_drbd_left \
	meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" is-managed="true" target-role="Master"
ms ms_drbd_stacked p_drbd_stacked \
	meta master-max="1" clone-max="1" clone-node-max="1" master-node-max="1" notify="true" target-role="Master"
location cli-prefer-p_ip_stacked_left p_ip_stacked_left \
	rule $id="cli-prefer-rule-p_ip_stacked_left" inf: #uname eq eol2
colocation c_drbd_stacked_on_ip_left inf: ms_drbd_stacked p_ip_stacked_left
colocation c_g_drbd_on_ms_drbd_stacked inf: g_drbd_left ms_drbd_stacked:Master
colocation c_ip_on_left_master inf: p_ip_stacked_left ms_drbd_left:Master
order o_drbd_left_before_stacked_left inf: ms_drbd_left:promote ms_drbd_stacked:start
order o_drbd_stacked_before_g_drbd_left inf: ms_drbd_stacked:promote g_drbd_left:start
order o_ip_before_stacked_left inf: p_ip_stacked_left ms_drbd_stacked:start
property $id="cib-bootstrap-options" \
	dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
	cluster-infrastructure="openais" \
	expected-quorum-votes="2" \
	stonith-enabled="false" \
	no-quorum-policy="ignore" \
	last-lrm-refresh="1279875820"
rsc_defaults $id="rsc-options" \
	resource-stickiness="100"

Thanks in advance.

Best regards,

Rafa

-- 
Rafael Porres Molina
Consultoría y Proyectos
Qindel Consulting S.L.

Móvil: (+34) 678650609
e-mail: rafael.porres at qindel.com
Dirección: c/Julián Camarillo 29, Edificio D2, 4ºIzda, 28037 Madrid, SPAIN / ESPAÑA
