[ClusterLabs] need some help with failing resources
Ken Gaillot
kgaillot at redhat.com
Mon Dec 5 16:17:31 CET 2016
On 12/03/2016 05:19 AM, Darko Gavrilovic wrote:
> Here is the output for that resource (edited):
>
> primitive svc-mysql ocf:heartbeat:mysql \
> params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf"
> datadir="/var/lib/mysql" user="mysql" group="mysql"
> log="/var/log/mysqld.log" pid="/var/run/mysqld/mysqld.pid"
> socket="/var/lib/mysql/mysql.sock" test_table="***" test_user="***"
> test_passwd="****" \
> op monitor interval="30s" timeout="60s" OCF_CHECK_LEVEL="5" \
> op start interval="0" timeout="120s" \
> op stop interval="0" timeout="120s" \
> meta target-role="Started" migration-threshold="2"
>
> ...skipping
> order mysql-before-httpd inf: svc-mysql:start svc-httpd:start
> order mysql-before-ssh inf: svc-mysql:start svc-ssh:start
> property $id="cib-bootstrap-options" \
> dc-version="1.0.6-f709c638237cdff7556cb6ab615f32826c0f8c06" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> last-lrm-refresh="1480762389" \
> no-quorum-policy="ignore" \
> stonith-enabled="true" \
> ms-drbd0="Master"
>
>
> dg
>
>
> On 12/3/2016 1:25 AM, Kostiantyn Ponomarenko wrote:
>> I assume that you are using crmsh.
>> If so, I suggest posting the output of the "crm configure show"
>> command here.
>>
>> Thank you,
>> Kostia
>>
>> On Sat, Dec 3, 2016 at 5:54 AM, Darko Gavrilovic
>> <darko at chass.utoronto.ca> wrote:
>>
>> Hello, I have a two-node cluster that seems to be failing to
>> start its resources.
>>
>> Resource Group: services
>> svc-mysql (ocf::heartbeat:mysql) Stopped
>> svc-httpd (ocf::heartbeat:apache) Stopped
>> svc-ssh (lsb:sshd-virt) Stopped
>> svc-tomcat6 (lsb:tomcat6) Stopped
>> svc-plone (lsb:plone) Stopped
>> svc-bacula (lsb:bacula-fd-virt) Stopped
>>
>> When I run "crm resource start services", the service group does
>> not start.
>>
>> I also tried starting the first resource in the group:
>> crm resource start svc-mysql
>>
>> It does not start either.
>>
>> The error I am seeing is:
>> Dec 2 21:59:43 pengine: [25829]: WARN: native_color: Resource
>> svc-mysql cannot run anywhere
>> Dec 2 22:00:26 pengine: [25829]: WARN: native_color: Resource
>> svc-mysql cannot run anywhere
The most common reasons for the above message are:
* Location or order constraints don't leave any place for the resource
to run.
* The resource has failed the maximum number of times on all nodes.
(Does "crm_mon" show any failures?)
>>
>> 4b4f-a239-8a10dad40587, cib=0.3857.2) : Resource op removal
>> Dec 2 21:59:32 server1 crmd: [25830]: info: te_rsc_command:
>> Initiating action 55: monitor svc-mysql_monitor_0 on
>> kurt.chass.utoronto.ca (local)
>> Dec 2 21:59:32 server1 crmd: [25830]: info: do_lrm_rsc_op:
>> Performing key=55:14:7:aee06ee3-9576-4b4f-a239-8a10dad40587
>> op=svc-mysql_monitor_0 )
>> Dec 2 21:59:32 server1 crmd: [25830]: info: process_lrm_event: LRM
>> operation svc-mysql_monitor_0 (call=163, rc=7, cib-update=249,
>> confirmed=true) not running
>> Dec 2 21:59:32 server1 crmd: [25830]: info: match_graph_event:
>> Action svc-mysql_monitor_0 (55) confirmed on kurt.chass.utoronto.ca (rc=0)
>> Dec 2 21:59:32 server1 crmd: [25830]: info: abort_transition_graph:
>> te_update_diff:267 - Triggered transition abort (complete=1,
>> tag=lrm_rsc_op, id=svc-mysql_monitor_0,
>> magic=0:7;71:5:7:aee06ee3-9576-4b4f-a239-8a10dad40587, cib=0.3858.3)
>> : Resource op removal
>> Dec 2 21:59:33 server1 crmd: [25830]: info: te_rsc_command:
>> Initiating action 56: monitor svc-mysql_monitor_0 on server2
>> Dec 2 21:59:33 server1 crmd: [25830]: WARN: status_from_rc: Action
>> 56 (svc-mysql_monitor_0) on server2 failed (target: 7 vs. rc: 0):
>> Error
The above error indicates that mysql is running on server2 but the
cluster didn't start it there. (The "_monitor_0" is called a "probe";
it's used to check the status of the service before the cluster starts
it. The "target: 7" means it expects the service to be stopped. The "rc:
0" means the service is actually running.)
Make sure you're not starting mysql at boot or by any means other than
the cluster.
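For example, on an init-script-based system like this one appears to
be (Pacemaker 1.0.x on openais), something along these lines would
check and fix that (a sketch; the service name "mysqld" is an
assumption, adjust it for your distribution):

    # Check whether mysql is enabled at boot, and disable it
    chkconfig --list mysqld
    chkconfig mysqld off

    # Stop any copy of mysql that was started outside the cluster,
    # then let the cluster start it where it wants it
    service mysqld stop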
>> Dec 2 21:59:33 server1 crmd: [25830]: info: abort_transition_graph:
>> match_graph_event:272 - Triggered transition abort (complete=0,
>> tag=lrm_rsc_op, id=svc-mysql_monitor_0,
>> magic=0:0;56:15:7:aee06ee3-9576-4b4f-a239-8a10dad40587,
>> cib=0.3859.2) : Event failed
>> Dec 2 21:59:33 server1 crmd: [25830]: info: match_graph_event:
>> Action svc-mysql_monitor_0 (56) confirmed on server2 (rc=4)
>> Dec 2 21:59:33 server1 crmd: [25830]: info: te_rsc_command:
>> Initiating action 187: stop svc-mysql_stop_0 on server2
>> Dec 2 21:59:35 server1 crmd: [25830]: info: match_graph_event:
>> Action svc-mysql_stop_0 (187) confirmed on server2 (rc=0)
>> Dec 2 22:10:20 server1 crmd: [19708]: info: do_lrm_rsc_op:
>> Performing key=101:1:7:6e477ca6-4ffe-4e89-82c2-c6149d528128
>> op=svc-mysql_monitor_0 )
>> Dec 2 22:10:20 server1 crmd: [19708]: info: process_lrm_event: LRM
>> operation svc-mysql_monitor_0 (call=51, rc=7, cib-update=42,
>> confirmed=true) not running
>> Dec 2 22:12:24 server1 crmd: [19708]: info: te_rsc_command:
>> Initiating action 102: monitor svc-mysql_monitor_0 on server2
>> Dec 2 22:12:24 server1 crmd: [19708]: info: match_graph_event:
>> Action svc-mysql_monitor_0 (102) confirmed on server2 (rc=0)
>>
>>
>> Any advice on how to tackle this?
>>
>> dg
>>