[Pacemaker] crm resource restart fails to restart the service

Dan Frincu dfrincu at streamwide.ro
Wed Nov 17 10:00:09 EST 2010


Hi,

Vadym Chepkov wrote:
> On Wed, Nov 17, 2010 at 8:01 AM, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
>
>   
>>> RA doesn't support restart action? Most luckily you get OCF_ERR_UNIMPLEMENTED in the log
>>>       
>> It's actually a resource stop followed by start. It says so in
>> the help too. Perhaps the start precludes the stop action. The
>> logs should give a hint. We need a sleep in between.
>>
>>     
>
> In this case this command is not working at all, because I tried in
> the past for many resources and it never worked, so I just assumed it
> has to be implemented by RA.
>
> To test it right now I issued a command
> # crm resource restart xen_vbuild
>
> where xen_vbuild is a Xen VM and no results whatsoever.
>
> Here is output of the log
>
> Nov 17 13:04:13 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:05:14 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:06:14 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:07:15 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> +       <primitive id="xen_vbuild" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> +         <meta_attributes id="xen_vbuild-meta_attributes"
> __crm_diff_marker__="added:top" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> +           <nvpair id="xen_vbuild-meta_attributes-target-role"
> name="target-role" value="Stopped" />
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> -       <primitive id="xen_vbuild" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> -         <meta_attributes id="xen_vbuild-meta_attributes" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> -           <nvpair value="Stopped"
> id="xen_vbuild-meta_attributes-target-role" />
> Nov 17 13:07:44 xen-11 pengine: [22958]: debug: unpack_lrm_rsc_state:
> xen_vbuild: Overwriting calculated next role Unknown with requested
> next role Stopped
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> +       <primitive id="xen_vbuild" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> +         <meta_attributes id="xen_vbuild-meta_attributes" >
> Nov 17 13:07:44 xen-11 cib: [4294]: info: log_data_element: cib:diff:
> +           <nvpair value="Started"
> id="xen_vbuild-meta_attributes-target-role" />
> Nov 17 13:07:45 xen-11 pengine: [22958]: debug: unpack_lrm_rsc_state:
> xen_vbuild: Overwriting calculated next role Unknown with requested
> next role Stopped
> Nov 17 13:07:45 xen-11 pengine: [22958]: notice: native_print:
> xen_vbuild	(ocf::heartbeat:Xen):	Started xen-11
> Nov 17 13:07:45 xen-11 pengine: [22958]: debug: native_assign_node:
> All nodes for resource xen_vbuild are unavailable, unclean or shutting
> down (xen-11: 1, -1000000)
> Nov 17 13:07:45 xen-11 pengine: [22958]: debug: native_assign_node:
> Could not allocate a node for xen_vbuild
> Nov 17 13:07:45 xen-11 pengine: [22958]: info: native_color: Resource
> xen_vbuild cannot run anywhere
> Nov 17 13:07:45 xen-11 pengine: [22958]: notice: LogActions: Stop
> resource xen_vbuild	(xen-11)
> Nov 17 13:07:46 xen-11 pengine: [22958]: notice: native_print:
> xen_vbuild	(ocf::heartbeat:Xen):	Started xen-11
> Nov 17 13:07:46 xen-11 pengine: [22958]: debug: native_assign_node:
> Assigning xen-11 to xen_vbuild
> Nov 17 13:07:46 xen-11 pengine: [22958]: notice: LogActions: Leave
> resource xen_vbuild	(Started xen-11)
> Nov 17 13:08:15 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:09:15 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:10:16 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:11:16 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:12:17 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:13:17 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:14:17 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:15:18 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:16:18 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:17:19 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:17:47 xen-11 pengine: [22958]: notice: native_print:
> xen_vbuild	(ocf::heartbeat:Xen):	Started xen-11
> Nov 17 13:17:47 xen-11 pengine: [22958]: debug: native_assign_node:
> Assigning xen-11 to xen_vbuild
> Nov 17 13:17:47 xen-11 pengine: [22958]: notice: LogActions: Leave
> resource xen_vbuild	(Started xen-11)
> Nov 17 13:18:19 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:19:19 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
> Nov 17 13:20:20 xen-11 lrmd: [4295]: debug: rsc:xen_vbuild:101: monitor
>
> but VM never stopped:
>
>
> [root at xen-11 ~]# xm list|grep vbuild
> vbuild                                     3      511     2 -b----    352.4
>
>
> still ID 3 as it was before
>   
In my case the custom OCF RA, works, after some tweaks, now I'm stuck 
with the mysql RA, I think this is the issue:

/usr/lib/ocf/resource.d/heartbeat# ./mysql stop
./mysql: line 523: (/1000)-5: syntax error: operand expected (error 
token is "/1000)-5")

First I thought it was because I set the monitor, start and stop 
timeouts to other values than the default, but even after setting the 
defaults, same thing.

primitive mysqld ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" 
enable_creation="0" datadir="/mysql/database" user="root" 
test_user="monitor" test_passwd="monitor" test_table="cluster.monitor" \
        op monitor interval="10s" timeout="30s" \
        op start interval="0s" timeout="120s" \
        op stop interval="0s" timeout="120s" \
        meta target-role="Started"

Same thing goes for httpd RA:

root at cluster1:/usr/lib/ocf/resource.d/heartbeat# pgrep httpd
11985
11997
11998
11999
12000
12001
12002
12003
12004
root at cluster1:/usr/lib/ocf/resource.d/heartbeat# crm resource restart httpd
root at cluster1:/usr/lib/ocf/resource.d/heartbeat# pgrep httpd
11985
11997
11998
11999
12000
12001
12002
12003
12004
root at cluster1:/usr/lib/ocf/resource.d/heartbeat# crm
crm(live)# resource
crm(live)resource# restart httpd
crm(live)resource# exit
bye
root at cluster1:/usr/lib/ocf/resource.d/heartbeat# pgrep httpd
11985
11997
11998
11999
12000
12001
12002
12003
12004

And the logs

Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
- <cib admin_epoch="0" epoch="57" num_updates="4" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-   <configuration >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-     <resources >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-       <group id="all" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-         <primitive id="httpd" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-           <meta_attributes id="httpd-meta_attributes" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-             <nvpair value="Started" 
id="httpd-meta_attributes-target-role" />
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-           </meta_attributes>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-         </primitive>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-       </group>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-     </resources>
Nov 17 07:51:11 cluster1 crmd: [25688]: info: abort_transition_graph: 
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-   </configuration>
Nov 17 07:51:11 cluster1 crmd: [25688]: info: need_abort: Aborting on 
change to admin_epoch
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
- </cib>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+ <cib admin_epoch="0" epoch="58" num_updates="1" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+   <configuration >
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+     <resources >
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: All 2 
cluster nodes are eligible to run resources.
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+       <group id="all" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+         <primitive id="httpd" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+           <meta_attributes id="httpd-meta_attributes" >
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_pe_invoke: Query 17431: 
Requesting the current CIB: S_POLICY_ENGINE
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+             <nvpair value="Stopped" 
id="httpd-meta_attributes-target-role" />
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+           </meta_attributes>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+         </primitive>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+       </group>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+     </resources>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+   </configuration>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+ </cib>
Nov 17 07:51:11 cluster1 cib: [25684]: info: cib_process_request: 
Operation complete: op cib_replace for section resources 
(origin=local/cibadmin/2, version=0.58.1): ok (rc=0)
Nov 17 07:51:11 cluster1 cib: [5026]: info: write_cib_contents: Archived 
previous version as /var/lib/heartbeat/crm/cib-49.raw
Nov 17 07:51:11 cluster1 cib: [5026]: info: write_cib_contents: Wrote 
version 0.58.0 of the CIB to disk (digest: 3027e9f4990e368ce7536b46a7f8b46b)
Nov 17 07:51:11 cluster1 cib: [5026]: info: retrieveCib: Reading cluster 
configuration from: /var/lib/heartbeat/crm/cib.cNSkQf (digest: 
/var/lib/heartbeat/crm/cib.kv1k0m)
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_pe_invoke_callback: 
Invoking the PE: query=17431, ref=pe_calc-dc-1290005471-17354, seq=116, 
quorate=1
Nov 17 07:51:11 cluster1 pengine: [25687]: notice: unpack_config: On 
loss of CCM Quorum: Ignore
Nov 17 07:51:11 cluster1 pengine: [25687]: info: unpack_config: Node 
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Nov 17 07:51:11 cluster1 pengine: [25687]: info: 
determine_online_status: Node cluster1 is online
Nov 17 07:51:11 cluster1 pengine: [25687]: info: 
determine_online_status: Node cluster2 is online
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
- <cib admin_epoch="0" epoch="58" num_updates="1" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-   <configuration >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-     <resources >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-       <group id="all" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-         <primitive id="httpd" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-           <meta_attributes id="httpd-meta_attributes" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-             <nvpair value="Stopped" 
id="httpd-meta_attributes-target-role" />
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-           </meta_attributes>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-         </primitive>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-       </group>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-     </resources>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
-   </configuration>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
- </cib>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+ <cib admin_epoch="0" epoch="59" num_updates="1" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+   <configuration >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+     <resources >
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+       <group id="all" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+         <primitive id="httpd" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+           <meta_attributes id="httpd-meta_attributes" >
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+             <nvpair value="Started" 
id="httpd-meta_attributes-target-role" />
Nov 17 07:51:11 cluster1 crmd: [25688]: info: unpack_graph: Unpacked 
transition 17187: 6 actions in 6 synapses
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+           </meta_attributes>
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_te_invoke: Processing 
graph 17187 (ref=pe_calc-dc-1290005471-17354) derived from 
/var/lib/pengine/pe-input-27464.bz2
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+         </primitive>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+       </group>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+     </resources>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+   </configuration>
Nov 17 07:51:11 cluster1 cib: [25684]: info: log_data_element: cib:diff: 
+ </cib>
Nov 17 07:51:11 cluster1 cib: [25684]: info: cib_process_request: 
Operation complete: op cib_replace for section resources 
(origin=local/cibadmin/2, version=0.59.1): ok (rc=0)
Nov 17 07:51:11 cluster1 crmd: [25688]: info: abort_transition_graph: 
need_abort:59 - Triggered transition abort (complete=0) : Non-status change
Nov 17 07:51:11 cluster1 crmd: [25688]: info: update_abort_priority: 
Abort priority upgraded from 0 to 1000000
Nov 17 07:51:11 cluster1 crmd: [25688]: info: update_abort_priority: 
Abort action done superceeded by restart
Nov 17 07:51:11 cluster1 crmd: [25688]: info: need_abort: Aborting on 
change to admin_epoch
Nov 17 07:51:11 cluster1 crmd: [25688]: info: run_graph: 
====================================================
Nov 17 07:51:11 cluster1 crmd: [25688]: notice: run_graph: Transition 
17187 (Complete=0, Pending=0, Fired=0, Skipped=6, Incomplete=0, 
Source=/var/lib/pengine/pe-input-27464.bz2): Stopped
Nov 17 07:51:11 cluster1 crmd: [25688]: info: te_graph_trigger: 
Transition 17187 is now complete
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: All 2 
cluster nodes are eligible to run resources.
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_pe_invoke: Query 17432: 
Requesting the current CIB: S_POLICY_ENGINE
Nov 17 07:51:11 cluster1 pengine: [25687]: info: process_pe_message: 
Transition 17187: PEngine Input stored in: 
/var/lib/pengine/pe-input-27464.bz2
Nov 17 07:51:11 cluster1 cib: [5028]: info: write_cib_contents: Archived 
previous version as /var/lib/heartbeat/crm/cib-50.raw
Nov 17 07:51:11 cluster1 cib: [5028]: info: write_cib_contents: Wrote 
version 0.59.0 of the CIB to disk (digest: 0363c0ff8bed95a2195288798a9ece7b)
Nov 17 07:51:11 cluster1 cib: [5028]: info: retrieveCib: Reading cluster 
configuration from: /var/lib/heartbeat/crm/cib.pqsr2f (digest: 
/var/lib/heartbeat/crm/cib.N4eyon)
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_pe_invoke_callback: 
Invoking the PE: query=17432, ref=pe_calc-dc-1290005471-17355, seq=116, 
quorate=1
Nov 17 07:51:11 cluster1 pengine: [25687]: notice: unpack_config: On 
loss of CCM Quorum: Ignore
Nov 17 07:51:11 cluster1 pengine: [25687]: info: unpack_config: Node 
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Nov 17 07:51:11 cluster1 pengine: [25687]: info: 
determine_online_status: Node cluster1 is online
Nov 17 07:51:11 cluster1 pengine: [25687]: info: 
determine_online_status: Node cluster2 is online
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Nov 17 07:51:11 cluster1 crmd: [25688]: info: unpack_graph: Unpacked 
transition 17188: 0 actions in 0 synapses
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_te_invoke: Processing 
graph 17188 (ref=pe_calc-dc-1290005471-17355) derived from 
/var/lib/pengine/pe-input-27465.bz2
Nov 17 07:51:11 cluster1 crmd: [25688]: info: run_graph: 
====================================================
Nov 17 07:51:11 cluster1 crmd: [25688]: notice: run_graph: Transition 
17188 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pengine/pe-input-27465.bz2): Complete
Nov 17 07:51:11 cluster1 crmd: [25688]: info: te_graph_trigger: 
Transition 17188 is now complete
Nov 17 07:51:11 cluster1 crmd: [25688]: info: notify_crmd: Transition 
17188 status: done - <null>
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: 
Starting PEngine Recheck Timer
Nov 17 07:51:11 cluster1 pengine: [25687]: info: process_pe_message: 
Transition 17188: PEngine Input stored in: 
/var/lib/pengine/pe-input-27465.bz2

Regards,

Dan
> Vadym
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>   

-- 
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20101117/392ddff5/attachment-0001.html>


More information about the Pacemaker mailing list