[ClusterLabs] Resource start and stop run into timeout

joerg at die-kilians.eu joerg at die-kilians.eu
Fri Jun 5 07:06:34 EDT 2020


Dear all,

I have an issue on my cluster running Pacemaker 2.0.3.
 From time to time my resources fail to start or fail to stop. The 
cluster runs into an timeout for the command.
Between the attempt to start the resource and the timeout is only 1 
second (between Jun 04 13:48:37 and Jun 04 13:48:38).

The resource is defined like this:
	Resource: nmb (class=systemd type=nmb)
	  Operations: monitor interval=60s timeout=100s 
(nmb-monitor-interval-60s)
         	      start interval=0s timeout=600s (nmb-start-interval-0s)
	              stop interval=0s timeout=600s (nmb-stop-interval-0s)

So the timeout should be 600 seconds. The cluster states that the 
timeout has reached (timeout (elapsed=647561ms, remaining=-47561ms)), 
but obviously only 1 second has passed.

Do you have any hint for me ?

Below is a log from that time.

Thanks

Log:
Jun 04 13:48:37 node1 pacemaker-controld  [1162] (process_graph_event)   
      info: Transition 641 action 31 (smb_stop_0 on node1) confirmed: ok 
| rc=0 call-id=896
Jun 04 13:48:37 node1 pacemaker-controld  [1162] (te_rsc_command)     
notice: Initiating stop operation nmb_stop_0 locally on node1 | action 
29
Jun 04 13:48:37 node1 pacemaker-execd     [1159] 
(cancel_recurring_action)    info: Cancelling systemd operation 
nmb_status_60000
Jun 04 13:48:37 node1 pacemaker-based     [1157] (cib_process_request)   
      info: Completed cib_modify operation for section status: OK (rc=0, 
origin=node1/crmd/1383, version=1.141.45)
Jun 04 13:48:37 node1 pacemaker-controld  [1162] (do_lrm_rsc_op)      
info: Performing key=29:641:0:dbf25643-c4e8-4374-8aec-3c9e3ea3d397 
op=nmb_stop_0
Jun 04 13:48:37 node1 pacemaker-execd     [1159] (log_execute)        
info: executing - rsc:nmb action:stop call_id:909
Jun 04 13:48:37 node1 pacemaker-controld  [1162] (process_lrm_event)  
info: Result of monitor operation for nmb on node1: Cancelled | call=329 
key=nmb_monitor_60000 confirmed=true
Jun 04 13:48:37 node1 pacemaker-controld  [1162] (exec_alert_list)    
info: Sending resource alert via alert to (null)
Jun 04 13:48:37 node1 pacemaker-based     [1157] (cib_process_request)   
      info: Forwarding cib_modify operation for section status to all 
(origin=local/crmd/1384)
Jun 04 13:48:37 node1 pacemaker-execd     [1159] 
(process_lrmd_alert_exec)    info: Executing alert alert for 
8f105ac8-de01-4d3b-9881-10963acd7002
Jun 04 13:48:37 node1 pacemaker-based     [1157] (cib_perform_op)     
info: Diff: --- 1.141.45 2
Jun 04 13:48:37 node1 pacemaker-based     [1157] (cib_perform_op)     
info: Diff: +++ 1.141.46 (null)
Jun 04 13:48:37 node1 pacemaker-based     [1157] (cib_perform_op)     
info: +  /cib:  @num_updates=46
Jun 04 13:48:37 node1 pacemaker-based     [1157] (cib_perform_op)     
info: +  
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='nmb']/lrm_rsc_op[@id='nmb_last_0']: 
  @operation_key=nmb_stop_0, @operation=stop, 
@crm-debug-origin=do_update_resource, 
@transition-key=29:641:0:dbf25643-c4e8-4374-8aec-3c9e3ea3d397, 
@transition-magic=-1:193;29:641:0:dbf25643-c4e8-4374-8aec-3c9e3ea3d397, 
@call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1591271317, 
@last-run=1591271317, @e
Jun 04 13:48:37 node1 pacemaker-based     [1157] (cib_process_request)   
      info: Completed cib_modify operation for section status: OK (rc=0, 
origin=node1/crmd/1384, version=1.141.46)
Jun 04 13:48:38 node1 pacemaker-execd     [1159] (systemd_exec_result)   
      info: Call to stop passed: /org/freedesktop/systemd1/job/189965
Jun 04 13:48:38 node1 pacemaker-execd     [1159] (action_complete)    
notice: Giving up on nmb stop (rc=0): timeout (elapsed=647561ms, 
remaining=-47561ms)
Jun 04 13:48:38 node1 pacemaker-based     [1157] (cib_process_request)   
      info: Forwarding cib_modify operation for section status to all 
(origin=local/crmd/1385)


More information about the Users mailing list