[Pacemaker] Time out issue while stopping resource in pacemaker

Lax lkota at cisco.com
Fri Oct 10 01:12:13 UTC 2014


Hi All,

I ran into a timeout issue while failing over from the master to the peer
server. I have a 2-node setup with 2 resources. It had been working all
along; this is the first time I have seen this issue.

It fails with the following error: 'error: process_lrm_event: LRM operation
resourceB_stop_0 (40) Timed Out (timeout=20000ms)'.
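
To make the error easier to read, here is a minimal Python sketch that splits
it into its parts. The pattern and the parse_timeout helper are hypothetical,
written only for this one message shape; they are not part of Pacemaker. I am
assuming the operation key has the form resource_action_interval and that the
number in parentheses is the call ID.

```python
import re

# Error text copied from the crmd log entry quoted in this message.
LINE = ("error: process_lrm_event: LRM operation "
        "resourceB_stop_0 (40) Timed Out (timeout=20000ms)")

# Hypothetical pattern for this single message shape (an assumption,
# not a Pacemaker API): <resource>_<action>_<interval> (<call id>).
PATTERN = re.compile(
    r"LRM operation (?P<resource>\S+)_(?P<action>[a-z]+)_(?P<interval>\d+) "
    r"\((?P<call_id>\d+)\) Timed Out \(timeout=(?P<timeout_ms>\d+)ms\)"
)


def parse_timeout(line):
    """Return the fields of a timed-out LRM operation, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The log reports the timeout in milliseconds; convert to seconds.
    fields["timeout_s"] = int(fields["timeout_ms"]) / 1000
    return fields


info = parse_timeout(LINE)
print(info["resource"], info["action"], info["timeout_s"])
```

Read this way, the stop action on resourceB hit a 20 second limit, which
matches the gap between 14:57:38 and 14:57:58 in the log.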



Here is the complete log snippet from Pacemaker. I would appreciate your help
on this.


Oct  9 14:57:38 server1 cib[368]:   notice: cib:diff: Diff: +++ 0.3.1
4e9bfa03cf2fef61843c18e127044d81
Oct  9 14:57:38 server1 cib[368]:   notice: cib:diff: -- <cib
admin_epoch="0" epoch="2" num_updates="8" />
Oct  9 14:57:38 server1 crmd[373]:   notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Oct  9 14:57:38 server1 cib[368]:   notice: cib:diff: ++        
<instance_attributes id="nodes-server1" >
Oct  9 14:57:38 server1 cib[368]:   notice: cib:diff: ++           <nvpair
id="nodes-server1-standby" name="standby" value="true" />
Oct  9 14:57:38 server1 cib[368]:   notice: cib:diff: ++        
</instance_attributes>
Oct  9 14:57:38 server1 pengine[372]:   notice: unpack_config: On loss of
CCM Quorum: Ignore
Oct  9 14:57:38 server1 pengine[372]:   notice: LogActions: Move   
ClusterIP#011(Started server1 -> 172.28.0.64)
Oct  9 14:57:38 server1 pengine[372]:   notice: LogActions: Move   
resourceB#011(Started server1 -> 172.28.0.64)
Oct  9 14:57:38 server1 pengine[372]:   notice: process_pe_message:
Calculated Transition 11: /var/lib/pacemaker/pengine/pe-input-1710.bz2
Oct  9 14:57:58 server1 lrmd[370]:  warning: child_timeout_callback:
resourceB_stop_0 process (PID 17327) timed out
Oct  9 14:57:58 server1 lrmd[370]:  warning: operation_finished:
resourceB_stop_0:17327 - timed out after 20000ms
Oct  9 14:57:58 server1 lrmd[370]:   notice: operation_finished:
resourceB_stop_0:17327 [   % Total    % Received % Xferd  Average Speed  
Time    Time     Time  Current ]
Oct  9 14:57:58 server1 lrmd[370]:   notice: operation_finished:
resourceB_stop_0:17327 [                                  Dload  Upload  
Total   Spent    Left  Speed ]
Oct  9 14:57:58 server1 lrmd[370]:   notice: operation_finished:
resourceB_stop_0:17327 [ #015  0     0    0     0    0     0      0      0
--:--:-- --:--:-- --:--:--     0#015  0     0    0     0    0     0      0 
    0 --:--:--  0:00:01 --:--:--     0#015  0     0    0     0    0     0  
   0      0 --:--:--  0:00:02 --:--:--     0#015  0     0    0     0    0  
  0      0      0 --:--:--  0:00:03 --:--:--     0#015  0     0    0     0 
  0     0      0      0 --:--:--  0:00:04 --:--:--     0#015  0     0    0 
   0    0     0      0      0 --:--:--  0:00:05 -
Oct  9 14:57:58 server1 crmd[373]:    error: process_lrm_event: LRM
operation resourceB_stop_0 (40) Timed Out (timeout=20000ms)
Oct  9 14:57:58 server1 crmd[373]:  warning: status_from_rc: Action 10
(resourceB_stop_0) on server1 failed (target: 0 vs. rc: 1): Error
Oct  9 14:57:58 server1 crmd[373]:  warning: update_failcount: Updating
failcount for resourceB on server1 after failed stop: rc=1 (update=INFINITY,
time=1412891878)
Oct  9 14:57:58 server1 attrd[371]:   notice: attrd_trigger_update: Sending
flush op to all hosts for: fail-count-resourceB (INFINITY)
Oct  9 14:57:58 server1 crmd[373]:  warning: update_failcount: Updating
failcount for resourceB on server1 after failed stop: rc=1 (update=INFINITY,
time=1412891878)
Oct  9 14:57:58 server1 crmd[373]:   notice: run_graph: Transition 11
(Complete=2, Pending=0, Fired=0, Skipped=9, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-1710.bz2): Stopped
Oct  9 14:57:58 server1 attrd[371]:   notice: attrd_perform_update: Sent
update 11: fail-count-resourceB=INFINITY


Thanks
Lax




