[Pacemaker] crmd internal error during failover

Drapeau, Mathieu mathieu.drapeau at intel.com
Mon Mar 24 10:03:49 EDT 2014

Actually, I was wrong: the version in use is 1.1.10.
So, how can I find out which process is taking so long?
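
One generic way to look for the culprit on a Linux node is to sample /proc and list processes stuck in uninterruptible sleep (state D, usually waiting on I/O) along with the largest CPU consumers. The sketch below is only an illustration and is not a Pacemaker tool; the function names (parse_stat, snapshot, suspects) are made up for this example, and it assumes the standard /proc/<pid>/stat layout. The CPU figure is cumulative since each process started, so comparing two snapshots taken a few seconds apart would be needed to see current load.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, name, state, cpu_ticks)."""
    # The command name sits in parentheses and may contain spaces or
    # parentheses itself, so split on the first '(' and the last ')'.
    head, _, tail = text.rpartition(")")
    pid_part, _, name = head.partition("(")
    fields = tail.split()
    state = fields[0]                               # field 3: R, S, D, Z, ...
    cpu_ticks = int(fields[11]) + int(fields[12])   # fields 14 + 15: utime + stime
    return int(pid_part), name, state, cpu_ticks


def snapshot(proc_root="/proc"):
    """Return one parsed entry per process currently visible under proc_root."""
    entries = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                entries.append(parse_stat(handle.read()))
        except (OSError, ValueError, IndexError):
            continue  # the process exited (or was unreadable) mid-scan
    return entries


def suspects(entries, top=5):
    """Return (processes in state D, the `top` largest cumulative CPU users)."""
    blocked = [e for e in entries if e[2] == "D"]
    busiest = sorted(entries, key=lambda e: e[3], reverse=True)[:top]
    return blocked, busiest
```

Running suspects(snapshot()) around the time of a failover would show whether something is blocked on I/O or monopolising the CPU, which is the kind of overload that could make a short operation time out.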


On 3/23/14, 7:35 PM, "Andrew Beekhof" <andrew at beekhof.net> wrote:

>On 21 Mar 2014, at 3:57 am, Drapeau, Mathieu <mathieu.drapeau at intel.com> wrote:
>> Hello,
>> With pacemaker 1.1.8-7 from EL6, crmd died unexpectedly, generating these
>>logs during a failover:
>Please update to 1.1.10 from the EL6 update channels:
>> crmd[10419]:    error: crmd_node_update_complete: Node update 79
>>failed: Timer expired (-62)
>It looks like your hardware is overloaded and an operation that shouldn't
>have taken very long has timed out.
>> crmd[10419]:    error: do_log: FSA: Input I_ERROR from
>>crmd_node_update_complete() received in state S_IDLE
>> crmd[10419]:   notice: do_state_transition: State transition S_IDLE ->
>>origin=crmd_node_update_complete ]
>> crmd[10419]:  warning: do_recover: Fast-tracking shutdown in response
>>to errors
>> crmd[10419]:  warning: do_election_vote: Not voting in election, we're
>>in state S_RECOVERY
>> crmd[10419]:    error: do_log: FSA: Input I_TERMINATE from do_recover()
>>received in state S_RECOVERY
>> crmd[10419]:   notice: lrm_state_verify_stopped: Stopped 0 recurring
>>operations at shutdown (2 ops remaining)
>> crmd[10419]:   notice: lrm_state_verify_stopped: Recurring action
>>testfs-MDT0000_6cda68:21 (testfs-MDT0000_6cda68_monitor_5000) incomplete
>>at shutdown
>> crmd[10419]:   notice: lrm_state_verify_stopped: Recurring action
>>MGS_f055b7:30 (MGS_f055b7_monitor_5000) incomplete at shutdown
>> crmd[10419]:    error: lrm_state_verify_stopped: 3 resources were
>>active at shutdown.
>> crmd[10419]:   notice: do_lrm_control: Disconnected from the LRM
>> crmd[10419]:   notice: terminate_cs_connection: Disconnecting from
>> corosync[10370]:   [pcmk  ] info: pcmk_ipc_exit: Client crmd
>>(conn=0x2589f40, async-conn=0x2589f40) left
>> crmd[10419]:    error: crmd_fast_exit: Could not recover from internal
>> pacemakerd[10408]:    error: pcmk_child_exit: Child process crmd
>>(10419) exited: Generic Pacemaker error (201)
>> pacemakerd[10408]:   notice: pcmk_process_exit: Respawning failed child
>>process: crmd
>> What could have happened, and how can I keep crmd from dying?
>> Thanks,
>> Mat
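
To find where a failure chain like the one quoted above starts, it can help to filter the log down to its error-level entries. The following is a minimal sketch that assumes each entry has already been joined onto a single line in the form "daemon[pid]: level: function: message", as in the crmd and pacemakerd lines above; the names LOG_LINE, parse_entry and errors_only are hypothetical. Lines in other formats, such as the corosync "[pcmk  ]" entry, are simply skipped.

```python
import re

# daemon[pid]:  level: function: message
LOG_LINE = re.compile(
    r"^(?P<daemon>\w+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<level>\w+):\s+(?P<function>\w+):\s+(?P<message>.*)$"
)


def parse_entry(line):
    """Return the entry's fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def errors_only(lines):
    """Keep only the entries logged at 'error' level, in their original order."""
    entries = (parse_entry(line) for line in lines)
    return [e for e in entries if e and e["level"] == "error"]
```

Applied to the excerpt above, the first entry returned would be the crmd_node_update_complete timeout, which is the event the later I_ERROR and shutdown messages follow from.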
