[Pacemaker] An internal error occurred in crmd

Thu Oct 31 00:32:26 EDT 2013

Hi Andrew,

2013/10/31 Andrew Beekhof <andrew at beekhof.net>:
> I think this should be fixed by:
>    https://github.com/beekhof/pacemaker/commit/ea7991f

I confirmed that it was fixed.
Many thanks,

>
> The underlying issue, though, is that the lrmd command timed out, which _should_ have been fixed by:
>    https://github.com/beekhof/pacemaker/commit/d65b270
>
> What are you doing to this poor cluster? :)

I intend to test the migration feature of pacemaker-1.1.

Kazunori INOUE

>
> On 21 Oct 2013, at 3:59 pm, Kazunori INOUE <kazunori.inoue3 at gmail.com> wrote:
>
>> Hi,
>>
>> I'm using pacemaker-1.1 (b6d42ed, the latest devel).
>>
>> After starting corosync and pacemaker on three nodes,
>> I loaded the configuration.
>> An internal error then occurred in crmd, and crmd exited.
>>
>> $ crm configure load update 3vm+2stonith.cli
>> $ for i in n{6..8};do ssh $i 'grep error: /var/log/ha-log';done
>> Oct 21 11:19:43 bl460g1n6 pengine[7684]:    error: unpack_resources:
>> Resource start-up disabled since no STONITH resources have been
>> defined
>> Oct 21 11:19:43 bl460g1n6 pengine[7684]:    error: unpack_resources:
>> Either configure some or disable STONITH with the stonith-enabled
>> option
>> Oct 21 11:19:43 bl460g1n6 pengine[7684]:    error: unpack_resources:
>> NOTE: Clusters with shared data need STONITH to ensure data integrity
>> Oct 21 11:20:51 bl460g1n6 crmd[7685]:    error: crm_element_value:
>> Couldn't find lrmd_callid in NULL
>> Oct 21 11:20:51 bl460g1n6 crmd[7685]:    error: crm_abort:
>> crm_element_value: Triggered assert at xml.c:3336 : data != NULL
>> Oct 21 11:20:51 bl460g1n6 crmd[7685]:    error: crm_element_value:
>> Couldn't find lrmd_rc in NULL
>> Oct 21 11:20:51 bl460g1n6 crmd[7685]:    error: crm_abort:
>> crm_element_value: Triggered assert at xml.c:3336 : data != NULL
>> Oct 21 11:20:53 bl460g1n6 crmd[7685]:    error:
>> internal_ipc_get_reply: Discarding old reply 90 (need 91)
>>
>> Oct 21 11:20:51 bl460g1n7 crmd[12487]:    error: lrmd_send_command:
>> Couldn't perform lrmd_rsc_info operation (timeout=30000): -11:
>> Connection timed out (110)
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error: lrmd_send_command:
>> Couldn't perform lrmd_rsc_register operation (timeout=0): -114:
>> Connection timed out (110)
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error: lrmd_send_command:
>> Couldn't perform lrmd_rsc_info operation (timeout=30000): -114:
>> Connection timed out (110)
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error: get_lrm_resource:
>> Could not add resource prmStonith6-2 to LRM
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error: do_lrm_invoke:
>> Invalid resource definition
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error: do_log: FSA: Input
>> I_TERMINATE from do_recover() received in state S_RECOVERY
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error:
>> lrm_state_verify_stopped: 4 pending LRM operations at shutdown
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error:
>> lrm_state_verify_stopped: Pending action: prmVM3:13 (prmVM3_monitor_0)
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error:
>> lrm_state_verify_stopped: Pending action: prmVM2:9 (prmVM2_monitor_0)
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error:
>> lrm_state_verify_stopped: Pending action: prmVM1:5 (prmVM1_monitor_0)
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error:
>> lrm_state_verify_stopped: Pending action: prmStonith6-1:17
>> (prmStonith6-1_monitor_0)
>> Oct 21 11:20:52 bl460g1n7 crmd[12487]:    error: crmd_fast_exit: Could
>> not recover from internal error
>> Oct 21 11:20:52 bl460g1n7 pacemakerd[12477]:    error:
>> pcmk_child_exit: Child process crmd (12487) exited: Generic Pacemaker
>> error (201)
>>
>> Oct 21 11:20:51 bl460g1n8 crmd[1600]:    error: lrmd_send_command:
>> Couldn't perform lrmd_rsc_info operation (timeout=30000): -11:
>> Connection timed out (110)
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error: lrmd_send_command:
>> Couldn't perform lrmd_rsc_register operation (timeout=0): -114:
>> Connection timed out (110)
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error: lrmd_send_command:
>> Couldn't perform lrmd_rsc_info operation (timeout=30000): -114:
>> Connection timed out (110)
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error: get_lrm_resource:
>> Could not add resource prmStonith6-2 to LRM
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error: do_lrm_invoke: Invalid
>> resource definition
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error: do_log: FSA: Input
>> I_TERMINATE from do_recover() received in state S_RECOVERY
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error:
>> lrm_state_verify_stopped: 4 pending LRM operations at shutdown
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error:
>> lrm_state_verify_stopped: Pending action: prmVM3:13 (prmVM3_monitor_0)
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error:
>> lrm_state_verify_stopped: Pending action: prmVM2:9 (prmVM2_monitor_0)
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error:
>> lrm_state_verify_stopped: Pending action: prmVM1:5 (prmVM1_monitor_0)
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error:
>> lrm_state_verify_stopped: Pending action: prmStonith6-1:17
>> (prmStonith6-1_monitor_0)
>> Oct 21 11:20:52 bl460g1n8 crmd[1600]:    error: crmd_fast_exit: Could
>> not recover from internal error
>> Oct 21 11:20:52 bl460g1n8 pacemakerd[1591]:    error: pcmk_child_exit:
>> Child process crmd (1600) exited: Generic Pacemaker error (201)
>>
>> Best Regards,
>> Kazunori INOUE
>> <crmd_internal_error.tar.bz2>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
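
The grep output quoted in this thread is long, so a per-function count may make it easier to compare nodes. The following is only a rough sketch: `summarize_errors` is a hypothetical helper name, not part of Pacemaker or crmsh. It assumes syslog-style lines as they appear in /var/log/ha-log, one message per line (not re-wrapped by a mail client as they are above).

```shell
# Hypothetical helper (not part of Pacemaker): count "error:" log lines
# per daemon and reporting function, reading syslog-style text on stdin.
# Assumed line layout: "Mon DD HH:MM:SS host daemon[pid]:  error: function: message"
summarize_errors() {
    awk '
        # Field 5 is "daemon[pid]:", field 6 the severity, field 7 "function:"
        $6 == "error:" {
            daemon = $5
            sub(/\[[0-9]+\]:$/, "", daemon)   # strip the "[pid]:" suffix
            func_name = $7
            sub(/:$/, "", func_name)          # strip the trailing colon
            count[daemon " " func_name]++
        }
        END {
            for (key in count) print count[key], key
        }
    ' | sort -k1,1nr -k2
}
```

With bash, it could be fed from the same loop used in the original report, for example:
for i in n{6..8}; do ssh "$i" 'grep error: /var/log/ha-log'; done | summarize_errors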