[Pacemaker] Pacemaker core dumps

Xavier Lashmar xlashmar at uottawa.ca
Tue May 7 12:42:57 UTC 2013


Hey Andrew, that is great news.

Do you know when new RPMs with the updated source might be available? We manage production servers and would rather keep using package management to update them. Otherwise we will recompile if there is no alternative.

Thanks very much for your help.

Xavier Lashmar
Analyste de Systèmes | Systems Analyst
Service étudiants, service de l'informatique et des communications/Student services, computing and communications services.
1 Nicholas Street (810)
Ottawa ON K1N 7B7
Tél. | Tel. 613-562-5800 (2120)
 



-----Original Message-----
From: Andrew Beekhof [mailto:andrew at beekhof.net] 
Sent: Monday, May 6, 2013 12:46 AM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Pacemaker core dumps

It was tripping over the '\033' escape character in lrmd_rsc_output ("tomcat6 (pid 3199) is running...\033[60G[\033[0;32m  OK \033[0;39m]...")

I'll commit the following patch shortly, thanks for reporting this and following up!


diff --git a/lib/common/xml.c b/lib/common/xml.c
index b6df79f..7585c46 100644
--- a/lib/common/xml.c
+++ b/lib/common/xml.c
@@ -1011,6 +1011,15 @@ crm_xml_escape(const char *text)
                 copy = crm_xml_escape_shuffle(copy, index, &length, "&");
                 changes++;
                 break;
+            default:
+                /* Check for and replace non-printing characters with underscores */
+                if(copy[index] == 0) {
+                    break;
+                } else if(copy[index] < ' ') {
+                    copy = crm_xml_escape_shuffle(copy, index, &length, "_");
+                } else if(copy[index] > '~') {
+                    copy = crm_xml_escape_shuffle(copy, index, &length, "_");
+                }
         }
     }
 


On 03/05/2013, at 11:13 PM, Xavier Lashmar <xlashmar at uottawa.ca> wrote:

> Here it is:
> 
> (gdb) bt
> #0  0x00007f81896ac8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x00007f81896ae085 in abort () at abort.c:92
> #2  0x00007f818bb8a56b in crm_abort (file=0x7f818bba9d58 "xml.c", function=0x7f818bbab6b4 "string2xml", line=650,
>    assert_condition=0x7f818bbaa01a "String parsing error", do_core=<value optimized out>, do_fork=<value optimized out>) at utils.c:1073
> #3  0x00007f818bb933af in string2xml (
>    input=0x1e745f8 "<lrmd_notify lrmd_origin=\"send_cmd_complete_notify\" lrmd_timeout=\"30000\" lrmd_rsc_interval=\"15000\" lrmd_rsc_start_delay=\"15000\" lrmd_exec_rc=\"0\" lrmd_exec_op_status=\"1\" lrmd_callid=\"3407\" lrmd_rsc_del"...) at xml.c:650
> #4  0x00007f818b76a2fc in lrmd_ipc_dispatch (buffer=<value optimized out>, length=<value optimized out>, userdata=0x1e72910) at lrmd_client.c:310
> #5  0x00007f818bba2e90 in mainloop_gio_callback (gio=<value optimized out>, condition=G_IO_IN, data=0x1e73be0) at mainloop.c:585
> #6  0x00007f8188fbbf0e in g_main_dispatch (context=0x1d4f120) at gmain.c:1960
> #7  IA__g_main_context_dispatch (context=0x1d4f120) at gmain.c:2513
> #8  0x00007f8188fbf938 in g_main_context_iterate (context=0x1d4f120, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2591
> #9  0x00007f8188fbfd55 in IA__g_main_loop_run (loop=0x1e734a0) at gmain.c:2799
> #10 0x00000000004052ce in crmd_init () at main.c:154
> #11 0x00000000004055cc in main (argc=1, argv=0x7fffe77a4f88) at main.c:120
> (gdb) up
> #1  0x00007f81896ae085 in abort () at abort.c:92
> 92            raise (SIGABRT);
> (gdb) up
> #2  0x00007f818bb8a56b in crm_abort (file=0x7f818bba9d58 "xml.c", function=0x7f818bbab6b4 "string2xml", line=650, 
>    assert_condition=0x7f818bbaa01a "String parsing error", do_core=<value optimized out>, do_fork=<value optimized out>) at utils.c:1073
> 1073                abort();
> (gdb) up
> #3  0x00007f818bb933af in string2xml (
>    input=0x1e745f8 "<lrmd_notify lrmd_origin=\"send_cmd_complete_notify\" lrmd_timeout=\"30000\" lrmd_rsc_interval=\"15000\" lrmd_rsc_start_delay=\"15000\" lrmd_exec_rc=\"0\" lrmd_exec_op_status=\"1\" lrmd_callid=\"3407\" lrmd_rsc_del"...) at xml.c:650
> 650                 crm_abort(__FILE__, __PRETTY_FUNCTION__, __LINE__, "String parsing error", TRUE, TRUE);
> (gdb) print input
> $1 = 0x1e745f8 "<lrmd_notify lrmd_origin=\"send_cmd_complete_notify\" lrmd_timeout=\"30000\" lrmd_rsc_interval=\"15000\" lrmd_rsc_start_delay=\"15000\" lrmd_exec_rc=\"0\" lrmd_exec_op_status=\"1\" lrmd_callid=\"3407\" lrmd_rsc_del"...
> (gdb) print input+100
> $2 = 0x1e7465c "rmd_rsc_start_delay=\"15000\" lrmd_exec_rc=\"0\" lrmd_exec_op_status=\"1\" lrmd_callid=\"3407\" lrmd_rsc_deleted=\"0\" lrmd_run_time=\"0\" lrmd_rcchange_time=\"0\" lrmd_exec_time=\"0\" lrmd_queue_time=\"0\" lrmd_op=\"lr"...
> (gdb) print input+200
> $3 = 0x1e746c0 "eted=\"0\" lrmd_run_time=\"0\" lrmd_rcchange_time=\"0\" lrmd_exec_time=\"0\" lrmd_queue_time=\"0\" lrmd_op=\"lrmd_rsc_exec\" lrmd_rsc_id=\"res_tomcat6_1\" lrmd_rsc_action=\"monitor\" lrmd_rsc_userdata_str=\"4:664:0:59"...
> (gdb) print input+300
> $4 = 0x1e74724 "md_rsc_exec\" lrmd_rsc_id=\"res_tomcat6_1\" lrmd_rsc_action=\"monitor\" lrmd_rsc_userdata_str=\"4:664:0:596925c4-4bfa-46e2-9295-c3f9b6bd1ef9\" lrmd_rsc_output=\"tomcat6 (pid 3199) is running...\033[60G[\033[0;32m  "...
> (gdb) print input+400
> $5 = 0x1e74788 "6925c4-4bfa-46e2-9295-c3f9b6bd1ef9\" lrmd_rsc_output=\"tomcat6 (pid 3199) is running...\033[60G[\033[0;32m  OK  \033[0;39m]\r\n\"><attributes CRM_meta_OCF_CHECK_LEVEL=\"0\" CRM_meta_name=\"monitor\" crm_feature_set=\"3."...
> (gdb) print input+500
> $6 = 0x1e747ec "OK  \033[0;39m]\r\n\"><attributes CRM_meta_OCF_CHECK_LEVEL=\"0\" CRM_meta_name=\"monitor\" crm_feature_set=\"3.0.7\" OCF_CHECK_LEVEL=\"0\" CRM_meta_interval=\"15000\" CRM_meta_timeout=\"30000\" CRM_meta_start_delay=\"15"...
> (gdb) print input+600
> $7 = 0x1e74850 "0.7\" OCF_CHECK_LEVEL=\"0\" CRM_meta_interval=\"15000\" CRM_meta_timeout=\"30000\" CRM_meta_start_delay=\"15000\"/></lrmd_notify>"
> 
> Xavier Lashmar
> X2120
> 
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:andrew at beekhof.net]
> Sent: Thursday, May 2, 2013 7:38 PM
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Pacemaker core dumps
> 
> 
> On 02/05/2013, at 11:37 PM, Xavier Lashmar <xlashmar at uottawa.ca> wrote:
> 
>> Ah, finally got it.
> 
> Can you go to frame 3 (up <ret> up <ret> up <ret>) and run
>   print input
>   print input+100
>   print input+200
>   ...etc...
> 
> until you reach the end of the string?
> 
> Then I'll be able to reproduce (and fix) locally.
> 
>> 
>> Core was generated by `/usr/libexec/pacemaker/crmd'.
>> Program terminated with signal 6, Aborted.
>> #0  0x00007f81896ac8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> 64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>> Missing separate debuginfos, use: debuginfo-install libtool-ltdl-2.2.6-15.5.el6.x86_64
>> (gdb) bt
>> #0  0x00007f81896ac8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> #1  0x00007f81896ae085 in abort () at abort.c:92
>> #2  0x00007f818bb8a56b in crm_abort (file=0x7f818bba9d58 "xml.c", function=0x7f818bbab6b4 "string2xml", line=650,
>>   assert_condition=0x7f818bbaa01a "String parsing error", do_core=<value optimized out>, do_fork=<value optimized out>) at utils.c:1073
>> #3  0x00007f818bb933af in string2xml (
>>   input=0x1e745f8 "<lrmd_notify lrmd_origin=\"send_cmd_complete_notify\" lrmd_timeout=\"30000\" lrmd_rsc_interval=\"15000\" lrmd_rsc_start_delay=\"15000\" lrmd_exec_rc=\"0\" lrmd_exec_op_status=\"1\" lrmd_callid=\"2747\" lrmd_rsc_del"...) at xml.c:650
>> #4  0x00007f818b76a2fc in lrmd_ipc_dispatch (buffer=<value optimized out>, length=<value optimized out>, userdata=0x1e72910) at lrmd_client.c:310
>> #5  0x00007f818bba2e90 in mainloop_gio_callback (gio=<value optimized out>, condition=G_IO_IN, data=0x1e73be0) at mainloop.c:585
>> #6  0x00007f8188fbbf0e in g_main_dispatch (context=0x1d4f120) at gmain.c:1960
>> #7  IA__g_main_context_dispatch (context=0x1d4f120) at gmain.c:2513
>> #8  0x00007f8188fbf938 in g_main_context_iterate (context=0x1d4f120, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2591
>> #9  0x00007f8188fbfd55 in IA__g_main_loop_run (loop=0x1e734a0) at gmain.c:2799
>> #10 0x00000000004052ce in crmd_init () at main.c:154
>> #11 0x00000000004055cc in main (argc=1, argv=0x7fffe77a4f88) at main.c:120
>> 
>> 
>> Xavier Lashmar
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:andrew at beekhof.net]
>> Sent: Wednesday, May 1, 2013 7:07 PM
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Pacemaker core dumps
>> 
>> 
>> On 01/05/2013, at 11:36 PM, Xavier Lashmar <xlashmar at uottawa.ca> wrote:
>> 
>>> I'm not sure if anyone has run into this issue but I can't seem to 
>>> find a debuginfo package for one of the libraries for CentOS 6.3 
>>> with Kernel 2.6.32-279.9.1el6.x86_64 : libtool-ltdl
>>> 
>>> Here's what I get so far from the core dump, but I think it's incomplete:
>>> 
>>> ...
>>> ...
>>> ...
>>> Reading symbols from /lib64/libfreebl3.so...
>>> warning: the debug information found in "/usr/lib/debug//lib64/libfreebl3.so.debug" does not match "/lib64/libfreebl3.so" (CRC mismatch).
>>> 
>>> warning: the debug information found in "/usr/lib/debug/lib64/libfreebl3.so.debug" does not match "/lib64/libfreebl3.so" (CRC mismatch).
>>> 
>>> Missing separate debuginfo for /lib64/libfreebl3.so
>>> Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/68/195872ecfb188389d29aaf01031a976fd18168.debug
>>> (no debugging symbols found)...done.
>>> Loaded symbols for /lib64/libfreebl3.so
>>> Reading symbols from /lib64/libnss_files-2.12.so...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.12.so.debug...done.
>>> done.
>>> Loaded symbols for /lib64/libnss_files-2.12.so
>>> Core was generated by `/usr/libexec/pacemaker/crmd'.
>>> Program terminated with signal 6, Aborted.
>>> #0  0x00007f81896ac8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>>> 64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>>> Missing separate debuginfos, use: debuginfo-install libtool-ltdl-2.2.6-15.5.el6.x86_64
>>> 
>>> Any info about either finding the right debuginfo files, or about the error itself would be greatly appreciated.
>> 
>> The libtool parts aren't so interesting.
>> Were there no other frames? (lines starting with # and a number)
>> 
>>> 
>>> Xavier Lashmar
>>> 
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Andrew Beekhof [mailto:andrew at beekhof.net]
>>> Sent: Monday, April 29, 2013 11:00 PM
>>> To: The Pacemaker cluster resource manager
>>> Subject: Re: [Pacemaker] Pacemaker core dumps
>>> 
>>> 
>>> On 30/04/2013, at 1:32 AM, Xavier Lashmar <xlashmar at uottawa.ca> wrote:
>>> 
>>>> Hello Andrew,
>>>> 
>>>> Thanks for your help. We've upgraded to Pacemaker 1.1.9 and still have the same issue.
>>> 
>>> That's a disappointing but useful data point.
>>> 
>>>> 
>>>> We are trying to get the core information but we are missing some debuginfo files which we are trying to get our hands on.  I'll try to forward this information soon.   
>>> 
>>> Great
>>> 
>>>> 
>>>> Is there something we need to do to the CIB when we upgrade?
>>> 
>>> No, anything that needs to happen will be done under the hood.
>>> 
>>>> 
>>>> 
>>>> Xavier Lashmar
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Andrew Beekhof [mailto:andrew at beekhof.net]
>>>> Sent: Thursday, April 25, 2013 8:15 PM
>>>> To: The Pacemaker cluster resource manager
>>>> Subject: Re: [Pacemaker] Pacemaker core dumps
>>>> 
>>>> 
>>>> On 26/04/2013, at 10:06 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>> 
>>>>> 
>>>>> On 25/04/2013, at 11:59 PM, Xavier Lashmar <xlashmar at uottawa.ca> wrote:
>>>>> 
>>>>>> Following further investigation, we were able to determine that upgrading both nodes (in a two-node cluster) from Pacemaker 1.1.7-6 to Pacemaker 1.1.8-7 (CentOS 6.3 or CentOS 6.4) caused these errors to begin happening:
>>>>> 
>>>>> Would you be able to try the 1.1.9 packages from http://www.clusterlabs.org/rpm-next to see if they are also affected?
>>>>> 
>>>>>> 
>>>>>> We were able to replicate the initiation of the errors by upgrading another cluster in the same manner.  This other cluster is now experiencing the same core-dumping and errors as the previous cluster:
>>>>>> 
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: Entity: line 1: parser error : invalid character in attribute value
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is running...
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error:                                                                                ^
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: Entity: line 1: parser error : attributes construct error
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is running...
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error:                                                                                ^
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: Entity: line 1: parser error : Couldn't find end of Start Tag lrmd_notify line 1
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is running...
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error:                                                                                ^
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: Entity: line 1: parser error : Extra content at the end of the document
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error: a-72fc-47e1-81b4-51b500c967f9" lrmd_rsc_output="tomcat6 (pid 3282) is running...
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_xml_err: XML Error:                                                                                ^
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:  warning: string2xml: Parsing failed (domain=1, level=3, code=5): Extra content at the end of the document
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:  warning: string2xml: String start: <lrmd_notify lrmd_origin="send_cmd_complete_notify
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:  warning: string2xml: String start+688: 0000" CRM_meta_start_delay="15000"/></lrmd_notify>
>>>>>> Apr 25 09:46:22 xxxx crmd[1764]:    error: crm_abort: string2xml: Forked child 4182 to record non-fatal assert at xml.c:605 : String parsing error
>>>> 
>>>> Also, it would be very useful if you could open up the core file for 4182 and print the contents of the input passed to string2xml()
>>>> 
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>> 
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>> 


_______________________________________________
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



