[Pacemaker] pacemaker shutdown under high load

Alessandro Bono alessandro.bono at gmail.com
Wed Oct 16 10:37:01 EDT 2013


On 16/10/2013 00:11, Andrew Beekhof wrote:
> On 09/10/2013, at 10:53 PM, Alessandro Bono <alessandro.bono at gmail.com> wrote:
>
>> Hi,
>>
>> This weekend Pacemaker shut down on the primary node during the machine backup.
>> Attached is the compressed log of the primary node; the secondary node's log is too big, but I can provide it as an external link if needed.
>> Inspecting the logs, I found these errors:
> Looks like corosync went away from underneath pacemaker, hence "Corosync connection lost!  Exiting."
Is there a way to debug this problem? The nodes are regular CentOS 6.4 64-bit
machines with this corosync version:

corosync-1.4.1-15.el6_4.1.x86_64
corosynclib-1.4.1-15.el6_4.1.x86_64

Should I package the latest 1.4.x version and try it?
As a workaround, I put the cluster in maintenance mode before the backup,
but that's not a real solution.

>
>> Oct 05 22:26:46 [31338] ga1-ext        cib:     info: cib_process_request:      Completed cib_modify operation for section status: OK (rc=0, origin=ga2-ext/crmd/17, version=0.155.87)
>> Oct 05 22:26:46 [31341] ga1-ext      attrd:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)
>> Oct 05 22:26:46 [31343] ga1-ext       crmd:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)
>> Oct 05 22:26:46 [31343] ga1-ext       crmd:    error: crmd_cs_destroy:  connection terminated
>> Oct 05 22:26:46 [31343] ga1-ext       crmd:    debug: qb_ipcs_unref:    qb_ipcs_unref() - destroying
>> Oct 05 22:26:47 [31343] ga1-ext       crmd:     info: qb_ipcs_us_withdraw:      withdrawing server sockets
>> Oct 05 22:26:47 [31343] ga1-ext       crmd:    debug: qb_ipcc_disconnect:       qb_ipcc_disconnect()
>> Oct 05 22:26:47 [31343] ga1-ext       crmd:    debug: qb_rb_close:      Closing ringbuffer: /dev/shm/qb-attrd-request-31341-31343-9-header
>> Oct 05 22:26:46 [31332] ga1-ext pacemakerd:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)
>> Oct 05 22:26:46 [31339] ga1-ext stonith-ng:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)
>> Oct 05 22:26:46 [31338] ga1-ext        cib:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)
>> Oct 05 22:26:47 [31343] ga1-ext       crmd:    debug: qb_rb_close:      Closing ringbuffer: /dev/shm/qb-attrd-response-31341-31343-9-header
>> Oct 05 22:26:47 [31332] ga1-ext pacemakerd:    error: mcp_cpg_destroy:  Connection destroyed
>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng:    error: stonith_peer_cs_destroy:  Corosync connection terminated
>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng:     info: stonith_shutdown:         Terminating with  1 clients
>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng:    debug: cib_native_signoff:       Signing out of the CIB Service
>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng:    debug: qb_ipcc_disconnect:       qb_ipcc_disconnect()
>> Oct 05 22:26:47 [31343] ga1-ext       crmd:    debug: qb_rb_close:      Closing ringbuffer: /dev/shm/qb-attrd-event-31341-31343-9-header
>> Oct 05 22:26:47 [31341] ga1-ext      attrd:     crit: attrd_cs_destroy:         Lost connection to Corosync service!
>> Oct 05 22:26:47 [31341] ga1-ext      attrd:   notice: main:     Exiting...
>> Oct 05 22:26:47 [31341] ga1-ext      attrd:   notice: main:     Disconnecting client 0x1b03990, pid=31343...
>> Oct 05 22:26:47 [31341] ga1-ext      attrd:    debug: qb_ipcs_disconnect:       qb_ipcs_disconnect(31341-31343-9) state:2
>> Oct 05 22:26:47 [31341] ga1-ext      attrd:     info: crm_client_destroy:       Destroying 0 events
>> Oct 05 22:26:47 [31338] ga1-ext        cib:    error: cib_cs_destroy:   Corosync connection lost!  Exiting.
>>
>> PS: this is a resend to open a new thread, sorry for the double mail.
>>
>> -- 
>> Best regards,
>> Alessandro Bono
>>
>> <ga1-ext.corosync.log-20131006.gz>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org

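One way to start on "is there a way to debug this?" is to line up when each Pacemaker daemon first logged an error. Below is a minimal Python sketch that is not part of the original thread. `first_errors` is a hypothetical helper, and its regular expression assumes the log line format of the excerpt quoted above. In that excerpt the lines are not strictly in time order (22:26:46 entries appear after 22:26:47 ones), so the sketch sorts by timestamp before picking each daemon's first error. If every daemon reports "Connection to the CPG API failed" in the same second, that is consistent with corosync itself going away rather than with one Pacemaker daemon failing first.

```python
import gzip
import re
import sys
from datetime import datetime

# Hypothetical pattern for the log format quoted above, e.g.
# "Oct 05 22:26:46 [31341] ga1-ext      attrd:    error: pcmk_cpg_dispatch:        Connection ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \[(?P<pid>\d+)\] (?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+):\s+(?P<level>\w+): (?P<func>\w+):\s+(?P<msg>.*)$"
)
SEVERE = {"error", "crit"}


def first_errors(lines):
    """Map each daemon to (time, function, message) of its earliest error/crit line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("level") not in SEVERE:
            continue  # skip info/debug/notice lines and lines that do not parse
        # The log has no year; a placeholder leap year is prepended only so the
        # timestamp parses. This orders events within a single log file only.
        ts = datetime.strptime("2000 " + m.group("ts"), "%Y %b %d %H:%M:%S")
        events.append((ts, m.group("daemon"), m.group("func"), m.group("msg")))
    # list.sort is stable: lines with the same timestamp keep their file order.
    events.sort(key=lambda e: e[0])
    result = {}
    for ts, daemon, func, msg in events:
        # setdefault keeps only the first (earliest) error seen per daemon.
        result.setdefault(daemon, (ts.strftime("%H:%M:%S"), func, msg))
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        # Dicts keep insertion order, so daemons print in order of first error.
        for daemon, (t, func, msg) in first_errors(f).items():
            print(f"{t} {daemon}: {func}: {msg}")
```

Usage, assuming the script is saved as `first_errors.py`: `python first_errors.py ga1-ext.corosync.log-20131006.gz`. Corosync's own log around the same timestamp is then the natural next place to look.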

-- 
Best regards,
Alessandro Bono
