[Pacemaker] pacemaker shutdown under high load

Andrew Beekhof andrew at beekhof.net
Wed Oct 30 18:26:54 EDT 2013


On 17 Oct 2013, at 1:37 am, Alessandro Bono <alessandro.bono at gmail.com> wrote:

> On 16/10/2013 00:11, Andrew Beekhof wrote:
>> On 09/10/2013, at 10:53 PM, Alessandro Bono <alessandro.bono at gmail.com> wrote:
>> 
>>> Hi
>>> 
>>> 
>>> This weekend Pacemaker shut down on the primary node during a machine backup.
>>> I've attached the compressed log from the primary node; the secondary node's log is too big to attach, but I can provide it as an external link if needed.
>>> Inspecting the logs, I found these errors:
>>> 
>> Looks like corosync went away from underneath pacemaker, hence "Corosync connection lost!  Exiting."
> Is there a way to debug this problem?

Enable more logging in corosync?  Look for a core file too.
The corosync list might have more practical advice.

> The nodes are regular CentOS 6.4 64-bit machines with this corosync version:
> 
> corosync-1.4.1-15.el6_4.1.x86_64
> corosynclib-1.4.1-15.el6_4.1.x86_64
> 
> Should I package the latest 1.4.x version and try it?

Wouldn't hurt.

> As a workaround I put the cluster into maintenance mode prior to the backup, but that's not a real solution.
> 
>>> Oct 05 22:26:46 [31338] ga1-ext        cib:     info: cib_process_request:      Completed cib_modify operation for section status: OK (rc=0, origin=ga2-ext/crmd/17, version=0.155.87)
>>> Oct 05 22:26:46 [31341] ga1-ext      attrd:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)
>>> Oct 05 22:26:46 [31343] ga1-ext       crmd:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)
>>> Oct 05 22:26:46 [31343] ga1-ext       crmd:    error: crmd_cs_destroy:  connection terminated
>>> Oct 05 22:26:46 [31343] ga1-ext       crmd:    debug: qb_ipcs_unref:    qb_ipcs_unref() - destroying
>>> Oct 05 22:26:47 [31343] ga1-ext       crmd:     info: qb_ipcs_us_withdraw:      withdrawing server sockets
>>> Oct 05 22:26:47 [31343] ga1-ext       crmd:    debug: qb_ipcc_disconnect:       qb_ipcc_disconnect()
>>> Oct 05 22:26:47 [31343] ga1-ext       crmd:    debug: qb_rb_close:      Closing ringbuffer: /dev/shm/qb-attrd-request-31341-31343-9-header
>>> Oct 05 22:26:46 [31332] ga1-ext pacemakerd:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)
>>> Oct 05 22:26:46 [31339] ga1-ext stonith-ng:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)
>>> Oct 05 22:26:46 [31338] ga1-ext        cib:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)
>>> Oct 05 22:26:47 [31343] ga1-ext       crmd:    debug: qb_rb_close:      Closing ringbuffer: /dev/shm/qb-attrd-response-31341-31343-9-header
>>> Oct 05 22:26:47 [31332] ga1-ext pacemakerd:    error: mcp_cpg_destroy:  Connection destroyed
>>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng:    error: stonith_peer_cs_destroy:  Corosync connection terminated
>>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng:     info: stonith_shutdown:         Terminating with  1 clients
>>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng:    debug: cib_native_signoff:       Signing out of the CIB Service
>>> Oct 05 22:26:47 [31339] ga1-ext stonith-ng:    debug: qb_ipcc_disconnect:       qb_ipcc_disconnect()
>>> Oct 05 22:26:47 [31343] ga1-ext       crmd:    debug: qb_rb_close:      Closing ringbuffer: /dev/shm/qb-attrd-event-31341-31343-9-header
>>> Oct 05 22:26:47 [31341] ga1-ext      attrd:     crit: attrd_cs_destroy:         Lost connection to Corosync service!
>>> Oct 05 22:26:47 [31341] ga1-ext      attrd:   notice: main:     Exiting...
>>> Oct 05 22:26:47 [31341] ga1-ext      attrd:   notice: main:     Disconnecting client 0x1b03990, pid=31343...
>>> Oct 05 22:26:47 [31341] ga1-ext      attrd:    debug: qb_ipcs_disconnect:       qb_ipcs_disconnect(31341-31343-9) state:2
>>> Oct 05 22:26:47 [31341] ga1-ext      attrd:     info: crm_client_destroy:       Destroying 0 events
>>> Oct 05 22:26:47 [31338] ga1-ext        cib:    error: cib_cs_destroy:   Corosync connection lost!  Exiting.
>>> 
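
To see which daemon noticed the lost corosync connection first, it can help to pull only the error and crit lines out of a log like the one above. What follows is a rough sketch, not a Pacemaker tool: the regular expression is an assumption based purely on the line layout in this excerpt (timestamp, [pid], host, daemon, level, function, message), and the names LINE_RE, SEVERE, first_errors and SAMPLE are made up for illustration.

```python
import re

# Assumed layout, taken from the excerpt above:
#   "Oct 05 22:26:46 [31341] ga1-ext  attrd:  error: func:  message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] (?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+):\s+(?P<level>\w+):\s+"
    r"(?P<func>\w+):\s+(?P<msg>.*)$"
)

# Log levels treated as "something went wrong" (an assumption).
SEVERE = {"error", "crit"}


def first_errors(lines):
    """Return {daemon: (timestamp, function, message)} holding the first
    error/crit line each daemon logged, in order of first appearance."""
    found = {}
    for raw in lines:
        # Drop mail quote markers such as ">>> " before matching.
        m = LINE_RE.match(raw.lstrip("> ").rstrip())
        if m and m["level"] in SEVERE and m["daemon"] not in found:
            found[m["daemon"]] = (m["ts"], m["func"], m["msg"])
    return found


# A few lines copied from the log excerpt in this thread.
SAMPLE = [
    ">>> Oct 05 22:26:46 [31338] ga1-ext        cib:     info: cib_process_request:      Completed cib_modify operation for section status: OK (rc=0, origin=ga2-ext/crmd/17, version=0.155.87)",
    ">>> Oct 05 22:26:46 [31341] ga1-ext      attrd:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)",
    ">>> Oct 05 22:26:46 [31343] ga1-ext       crmd:    error: pcmk_cpg_dispatch:        Connection to the CPG API failed: Library error (2)",
    ">>> Oct 05 22:26:46 [31343] ga1-ext       crmd:    error: crmd_cs_destroy:  connection terminated",
    ">>> Oct 05 22:26:47 [31341] ga1-ext      attrd:     crit: attrd_cs_destroy:         Lost connection to Corosync service!",
]

if __name__ == "__main__":
    for daemon, (ts, func, msg) in first_errors(SAMPLE).items():
        print(f"{ts} {daemon}: {func}: {msg}")
```

To run it against a real file, pass an open file object instead of SAMPLE. If the log format on your nodes differs from the excerpt, the pattern will need adjusting.
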
>>> PS: this is a resend to open a new thread; sorry for the duplicate mail.
>>> 
>>> -- 
>>> Cordiali Saluti
>>> Alessandro Bono
>>> 
>>> <ga1-ext.corosync.log-20131006.gz>
>>> _______________________________________________
>>> Pacemaker mailing list: 
>>> Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> 
>>> Project Home: 
>>> http://www.clusterlabs.org
>>> 
>>> Getting started: 
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> 
>>> Bugs: 
>>> http://bugs.clusterlabs.org
> 
> 
> -- 
> Cordiali Saluti
> Alessandro Bono
> 