<div dir="ltr"><pre class="" id="comment_text_28" style="white-space:pre-wrap;word-wrap:break-word;width:50em;color:rgb(0,0,0)">I faced this issue one more time.
Now I can say for sure that Corosync doesn't crash.
On a working machine I stopped Pacemaker and Corosync.
Then I started them again with the following commands and got this:
------------------
# /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
# /etc/init.d/corosync status
corosync (pid 100837) is running...
# /etc/init.d/pacemaker start
Starting Pacemaker Cluster Manager[ OK ]
# /etc/init.d/pacemaker status
pacemakerd is stopped
------------------
/var/log/messages:
------------------
Apr 22 10:49:08 daemon.notice<29> pacemaker: Starting Pacemaker Cluster Manager
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr 22 10:49:08 daemon.err<27> pacemakerd[114133]: error: mcp_read_config: Couldn't create logfile: /var/log/pacemaker.log
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: mcp_read_config: Configured corosync to accept connections from group 107: Library error (2)
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: main: Starting Pacemaker 1.1.12 (Build: 561c4cf): generated-manpages agent-manpages ascii-docs ncurses libqb-logging libqb-ipc lha-fencing upstart nagios corosync-native snmp libesmtp acls
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: cluster_connect_quorum: Quorum lost
Apr 22 10:49:08 daemon.notice<29> stonithd[114136]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 22 10:49:08 daemon.notice<29> attrd[114138]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 22 10:49:08 daemon.err<27> corosync[100837]: [MAIN ] Denied connection attempt from 105:107
Apr 22 10:49:08 daemon.err<27> attrd[114138]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr 22 10:49:08 daemon.err<27> attrd[114138]: error: main: Cluster connection failed
Apr 22 10:49:08 daemon.notice<29> attrd[114138]: notice: main: Cleaning up before exit
Apr 22 10:49:08 kern.info<6> kernel: [162259.416242] attrd[114138]: segfault at 1b8 ip 00007f375481c9e1 sp 00007fff7ddf0d50 error 4 in libqb.so.0.17.1[7f375480d000+22000]
Apr 22 10:49:08 daemon.err<27> corosync[100837]: [QB ] Invalid IPC credentials (100837-114138-2).
Apr 22 10:49:08 daemon.notice<29> cib[114135]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 22 10:49:08 daemon.err<27> cib[114135]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr 22 10:49:08 daemon.crit<26> cib[114135]: crit: cib_init: Cannot sign in to the cluster... terminating
Apr 22 10:49:08 daemon.err<27> corosync[100837]: [MAIN ] Denied connection attempt from 105:107
Apr 22 10:49:08 daemon.err<27> corosync[100837]: [QB ] Invalid IPC credentials (100837-114135-3).
Apr 22 10:49:08 daemon.notice<29> crmd[114140]: notice: main: CRM Git Version: 561c4cf
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: crm_update_peer_state: pcmk_quorum_notification: Node node-0[1] - state is now member (was (null))
Apr 22 10:49:08 daemon.err<27> pacemakerd[114133]: error: pcmk_child_exit: Child process cib (114135) exited: Network is down (100)
Apr 22 10:49:08 daemon.warning<28> pacemakerd[114133]: warning: pcmk_child_exit: Pacemaker child process cib no longer wishes to be respawned. Shutting ourselves down.
Apr 22 10:49:08 daemon.err<27> pacemakerd[114133]: error: child_waitpid: Managed process 114138 (attrd) dumped core
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: pcmk_child_exit: Child process attrd terminated with signal 11 (pid=114138, core=1)
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: stop_child: Stopping crmd: Sent -15 to process 114140
Apr 22 10:49:08 daemon.warning<28> crmd[114140]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Apr 22 10:49:08 daemon.notice<29> crmd[114140]: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
Apr 22 10:49:08 daemon.warning<28> crmd[114140]: warning: do_log: FSA: Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
Apr 22 10:49:08 daemon.notice<29> crmd[114140]: notice: do_state_transition: State transition S_STARTING -> S_STOPPING [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Apr 22 10:49:08 daemon.notice<29> crmd[114140]: notice: terminate_cs_connection: Disconnecting from Corosync
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: stop_child: Stopping pengine: Sent -15 to process 114139
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: stop_child: Stopping lrmd: Sent -15 to process 114137
Apr 22 10:49:08 daemon.notice<29> pacemakerd[114133]: notice: stop_child: Stopping stonith-ng: Sent -15 to process 114136
Apr 22 10:49:17 daemon.err<27> stonithd[114136]: error: setup_cib: Could not connect to the CIB service: Transport endpoint is not connected (-107)
Apr 22 10:49:17 daemon.notice<29> pacemakerd[114133]: notice: pcmk_shutdown_worker: Shutdown complete
Apr 22 10:49:17 daemon.notice<29> pacemakerd[114133]: notice: pcmk_shutdown_worker: Attempting to inhibit respawning after fatal error
------------------
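A side note on the kernel "segfault" line in the log above: the offset of the faulting instruction inside libqb can be worked out by subtracting the mapping base from the instruction pointer. Below is a minimal Python sketch of that arithmetic. The helper name fault_offset is made up for illustration, and it assumes the usual x86 kernel message layout seen in this log.

```python
import re

# Assumed layout of the kernel message (as it appears in the log above):
#   segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]
SEGFAULT_RE = re.compile(
    r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) "
    r"error (\d+) in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (library, offset) for a kernel segfault line.

    The offset is the instruction pointer minus the base address of the
    mapping, i.e. the position of the faulting instruction inside the
    mapped part of the library.
    """
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m.group(2), 16)
    lib = m.group(5)
    base = int(m.group(6), 16)
    size = int(m.group(7), 16)
    offset = ip - base
    # Sanity check: the instruction pointer must lie inside the mapping.
    if offset not in range(size):
        raise ValueError("instruction pointer is outside the mapping")
    return lib, offset


line = ("attrd[114138]: segfault at 1b8 ip 00007f375481c9e1 "
        "sp 00007fff7ddf0d50 error 4 in libqb.so.0.17.1[7f375480d000+22000]")
lib, offset = fault_offset(line)
print(lib, hex(offset))  # prints: libqb.so.0.17.1 0xf9e1
```

With debug symbols for libqb installed, an offset like this can usually be given to addr2line (-e with the path to the library, which depends on the system) to get a function name. The low faulting address (1b8) together with error 4 (a user-mode read of an unmapped page) looks like a read through a NULL or nearly NULL pointer, but that is only a guess until the core file is examined.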
"/var/cores/" contains only "core.attrd-*".
What else can I do?</pre><pre class="" id="comment_text_28" style="white-space:pre-wrap;word-wrap:break-word;width:50em;color:rgb(0,0,0)"><br></pre><pre class="" id="comment_text_28" style="white-space:pre-wrap;word-wrap:break-word;width:50em;color:rgb(0,0,0)"><pre class="" id="comment_text_29" style="white-space:pre-wrap;word-wrap:break-word;width:50em">Could the problem be in 'libqb'?
I noticed this line in the log:
Apr 22 10:49:08 kern.info<6> kernel: [162259.416242] attrd[114138]: segfault at 1b8 ip 00007f375481c9e1 sp 00007fff7ddf0d50 error 4 in libqb.so.0.17.1[7f375480d000+22000]</pre><pre class="" id="comment_text_29" style="white-space:pre-wrap;word-wrap:break-word;width:50em"><br></pre></pre></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr">Thank you,<div>Kostya</div></div></div></div>
<br><div class="gmail_quote">On Mon, Apr 20, 2015 at 7:56 AM, Andrew Beekhof <span dir="ltr"><<a href="mailto:andrew@beekhof.net" target="_blank">andrew@beekhof.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
> On 14 Apr 2015, at 9:01 pm, Kostiantyn Ponomarenko <<a href="mailto:konstantin.ponomarenko@gmail.com">konstantin.ponomarenko@gmail.com</a>> wrote:<br>
><br>
> Disk wasn't full.<br>
> According to: "Mar 27 14:00:50 daemon.err<27> pacemakerd[111069]: error: child_waitpid: Managed process 111074 (attrd) dumped core", there is a core dump in "/var/cores/core.attrd-111074-1427464849".<br>
> It is the one which corresponds to the log snippet and it is attached to the email.<br>
<br>
</span>attrd crashing will be unrelated to whether or not corosync is also crashing<br>
<div><div class="h5"><br>
><br>
><br>
><br>
><br>
> Thank you,<br>
> Kostya<br>
><br>
> On Fri, Apr 10, 2015 at 10:00 AM, Jan Pokorný <<a href="mailto:jpokorny@redhat.com">jpokorny@redhat.com</a>> wrote:<br>
> Hello,<br>
><br>
> On 30/03/15 10:36 +1100, Andrew Beekhof wrote:<br>
> >> On 28 Mar 2015, at 1:10 am, Kostiantyn Ponomarenko <<a href="mailto:konstantin.ponomarenko@gmail.com">konstantin.ponomarenko@gmail.com</a>> wrote:<br>
> >> If I start/stop Corosync and Pacemaker a few times, I get into a state<br>
> >> where Corosync is running but Pacemaker cannot start.<br>
> >> Here is a snippet from /var/log/messages:<br>
><br>
> [...]<br>
><br>
> >> Mar 27 14:00:49 daemon.notice<29> pacemakerd[111069]: notice: mcp_read_config: Configured corosync to accept connections from group 107: Library error (2)<br>
> ><br>
> > Everything else flows from this.<br>
> > Perhaps one of the corosync people can comment on the conditions<br>
> > under which this call would fail.<br>
><br>
> CC'd relevant ML.<br>
><br>
> > Relevant code from pacemaker is:<br>
> ><br>
> > char key[PATH_MAX];<br>
> > snprintf(key, PATH_MAX, "uidgid.gid.%u", gid);<br>
> > rc = cmap_set_uint8(local_handle, key, 1);<br>
> > crm_notice("Configured corosync to accept connections from group %u: %s (%d)",<br>
> > gid, ais_error2text(rc), rc);<br>
><br>
><br>
> Appears to resemble <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1114852" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1114852</a><br>
><br>
> --<br>
> Jan<br>
><br>
> _______________________________________________<br>
> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>
><br>
> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
><br>
><br>
</div></div>> <core.attrd-111074-1427464849>_______________________________________________<br>
<div class="HOEnZb"><div class="h5">
</div></div></blockquote></div><br></div>