[ClusterLabs] libqb 0.17.1 - segfault at 1b8

Radoslaw Garbacz radoslaw.garbacz at xtremedatainc.com
Mon May 2 12:47:39 EDT 2016


Hi,

First, thank you for such a great tool.

When testing Pacemaker I encountered a startup error, which seems to be
related to a segmentation fault reported in libqb:
- cluster started and acquired quorum
- some nodes failed to connect to CIB, and lost membership as a result
- restart solved the problem

The segmentation fault is reported in the libqb library, version 0.17.1,
the standard package provided for CentOS 6.

Please let me know whether the problem is known, and whether there is a
remedy (e.g. using the latest libqb).
Logs are below.


Thank you in advance,




Logs from /var/log/messages:

Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Additional logging
available in /var/log/pacemaker.log
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Configured corosync to
accept connections from group 498: Library error (2)
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Starting Pacemaker
1.1.13-1.el6 (Build: 577898d):  generated-manpages agent-manpages ncurses
libqb-logging libqb-ipc upstart nagios  corosync-native atomic-attrd acls
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Quorum acquired
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice:
pcmk_quorum_notification: Node (...)[3] - state is now member (was (null))
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice:
pcmk_quorum_notification: Node (...)[4] - state is now member (was (null))
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice:
pcmk_quorum_notification: Node (...)[2] - state is now member (was (null))
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice:
pcmk_quorum_notification: Node (...)[1] - state is now member (was (null))
Apr 22 15:46:41 (...) lrmd[111194]:   notice: Additional logging available
in /var/log/pacemaker.log
Apr 22 15:46:41 (...) stonith-ng[111193]:   notice: Additional logging
available in /var/log/pacemaker.log
Apr 22 15:46:41 (...) cib[111192]:   notice: Additional logging available
in /var/log/pacemaker.log
Apr 22 15:46:41 (...) attrd[111195]:   notice: Additional logging available
in /var/log/pacemaker.log
Apr 22 15:46:41 (...) stonith-ng[111193]:   notice: Connecting to cluster
infrastructure: corosync
Apr 22 15:46:41 (...) pengine[111196]:   notice: Additional logging
available in /var/log/pacemaker.log
Apr 22 15:46:41 (...) attrd[111195]:   notice: Connecting to cluster
infrastructure: corosync
Apr 22 15:46:41 (...) crmd[111197]:   notice: Additional logging available
in /var/log/pacemaker.log
Apr 22 15:46:41 (...) crmd[111197]:   notice: CRM Git Version: 1.1.13-1.el6
(577898d)
Apr 22 15:46:41 (...) attrd[111195]:    error: Could not connect to the
Cluster Process Group API: 11
Apr 22 15:46:41 (...) attrd[111195]:    error: Cluster connection failed
Apr 22 15:46:41 (...) attrd[111195]:   notice: Cleaning up before exit
Apr 22 15:46:41 (...) stonith-ng[111193]:   notice: crm_update_peer_proc:
Node (...)[3] - state is now member (was (null))
Apr 22 15:46:41 (...) pacemakerd[111190]:    error: Managed process 111195
(attrd) dumped core
Apr 22 15:46:41 (...) pacemakerd[111190]:    error: The attrd process
(111195) terminated with signal 11 (core=1)
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Respawning failed child
process: attrd
Apr 22 15:46:41 (...) cib[111192]:   notice: Connecting to cluster
infrastructure: corosync
Apr 22 15:46:41 (...) cib[111192]:    error: Could not connect to the
Cluster Process Group API: 11
Apr 22 15:46:41 (...) cib[111192]:     crit: Cannot sign in to the
cluster... terminating
Apr 22 15:46:41 (...) kernel: [17169.112132] attrd[111195]: segfault at 1b8
ip 00007f6fc9dc3181 sp 00007ffd7cf668f0 error 4 in
libqb.so.0.17.1[7f6fc9db4000+21000]
Apr 22 15:46:41 (...) pacemakerd[111190]:  warning: The cib process
(111192) can no longer be respawned, shutting the cluster down.
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Shutting down Pacemaker
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Stopping crmd: Sent -15
to process 111197
Apr 22 15:46:41 (...) attrd[111198]:   notice: Additional logging available
in /var/log/pacemaker.log
Apr 22 15:46:41 (...) crmd[111197]:  warning: Couldn't complete CIB
registration 1 times... pause and retry
Apr 22 15:46:41 (...) crmd[111197]:   notice: Invoking handler for signal
15: Terminated
Apr 22 15:46:41 (...) crmd[111197]:   notice: Requesting shutdown, upper
limit is 1200000ms
Apr 22 15:46:41 (...) crmd[111197]:  warning: FSA: Input I_SHUTDOWN from
crm_shutdown() received in state S_STARTING
Apr 22 15:46:41 (...) crmd[111197]:   notice: State transition S_STARTING
-> S_STOPPING [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Apr 22 15:46:41 (...) crmd[111197]:   notice: Disconnecting from Corosync
Apr 22 15:46:41 (...) attrd[111198]:   notice: Connecting to cluster
infrastructure: corosync
Apr 22 15:46:41 (...) attrd[111198]:    error: Could not connect to the
Cluster Process Group API: 11
Apr 22 15:46:41 (...) attrd[111198]:    error: Cluster connection failed
Apr 22 15:46:41 (...) attrd[111198]:   notice: Cleaning up before exit
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Stopping pengine: Sent
-15 to process 111196
Apr 22 15:46:41 (...) pengine[111196]:   notice: Invoking handler for
signal 15: Terminated
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Stopping attrd: Sent
-15 to process 111198
Apr 22 15:46:41 (...) pacemakerd[111190]:    error: Managed process 111198
(attrd) dumped core
Apr 22 15:46:41 (...) pacemakerd[111190]:    error: The attrd process
(111198) terminated with signal 11 (core=1)
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Stopping lrmd: Sent -15
to process 111194
Apr 22 15:46:41 (...) lrmd[111194]:   notice: Invoking handler for signal
15: Terminated
Apr 22 15:46:41 (...) pacemakerd[111190]:   notice: Stopping stonith-ng:
Sent -15 to process 111193
Apr 22 15:46:41 (...) kernel: [17169.121628] attrd[111198]: segfault at 1b8
ip 00007f3a98f66181 sp 00007ffe33407380 error 4 in
libqb.so.0.17.1[7f3a98f57000+21000]
Apr 22 15:46:50 (...) stonith-ng[111193]:    error: Could not connect to
the CIB service: Transport endpoint is not connected (-107)
Apr 22 15:46:50 (...) stonith-ng[111193]:   notice: Invoking handler for
signal 15: Terminated
Apr 22 15:46:50 (...) pacemakerd[111190]:   notice: Shutdown complete
Apr 22 15:46:50 (...) pacemakerd[111190]:   notice: Attempting to inhibit
respawning after fatal error
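
In case it helps with triage, here is a minimal sketch that decodes the two
kernel "segfault" lines above. It assumes the usual x86 message format, in
which the error field is the hexadecimal page-fault error code (bit 0 = page
present, bit 1 = write access, bit 2 = user mode). The function and field
names are my own and purely illustrative.

```python
import re

# Matches the x86 kernel "segfault at ..." message. \s+ tolerates the
# line wrapping introduced when the log was pasted into the mail.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]:\s+segfault\s+at\s+(?P<addr>[0-9a-f]+)\s+"
    r"ip\s+(?P<ip>[0-9a-f]+)\s+sp\s+(?P<sp>[0-9a-f]+)\s+"
    r"error\s+(?P<err>[0-9a-f]+)\s+"
    r"in\s+(?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(text):
    """Return a dict describing the first segfault message in text, or None."""
    m = SEGFAULT_RE.search(text)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "library": m.group("lib"),
        # Offset of the faulting instruction from the library's load address.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


# The two kernel lines from the log above, with the wrapping kept as pasted.
SAMPLES = [
    "kernel: [17169.112132] attrd[111195]: segfault at 1b8\n"
    "ip 00007f6fc9dc3181 sp 00007ffd7cf668f0 error 4 in\n"
    "libqb.so.0.17.1[7f6fc9db4000+21000]",
    "kernel: [17169.121628] attrd[111198]: segfault at 1b8\n"
    "ip 00007f3a98f66181 sp 00007ffe33407380 error 4 in\n"
    "libqb.so.0.17.1[7f3a98f57000+21000]",
]

for sample in SAMPLES:
    d = decode_segfault(sample)
    print(f"{d['process']}[{d['pid']}] {d['library']}+{d['offset']:#x} "
          f"fault_addr={d['fault_addr']:#x} write={d['write']}")
# attrd[111195] libqb.so.0.17.1+0xf181 fault_addr=0x1b8 write=False
# attrd[111198] libqb.so.0.17.1+0xf181 fault_addr=0x1b8 write=False
```

Both crashes land at the same offset inside libqb (0xf181) and are user-mode
reads of the unmapped address 0x1b8, which would be consistent with reading a
structure field through a NULL pointer, although I have not confirmed that.
With the matching debuginfo package installed, the offset could probably be
resolved to a source line with something like
"addr2line -e <path to libqb.so.0.17.1> 0xf181" (the library path is left
unspecified here because I have not checked it).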




Logs from corosync log:

Apr 22 15:46:22 [93582] (...) corosync notice  [MAIN  ] Corosync Cluster
Engine exiting normally
Apr 22 15:46:40 [111147] (...) corosync notice  [MAIN  ] Corosync Cluster
Engine ('2.3.5.12-a71e'): started and ready to provide service.
Apr 22 15:46:40 [111147] (...) corosync info    [MAIN  ] Corosync built-in
features: dbus pie relro bindnow
Apr 22 15:46:40 [111147] (...) corosync notice  [TOTEM ] Initializing
transport (UDP/IP Unicast).
Apr 22 15:46:40 [111147] (...) corosync notice  [TOTEM ] Initializing
transmit/receive security (NSS) crypto: none hash: none
Apr 22 15:46:40 [111147] (...) corosync notice  [TOTEM ] The network
interface [(...)] is now up.
Apr 22 15:46:40 [111147] (...) corosync notice  [SERV  ] Service engine
loaded: corosync configuration map access [0]
Apr 22 15:46:40 [111147] (...) corosync info    [QB    ] server name: cmap
Apr 22 15:46:40 [111147] (...) corosync notice  [SERV  ] Service engine
loaded: corosync configuration service [1]
Apr 22 15:46:40 [111147] (...) corosync info    [QB    ] server name: cfg
Apr 22 15:46:40 [111147] (...) corosync notice  [SERV  ] Service engine
loaded: corosync cluster closed process group service v1.01 [2]
Apr 22 15:46:40 [111147] (...) corosync info    [QB    ] server name: cpg
Apr 22 15:46:40 [111147] (...) corosync notice  [SERV  ] Service engine
loaded: corosync profile loading service [4]
Apr 22 15:46:40 [111147] (...) corosync notice  [QUORUM] Using quorum
provider corosync_votequorum
Apr 22 15:46:40 [111147] (...) corosync notice  [SERV  ] Service engine
loaded: corosync vote quorum service v1.0 [5]
Apr 22 15:46:40 [111147] (...) corosync info    [QB    ] server name:
votequorum
Apr 22 15:46:40 [111147] (...) corosync notice  [SERV  ] Service engine
loaded: corosync cluster quorum service v0.1 [3]
Apr 22 15:46:40 [111147] (...) corosync info    [QB    ] server name: quorum
Apr 22 15:46:40 [111147] (...) corosync notice  [TOTEM ] adding new UDPU
member {(...)}
Apr 22 15:46:40 [111147] (...) corosync notice  [TOTEM ] adding new UDPU
member {(...)}
Apr 22 15:46:40 [111147] (...) corosync notice  [TOTEM ] adding new UDPU
member {(...)}
Apr 22 15:46:40 [111147] (...) corosync notice  [TOTEM ] adding new UDPU
member {(...)}
Apr 22 15:46:40 [111147] (...) corosync notice  [TOTEM ] A new membership
((...):660) was formed. Members joined: 3
Apr 22 15:46:40 [111147] (...) corosync notice  [QUORUM] Members[1]: 3
Apr 22 15:46:40 [111147] (...) corosync notice  [MAIN  ] Completed service
synchronization, ready to provide service.
Apr 22 15:46:40 [111147] (...) corosync notice  [TOTEM ] A new membership
((...):664) was formed. Members joined: 4 2 1
Apr 22 15:46:40 [111147] (...) corosync notice  [QUORUM] This node is
within the primary component and will provide service.
Apr 22 15:46:40 [111147] (...) corosync notice  [QUORUM] Members[4]: 3 4 2 1
Apr 22 15:46:40 [111147] (...) corosync notice  [MAIN  ] Completed service
synchronization, ready to provide service.
Apr 22 15:46:41 [111147] (...) corosync error   [MAIN  ] Denied connection
attempt from 498:498
Apr 22 15:46:41 [111147] (...) corosync error   [QB    ] Invalid IPC
credentials (111148-111195-2).
Apr 22 15:46:41 [111147] (...) corosync error   [MAIN  ] Denied connection
attempt from 498:498
Apr 22 15:46:41 [111147] (...) corosync error   [QB    ] Invalid IPC
credentials (111148-111192-2).
Apr 22 15:46:41 [111147] (...) corosync error   [MAIN  ] Denied connection
attempt from 498:498
Apr 22 15:46:41 [111147] (...) corosync error   [QB    ] Invalid IPC
credentials (111148-111198-2).
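
The three "Invalid IPC credentials" entries above can be matched against the
Pacemaker daemons in /var/log/messages. The sketch below assumes that the
middle number in the parentheses is the PID of the connecting client; I have
not verified that against the libqb source, but the values do coincide with
the PIDs logged by Pacemaker. The PID table is copied by hand from the
messages log, and the helper name is illustrative.

```python
import re

# PID -> daemon name, as shown in /var/log/messages above.
DAEMONS = {
    111192: "cib",
    111193: "stonith-ng",
    111194: "lrmd",
    111195: "attrd",
    111196: "pengine",
    111197: "crmd",
    111198: "attrd",  # respawned after the first attrd crash
}

# \s+ tolerates the wrapping between "IPC" and "credentials".
CRED_RE = re.compile(r"Invalid IPC\s+credentials\s+\((\d+)-(\d+)-(\d+)\)")


def denied_clients(corosync_log):
    """List (pid, daemon) for each rejected IPC connection, in log order.

    Assumption: the second number in the credentials tuple is the client PID.
    """
    result = []
    for m in CRED_RE.finditer(corosync_log):
        pid = int(m.group(2))
        result.append((pid, DAEMONS.get(pid, "unknown")))
    return result


LOG = """\
corosync error   [QB    ] Invalid IPC
credentials (111148-111195-2).
corosync error   [QB    ] Invalid IPC
credentials (111148-111192-2).
corosync error   [QB    ] Invalid IPC
credentials (111148-111198-2).
"""

for pid, name in denied_clients(LOG):
    print(pid, name)
```

Under that assumption, the rejected connections belong to attrd, cib and the
respawned attrd, that is, the same processes that report "Could not connect to
the Cluster Process Group API: 11" in /var/log/messages.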



-- 
Best Regards,

Radoslaw Garbacz
XtremeData Incorporation