[Pacemaker] cibadmin -Q: Call cib_query failed (-62): Timer expired

Andrew Beekhof andrew at beekhof.net
Wed Oct 2 01:13:58 EDT 2013


On 28/09/2013, at 5:37 AM, Radoslaw Garbacz <radoslaw.garbacz at xtremedatainc.com> wrote:

> The problem turned out to be of a different nature and had nothing to do
> with cib_shm. The logs later showed that the connection to the cib was
> established; the corosync configuration file simply didn't have a proper
> quorum section, which caused the problems I was seeing.
> 
> After fixing the "quorum" section in "corosync.conf", everything works.

I would not have expected that one would result in the other.
Glad you got it sorted out though!
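
For anyone who lands here with the same symptom, below is a rough sketch of
a check for a usable quorum section in corosync.conf. It is not the fix that
was applied in this thread, since the corrected section was never posted.
The sample configuration, the provider value corosync_votequorum, the
expected_votes value and the helper name quorum_provider are all
assumptions made for illustration.

```python
import re

# Hypothetical corosync 2.x fragment. The thread does not show the quorum
# section that was actually used, so every value here is an assumption.
SAMPLE_CONF = """\
totem {
    version: 2
    transport: udpu
}

quorum {
    provider: corosync_votequorum
    expected_votes: 3
}
"""


def quorum_provider(conf_text):
    """Return the provider named in the quorum { } section, or None."""
    # Drop comments so a commented-out section is not mistaken for a real one.
    text = re.sub(r"#.*", "", conf_text)
    # Find a top-level "quorum { ... }" block without nested braces.
    section = re.search(r"\bquorum\s*\{([^{}]*)\}", text)
    if section is None:
        return None
    # Look for a "provider: <name>" line inside that block.
    provider = re.search(
        r"^\s*provider\s*:\s*(\S+)\s*$", section.group(1), re.MULTILINE
    )
    return provider.group(1) if provider else None


if __name__ == "__main__":
    print(quorum_provider(SAMPLE_CONF))
```

A result of None means the text has no quorum section, or the section names
no provider. To check a real node, read the file contents and pass them to
the function; /etc/corosync/corosync.conf is the usual location, but that
path is also an assumption here.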

> 
> many thanks,
> 
> 
> On Fri, Sep 27, 2013 at 2:16 PM, Radoslaw Garbacz
> <radoslaw.garbacz at xtremedatainc.com> wrote:
>> cibadmin -Ql works, the problem is persistent after the upgrade, and the
>> logs for "crmd" revealed the problem:
>> 
>> Sep 27 16:19:22 [5074] ip-10-82-197-219       crmd:     info: crm_ipc_connect:  Could not establish cib_shm connection: Connection refused (111)
>> Sep 27 16:19:22 [5074] ip-10-82-197-219       crmd:    debug: cib_native_signon_raw:    Connection unsuccessful (0 (nil))
>> Sep 27 16:19:22 [5074] ip-10-82-197-219       crmd:    debug: cib_native_signon_raw:    Connection to CIB failed: Transport endpoint is not connected
>> 
>> I will keep searching for the solution, but in the meantime, if you have
>> a moment, any hint would be welcome.
>> 
>> many thanks,
>> 
>> 
>> On Thu, Sep 26, 2013 at 9:25 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>> 
>>> On 27/09/2013, at 8:45 AM, Radoslaw Garbacz <radoslaw.garbacz at xtremedatainc.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I have a problem starting up a cluster after upgrading corosync from
>>>> 1.4 to 2.3.2 and pacemaker from 1.8 to 1.9.
>>>> 
>>>> All "crm_node" calls succeed, but any CIB operation fails, e.g.:
>>>> * crm_node -q: 1
>>>> * crm_node -l: OK
>>>> * crm_node -p: OK
>>>> * cibadmin -Q: Call cib_query failed (-62): Timer expired
>>> 
>>> Does cibadmin -Ql work?
>>> If so, there might be a DC election going on (look in the logs for "crmd").
>>> Is the error transient or persistent?
>>> 
>>>> 
>>>> No iptables, no SELinux, 3-node cluster, corosync.conf:
>>>> ...
>>>>       ringnumber: 0
>>>>       bindnetaddr: ...
>>>>       mcastport: 7800
>>>>   }
>>>> 
>>>>   transport: udpu
>>>> 
>>>> 
>>>> 
>>>> Any help greatly appreciated.
>>>> 
>>>> 
>>>> Below is some more information:
>>>> 
>>>> * pacemaker logs:
>>>> 
>>>> Sep 26 22:24:00 [2836] ip-10-114-210-162        cib:     info: crm_client_new:  Connecting 0x111b780 for uid=0 gid=0 pid=2883 id=977d6f23-963b-41a4-8fe0-a63024080d41
>>>> Sep 26 22:24:00 [2836] ip-10-114-210-162        cib:     info: cib_process_request:     Forwarding cib_query operation for section 'all' to master (origin=local/cibadmin/2)
>>>> Sep 26 22:24:30 [2836] ip-10-114-210-162        cib:     info: crm_client_destroy:      Destroying 0 events
>>>> 
>>>> 
>>>> * ps axf | grep -E 'pacemaker|corosync':
>>>> 
>>>> 2806 ?        Ssl    0:10 corosync
>>>> 2834 pts/1    S      0:00 pacemakerd
>>>> 2836 ?        Ss     0:01  \_ /usr/libexec/pacemaker/cib
>>>> 2837 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
>>>> 2838 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
>>>> 2839 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
>>>> 2840 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
>>>> 2841 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd
>>>> 
>>>> 
>>>> * strace cibadmin -Q:
>>>> 
>>>> open("/dev/shm/qb-cib_rw-event-2836-2897-12-data", O_RDWR) = 6
>>>> ftruncate(6, 20480000)                  = 0
>>>> mmap(NULL, 40960000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
>>>> 0x7fa221692000
>>>> mmap(0x7fa221692000, 20480000, PROT_READ|PROT_WRITE,
>>>> MAP_SHARED|MAP_FIXED, 6, 0) = 0x7fa221692000
>>>> mmap(0x7fa222a1a000, 20480000, PROT_READ|PROT_WRITE,
>>>> MAP_SHARED|MAP_FIXED, 6, 0) = 0x7fa222a1a000
>>>> close(6)                                = 0
>>>> close(5)                                = 0
>>>> close(6)                                = -1 EBADF (Bad file descriptor)
>>>> fstat(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>>>> fcntl(4, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
>>>> poll([{fd=4, events=POLLIN}], 1, 0)     = 0 (Timeout)
>>>> poll([{fd=4, events=POLLIN}], 1, 0)     = 0 (Timeout)
>>>> sendto(4, "~", 1, MSG_NOSIGNAL, NULL, 0) = 1
>>>> futex(0x7fa22df4cb60, FUTEX_WAKE_PRIVATE, 2147483647) = 0
>>>> gettimeofday({1380234692, 68879}, NULL) = 0
>>>> poll([{fd=4, events=POLLIN}], 1, 0)     = 0 (Timeout)
>>>> poll([{fd=4, events=POLLIN}], 1, 0)     = 0 (Timeout)
>>>> gettimeofday({1380234692, 69522}, NULL) = 0
>>>> sendto(4, "\274", 1, MSG_NOSIGNAL, NULL, 0) = 1
>>>> poll([{fd=4, events=POLLIN}], 1, 0)     = 0 (Timeout)
>>>> gettimeofday({1380234692, 70085}, NULL) = 0
>>>> gettimeofday({1380234692, 70197}, NULL) = 0
>>>> poll([{fd=4, events=POLLIN}], 1, 30000) = 0 (Timeout)
>>>> gettimeofday({1380234722, 91625}, NULL) = 0
>>>> write(2, "Call cib_query failed (-62): Tim"..., 43Call cib_query
>>>> failed (-62): Timer expired
>>>> ) = 43
>>>> poll([{fd=4, events=POLLIN}], 1, 0)     = 0 (Timeout)
>>>> 
>>>> 
>>>> * netstat -lxp:
>>>> 
>>>> Active UNIX domain sockets (only servers)
>>>> Proto RefCnt Flags       Type       State         I-Node PID/Program name    Path
>>>> unix  2      [ ACC ]     STREAM     LISTENING     20021  2836/cib            @cib_rw
>>>> unix  2      [ ACC ]     STREAM     LISTENING     19958  2838/lrmd           @lrmd
>>>> unix  2      [ ACC ]     STREAM     LISTENING     19789  2806/corosync       @quorum
>>>> unix  2      [ ACC ]     STREAM     LISTENING     19786  2806/corosync       @cmap
>>>> unix  2      [ ACC ]     STREAM     LISTENING     20020  2836/cib            @cib_ro
>>>> unix  2      [ ACC ]     STREAM     LISTENING     20057  2837/stonithd       @stonith-ng
>>>> unix  2      [ ACC ]     STREAM     LISTENING     19787  2806/corosync       @cfg
>>>> unix  2      [ ACC ]     STREAM     LISTENING     19906  2834/pacemakerd     @pacemakerd
>>>> unix  2      [ ACC ]     STREAM     LISTENING     19788  2806/corosync       @cpg
>>>> unix  2      [ ACC ]     STREAM     LISTENING     20022  2836/cib            @cib_shm
>>>> unix  2      [ ACC ]     STREAM     LISTENING     19985  2840/pengine        @pengine
>>>> 
>>>> 
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> --
>>>> Best Regards,
>>>> 
>>>> Radoslaw Garbacz
>>>> 
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>> 
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Best Regards,
>> 
>> Radoslaw Garbacz
>> XtremeData Incorporation
> 
> 
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporation
> 
