[Pacemaker] Master/Slave resource cannot start

Andrew Beekhof andrew at beekhof.net
Mon Aug 24 08:37:39 UTC 2009


The stack trace makes it look like a logging deadlock: the main thread is
stuck in __tz_convert(), called from _log_printf(), waiting on a libc lock.
I'll ask the openais maintainer about it.
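
A possible way to capture every thread in one pass, for anyone repeating this: gdb numbers threads from 1 (hence the "Thread ID 0 not known" in the transcript below), and the attach output lists a second thread (LWP 19425) whose backtrace does not appear there. The sketch below is an illustration only. `dump_all_threads` is a hypothetical helper name, the `GDB` override variable is an assumption added so the command can be dry-run, and it assumes a gdb build that accepts the `-p`, `-batch` and `-ex` options.

```shell
#!/bin/sh
# Sketch: print the thread list and a backtrace for every thread of a
# running process, without an interactive gdb session.
# dump_all_threads is a hypothetical helper, not part of gdb or openais.
dump_all_threads() {
    pid="${1:-}"

    # Refuse anything that is not a plain decimal PID.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: dump_all_threads PID" >&2
            return 2
            ;;
    esac

    # -p attaches to the running process, -batch exits once the commands
    # have run, and each -ex runs one gdb command in order.
    # GDB can be overridden (assumption, for dry runs); it defaults to gdb.
    "${GDB:-gdb}" -p "$pid" -batch \
        -ex "info threads" \
        -ex "thread apply all bt"
}
```

Called as `dump_all_threads 19423` (as root, with the PID taken from `ps -ef | grep aisexec`), this would print the thread list followed by one backtrace per thread, which can then be pasted into a reply.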

On Fri, Aug 21, 2009 at 5:11 PM, Diego Remolina
<diego.remolina at physics.gatech.edu> wrote:
> Here is what I am seeing now right after stopping openais, updating
> heartbeat and pacemaker and trying to start openais again:
>
> [root at phys-file02 ~]# /etc/init.d/openais status
> Stopped
> [root at phys-file02 ~]# /etc/init.d/openais start
> Starting OpenAIS daemon (aisexec): starting... rc=0: OK
> [root at phys-file02 ~]# crm status
>
> Connection to cluster failed: connection failed
> [root at phys-file02 ~]# crm status
>
> Connection to cluster failed: connection failed
> [root at phys-file02 ~]# crm status
>
> Connection to cluster failed: connection failed
> [root at phys-file02 ~]# yum -y install gdb
>
> At this point, I installed gdb and here is what I get:
>
> [root at phys-file02 ~]# ps -ef | grep aisexec
> root     19423     1  0 11:01 pts/1    00:00:00 aisexec
> root     19520 19241  0 11:02 pts/1    00:00:00 grep aisexec
> [root at phys-file02 ~]# gdb aisexec 19423
> GNU gdb Fedora (6.8-27.el5)
> Copyright (C) 2008 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...
> (no debugging symbols found)
> Attaching to program: /usr/sbin/aisexec, process 19423
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> [Thread debugging using libthread_db enabled]
> [New Thread 0x2ae946b8fec0 (LWP 19423)]
> [New Thread 0x40638fe0 (LWP 19425)]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /usr/libexec/lcrso/objdb.lcrso...done.
> Loaded symbols for /usr/libexec/lcrso/objdb.lcrso
> Reading symbols from /usr/libexec/lcrso/aisparser.lcrso...done.
> Loaded symbols for /usr/libexec/lcrso/aisparser.lcrso
> Reading symbols from /usr/libexec/lcrso/pacemaker.lcrso...done.
> Loaded symbols for /usr/libexec/lcrso/pacemaker.lcrso
> Reading symbols from /usr/lib64/libplumb.so.2...done.
> Loaded symbols for /usr/lib64/libplumb.so.2
> Reading symbols from /usr/lib64/libpils.so.2...done.
> Loaded symbols for /usr/lib64/libpils.so.2
> Reading symbols from /usr/lib64/libbz2.so.1...done.
> Loaded symbols for /usr/lib64/libbz2.so.1
> Reading symbols from /usr/lib64/libxslt.so.1...done.
> Loaded symbols for /usr/lib64/libxslt.so.1
> Reading symbols from /usr/lib64/libxml2.so.2...done.
> Loaded symbols for /usr/lib64/libxml2.so.2
> Reading symbols from /lib64/libuuid.so.1...done.
> Loaded symbols for /lib64/libuuid.so.1
> Reading symbols from /lib64/libpam.so.0...done.
> Loaded symbols for /lib64/libpam.so.0
> Reading symbols from /lib64/librt.so.1...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libglib-2.0.so.0...done.
> Loaded symbols for /lib64/libglib-2.0.so.0
> Reading symbols from /usr/lib64/libltdl.so.3...done.
> Loaded symbols for /usr/lib64/libltdl.so.3
> Reading symbols from /usr/lib64/libz.so.1...done.
> Loaded symbols for /usr/lib64/libz.so.1
> Reading symbols from /lib64/libm.so.6...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libaudit.so.0...done.
> Loaded symbols for /lib64/libaudit.so.0
> Reading symbols from /lib64/libgcc_s.so.1...done.
> Loaded symbols for /lib64/libgcc_s.so.1
> 0x0000003be08dee5e in __lll_lock_wait_private () from /lib64/libc.so.6
> (gdb) where
> #0  0x0000003be08dee5e in __lll_lock_wait_private () from /lib64/libc.so.6
> #1  0x0000003be088c74d in _L_lock_1685 () from /lib64/libc.so.6
> #2  0x0000003be088c497 in __tz_convert () from /lib64/libc.so.6
> #3  0x0000000000418a16 in _log_printf ()
> #4  0x0000000000418cb1 in internal_log_printf2 ()
> #5  0x00002aaaab0b8819 in pcmk_plugin_init () from /usr/libexec/lcrso/pacemaker.lcrso
> #6  0x00002aaaab0b946a in pcmk_startup () from /usr/libexec/lcrso/pacemaker.lcrso
> #7  0x000000000041a422 in openais_service_link_and_init ()
> #8  0x000000000041a5c8 in openais_service_defaults_link_and_init ()
> #9  0x0000000000418117 in main ()
> (gdb) thread 0
> Thread ID 0 not known.
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x2ae946b8fec0 (LWP 19423))]#0  0x0000003be08dee5e in __lll_lock_wait_private () from /lib64/libc.so.6
> (gdb) where
> #0  0x0000003be08dee5e in __lll_lock_wait_private () from /lib64/libc.so.6
> #1  0x0000003be088c74d in _L_lock_1685 () from /lib64/libc.so.6
> #2  0x0000003be088c497 in __tz_convert () from /lib64/libc.so.6
> #3  0x0000000000418a16 in _log_printf ()
> #4  0x0000000000418cb1 in internal_log_printf2 ()
> #5  0x00002aaaab0b8819 in pcmk_plugin_init () from /usr/libexec/lcrso/pacemaker.lcrso
> #6  0x00002aaaab0b946a in pcmk_startup () from /usr/libexec/lcrso/pacemaker.lcrso
> #7  0x000000000041a422 in openais_service_link_and_init ()
> #8  0x000000000041a5c8 in openais_service_defaults_link_and_init ()
> #9  0x0000000000418117 in main ()
> (gdb) thread 3
> Thread ID 3 not known.
> (gdb) thread 4
> Thread ID 4 not known.
> (gdb) thread 5
> Thread ID 5 not known.
> (gdb) thread 6
> Thread ID 6 not known.
>
> Like I said, I have not used gdb before, so if I am doing something wrong,
> let me know what I should do, or where I can read some docs to understand
> how to use it to give you useful output.
>
> Here is the log file. I do not see any useful crm info in it up to the
> point where I installed gdb on the system:
>
> Aug 21 11:01:31 phys-file02 openais[19423]: [MAIN ] AIS Executive Service RELEASE 'subrev 1152 version 0.80'
> Aug 21 11:01:31 phys-file02 openais[19423]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
> Aug 21 11:01:31 phys-file02 openais[19423]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
> Aug 21 11:01:31 phys-file02 openais[19423]: [MAIN ] AIS Executive Service: started and ready to provide service.
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Token Timeout (3000 ms) retransmit timeout (294 ms)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] token hold (225 ms) retransmits before loss (10 retrans)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] join (60 ms) send_join (0 ms) consensus (1500 ms) merge (200 ms)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (20 messages)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] send threads (0 threads)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] RRP token expired timeout (294 ms)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] RRP token problem counter (2000 ms)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] RRP threshold (10 problem count)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] RRP mode set to passive.
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] heartbeat_failures_allowed (0)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] max_network_delay (50 ms)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] The network interface [10.0.0.22] is now up.
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Created or loaded sequence id 184.10.0.0.22 for this ring.
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] The network interface [10.0.1.22] is now up.
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] entering GATHER state from 15.
> Aug 21 11:01:32 phys-file02 openais[19423]: [crm  ] info: process_ais_conf: Reading configure
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: config_find_next: Processing additional logging options...
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: get_config_opt: Found 'on' for option: debug
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: get_config_opt: Defaulting to 'off' for option: to_file
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: get_config_opt: Found 'daemon' for option: syslog_facility
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: config_find_next: Processing additional service options...
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> Aug 21 11:01:58 phys-file02 crm_shadow: [19439]: info: Invoked: crm_shadow
> Aug 21 11:01:58 phys-file02 crm_shadow: [19453]: info: Invoked: crm_shadow
> Aug 21 11:01:58 phys-file02 crm_shadow: [19455]: info: Invoked: crm_shadow
> Aug 21 11:02:01 phys-file02 crm_shadow: [19467]: info: Invoked: crm_shadow
> Aug 21 11:02:01 phys-file02 crm_shadow: [19481]: info: Invoked: crm_shadow
> Aug 21 11:02:01 phys-file02 crm_shadow: [19483]: info: Invoked: crm_shadow
> Aug 21 11:02:03 phys-file02 crm_shadow: [19495]: info: Invoked: crm_shadow
> Aug 21 11:02:03 phys-file02 crm_shadow: [19509]: info: Invoked: crm_shadow
> Aug 21 11:02:03 phys-file02 crm_shadow: [19511]: info: Invoked: crm_shadow
> Aug 21 11:02:16 phys-file02 yum: Installed: gdb-6.8-27.el5.x86_64
>
> Again, killing aisexec and restarting openais seems to work.
>
> [root at phys-file02 ~]# /etc/init.d/openais stop
> Stopping OpenAIS daemon (aisexec):
> ......................................................................................................................................
> [root at phys-file02 ~]# pkill -9 aisexec
> [root at phys-file02 ~]# ps -ef | grep aise
> root     19546 19241  0 11:10 pts/1    00:00:00 grep aise
> [root at phys-file02 ~]# /etc/init.d/openais start
> Starting OpenAIS daemon (aisexec): starting... rc=0: OK
> [root at phys-file02 ~]# crm status
>
>
> ============
> Last updated: Fri Aug 21 11:10:51 2009
> Stack: openais
> Current DC: phys-file01.physics.gatech.edu - partition with quorum
> Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]
>
> Master/Slave Set: ms-drbd_export
>        Masters: [ phys-file01.physics.gatech.edu ]
>        Slaves: [ phys-file02.physics.gatech.edu ]
> Master/Slave Set: ms-drbd_scratch
>        Masters: [ phys-file01.physics.gatech.edu ]
>        Slaves: [ phys-file02.physics.gatech.edu ]
> Resource Group: fileserver
>    fs_export   (ocf::heartbeat:Filesystem):    Started phys-file01.physics.gatech.edu
>    fs_scratch  (ocf::heartbeat:Filesystem):    Started phys-file01.physics.gatech.edu
>    virtual-ip-1        (ocf::heartbeat:IPaddr2):       Started phys-file01.physics.gatech.edu
>    nfs (lsb:nfs):      Started phys-file01.physics.gatech.edu
>    samba       (lsb:smb):      Started phys-file01.physics.gatech.edu
> Clone Set: pingd-clone
>        Started: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]
> [root at phys-file02 ~]#
>
> Diego
>
> Andrew Beekhof wrote:
>>
>> On Wed, Aug 12, 2009 at 3:35 PM, Diego Remolina
>> <diego.remolina at physics.gatech.edu> wrote:
>>>>
>>>> could you instead attach to it with gdb and see what it was doing?
>>>
>>> I will try, but cannot promise it will be soon, beginning of the semester
>>> is very busy and I am not familiar with gdb...
>>
>> gdb aisexec $PID_OF_AISEXEC
>> # where
>>
>> then, for every thread it has:
>>
>> # thread 0
>> # where
>> # thread 1
>> # where
>> ...
>>
>> I think you get the idea :-)
>>
>>> RedHat.... one is x86_64, the other is the 32 bit one....
>>>
>>> [root at phys-file01 windows7]# rpm -qa --qf "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n" | grep openais
>>> openais-0.80.5-13.1.x86_64
>>> libopenais2-0.80.5-13.1.i386
>>> libopenais2-0.80.5-13.1.x86_64
>>
>> How about trying with just one architecture of libopenais2 installed?
>> Maybe something is confused.
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>



