[Pacemaker] Corosync + Pacemaker New Install: Corosync Fails Without Error Message

Steven Dake sdake at redhat.com
Tue Jun 22 21:19:56 UTC 2010


On 06/18/2010 09:42 AM, Eliot Gable wrote:
> I don’t have an “aisexec” section at all. I simply copied the sample
> file, which did not have one.
>
> I did figure out why it wasn’t logging. It was set to AMF mode and
> ‘mode’ was ‘disabled’ in the AMF configuration section. After changing
> that to ‘enabled’, I now have logging. That allowed me to figure out
> that I needed to set rrp_mode to something other than ‘none’, because I
> have two interfaces to run the totem protocol over. However, with it set
> to ‘passive’ or ‘active’, corosync tries to start, then seg faults:
>
> Jun 18 07:33:23 corosync [MAIN ] Corosync Cluster Engine ('1.2.2'): started and ready to provide service.
> Jun 18 07:33:23 corosync [MAIN ] Corosync built-in features: nss rdma
> Jun 18 07:33:23 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Jun 18 07:33:23 corosync [TOTEM ] Token Timeout (1000 ms) retransmit timeout (238 ms)
> Jun 18 07:33:23 corosync [TOTEM ] token hold (180 ms) retransmits before loss (4 retrans)
> Jun 18 07:33:23 corosync [TOTEM ] join (50 ms) send_join (0 ms) consensus (1200 ms) merge (200 ms)
> Jun 18 07:33:23 corosync [TOTEM ] downcheck (1000 ms) fail to recv const (50 msgs)
> Jun 18 07:33:23 corosync [TOTEM ] seqno unchanged const (30 rotations) Maximum network MTU 1402
> Jun 18 07:33:23 corosync [TOTEM ] window size per rotation (50 messages) maximum messages per rotation (17 messages)
> Jun 18 07:33:23 corosync [TOTEM ] send threads (0 threads)
> Jun 18 07:33:23 corosync [TOTEM ] RRP token expired timeout (238 ms)
> Jun 18 07:33:23 corosync [TOTEM ] RRP token problem counter (2000 ms)
> Jun 18 07:33:23 corosync [TOTEM ] RRP threshold (10 problem count)
> Jun 18 07:33:23 corosync [TOTEM ] RRP mode set to passive.
> Jun 18 07:33:23 corosync [TOTEM ] heartbeat_failures_allowed (0)
> Jun 18 07:33:23 corosync [TOTEM ] max_network_delay (50 ms)
> Jun 18 07:33:23 corosync [TOTEM ] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
> Jun 18 07:33:23 corosync [TOTEM ] Initializing transport (UDP/IP).
> Jun 18 07:33:23 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Jun 18 07:33:23 corosync [TOTEM ] Initializing transport (UDP/IP).
> Jun 18 07:33:23 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Jun 18 07:33:23 corosync [IPC ] you are using ipc api v2
> Jun 18 07:33:23 corosync [TOTEM ] Receive multicast socket recv buffer size (262142 bytes).
> Jun 18 07:33:23 corosync [TOTEM ] Transmit multicast socket send buffer size (262142 bytes).
> Jun 18 07:33:23 corosync [TOTEM ] The network interface is down.
> Jun 18 07:33:23 corosync [TOTEM ] Created or loaded sequence id 0.127.0.0.1 for this ring.
> Jun 18 07:33:23 corosync [pcmk ] info: process_ais_conf: Reading configure
> Jun 18 07:33:23 corosync [pcmk ] info: config_find_init: Local handle: 2013064636357672962 for logging
> Jun 18 07:33:23 corosync [pcmk ] info: config_find_next: Processing additional logging options...
> Jun 18 07:33:23 corosync [pcmk ] info: get_config_opt: Found 'on' for option: debug
> Jun 18 07:33:23 corosync [pcmk ] info: get_config_opt: Defaulting to 'off' for option: to_file
> Jun 18 07:33:23 corosync [pcmk ] info: get_config_opt: Found 'yes' for option: to_syslog
> Jun 18 07:33:23 corosync [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
> Jun 18 07:33:23 corosync [pcmk ] info: config_find_init: Local handle: 4730966301143465987 for service
> Jun 18 07:33:23 corosync [pcmk ] info: config_find_next: Processing additional service options...
> Jun 18 07:33:23 corosync [pcmk ] info: get_config_opt: Defaulting to 'pcmk' for option: clustername
> Jun 18 07:33:23 corosync [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> Jun 18 07:33:23 corosync [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
> Jun 18 07:33:23 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
> Jun 18 07:33:23 corosync [pcmk ] Logging: Initialized pcmk_startup
> Jun 18 07:33:23 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
> Segmentation fault
>
> (gdb) where full
> #0 0x000000332de797c0 in strlen () from /lib64/libc.so.6
> No symbol table info available.
> #1 0x00002aaaaacefb9b in logsys_worker_thread (data=<value optimized out>) at logsys.c:760
>         rec = 0x2aaaaaef0c28
>         dropped = 0
> #2 0x000000332e60673d in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #3 0x000000332ded3d1d in clone () from /lib64/libc.so.6
> No symbol table info available.
> (gdb)
>
> Downgrading back to 1.2.1-1.el5 seems to resolve the issue, and
> Corosync runs.
>
> Eliot Gable
>
> *From:* Gianluca Cecchi [mailto:gianluca.cecchi at gmail.com]
> *Sent:* Friday, June 18, 2010 11:35 AM
> *To:* The Pacemaker cluster resource manager
> *Subject:* Re: [Pacemaker] Corosync + Pacemaker New Install: Corosync
> Fails Without Error Message
>
> On Fri, Jun 18, 2010 at 5:25 PM, Eliot Gable <egable at broadvox.com
> <mailto:egable at broadvox.com>> wrote:
>
> I am trying to set up Corosync + Pacemaker on a new CentOS 5.5 x86_64
> install, but when I try to start corosync, it just says [FAILED] and
> does not provide any further information. I created the authkey using
> corosync-keygen and created a corosync.conf file. The log file remains
> empty and no errors are displayed on the console when it fails to start.
> I tried downgrading to 1.2.1-1.el5, but that did not resolve the issue
> either. So I have re-upgraded back to 1.2.2-1.1.el5.
>
> What are the contents of your /etc/corosync/corosync.conf for the
> logging section and for the aisexec section?
>
> Do you have, for example, something like this:
> aisexec {
>         user: root
>         group: root
> }
>
> When you say "log file", do you mean the one indicated in
> /etc/corosync/corosync.conf, /var/log/messages, or both?
>
> Gianluca

This is a known issue. Corosync 1.2.5 resolves this and other issues.
Andrew is building, or has already built, an update for the
clusterlabs repo.

Regards
-steve
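
For reference, the two-ring setup described in the quoted message (two
interfaces carrying totem, with rrp_mode set to something other than
'none') might look roughly like the fragment below in a Corosync 1.x
corosync.conf. This is only a sketch: the bindnetaddr, mcastaddr, and
mcastport values are placeholders that do not come from this thread,
and secauth/threads are inferred from the log output above. The
segfault itself is addressed by the upgrade mentioned above, not by
this fragment.

```
totem {
        version: 2
        # Inferred from the log: authentication/encryption is active
        # and "send threads (0 threads)" is reported.
        secauth: on
        threads: 0
        # Two rings are defined below, so rrp_mode must not be 'none'.
        rrp_mode: passive

        interface {
                # First ring. Addresses and port are placeholders.
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
        interface {
                # Second ring. It uses a different multicast address
                # and a port at least two away from the first ring's.
                ringnumber: 1
                bindnetaddr: 10.0.0.0
                mcastaddr: 226.94.1.2
                mcastport: 5407
        }
}
```

Each bindnetaddr would be replaced with the network address of the
corresponding local interface.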