Hi,<br><br>I have the same problem on CentOS 5.3 with pacemaker-1.0.5 and openais-0.80.5. This is an openais bug, and there are two problems.<br>1. Starting the openais service sometimes causes a segfault. It is more likely to happen if the openais service is started before syslog.<br>

2. The openais segfault handler calls syslog() (through log_flush()). syslog() is one of the UNSAFE functions that must not be called from a signal handler, because it is not reentrant (not async-signal-safe).<br><br>To fix this issue: get the openais source, find the sigsegv_handler function in exec/main.c and comment out the log_flush() call, as shown below. Then recompile and install it (make and make install). log_flush() should be removed from all signal handlers in the openais code base. I am still not sure where the segfault occurs, but commenting out log_flush() prevents it.<br>
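For background on point 2: POSIX only guarantees a short list of async-signal-safe functions. write(), signal() and raise() are on it; syslog() and the stdio functions are not. Below is a minimal sketch, not openais code: safe_sigsegv_handler, install_safe_sigsegv_handler and crash_demo_terminating_signal are hypothetical names. It follows the same pattern as the patched sigsegv_handler in exec/main.c (restore SIG_DFL, re-raise), but if you still want a message on a crash, it reports it with write() instead of a logging call. It then forks a child to check that the process still dies with SIGSEGV.<br>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler: only async-signal-safe calls (write, signal, raise). */
static void safe_sigsegv_handler(int num)
{
    static const char msg[] = "caught SIGSEGV, re-raising with default action\n";
    (void)num;
    /* write(2) is async-signal-safe, unlike syslog(3) or stdio. */
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    /* Restore the default disposition, then re-raise so the process
       still terminates with SIGSEGV (and can dump core). */
    signal(SIGSEGV, SIG_DFL);
    raise(SIGSEGV);
}

/* Installs the handler with sigaction(); returns 0 on success. */
int install_safe_sigsegv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = safe_sigsegv_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Forks a child that installs the handler and raises SIGSEGV.
   Returns the signal that terminated the child, or -1 otherwise. */
int crash_demo_terminating_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Disable core dumps in the child so the demo leaves no core file. */
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);
        if (install_safe_sigsegv_handler() != 0)
            _exit(1);
        raise(SIGSEGV);
        _exit(0); /* not reached if the handler re-raises correctly */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling crash_demo_terminating_signal() should return SIGSEGV. Using sigaction() rather than signal() to install the handler is a choice of this sketch, not something the openais code is known to do.<br>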

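<br><br>-------------------------------------------------------------------------<br>static void sigsegv_handler (int num)<br>{<br>        /* restore the default action so the re-raise terminates the process */<br>        signal (SIGSEGV, SIG_DFL);<br>        /* log_flush() disabled: it must not be called from a signal handler */<br>//      log_flush ();<br>        raise (SIGSEGV);<br>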

}<br><br>Thanks<br>hj<br><br><div class="gmail_quote">On Thu, Nov 12, 2009 at 4:21 PM, Giovanni Di Milia <span dir="ltr"><<a href="mailto:gdimilia@cfa.harvard.edu">gdimilia@cfa.harvard.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

I set up a two-server cluster running CentOS 5.4 x86_64 with pacemaker 1.06 and corosync 1.1.2<br>

<br>

I only installed the x86_64 packages (yum install pacemaker also tries to install the 32-bit ones).<br>

<br>

I configured a shared cluster IP (it's a public IP) and a cluster website.<br>

<br>

Everything works fine if I stop corosync on one of the two servers (the services move from one machine to the other without problems), but if I reboot one server, when it comes back up it cannot go online in the cluster.<br>


I also noticed that there are several corosync threads; if I kill all of them and then start corosync again, everything works fine again.<br>

<br>

I don't know what is happening, and I'm not able to reproduce the same situation on virtual servers!<br>

<br>

Thanks,<br>

Giovanni<br>

<br>

<br>

<br>

The corosync configuration is the following:<br>

<br>

##############################################<br>

# Please read the corosync.conf.5 manual page<br>

compatibility: whitetank<br>

<br>

aisexec {<br>

        # Run as root - this is necessary to be able to manage resources with Pacemaker<br>

        user:   root<br>

        group:  root<br>

}<br>

<br>

service {<br>

        # Load the Pacemaker Cluster Resource Manager<br>

        ver:       0<br>

        name:      pacemaker<br>

        use_mgmtd: yes<br>

        use_logd:  yes<br>

}<br>

<br>

totem {<br>

        version: 2<br>

<br>

        # How long before declaring a token lost (ms)<br>

        token:          5000<br>

<br>

        # How many token retransmits before forming a new configuration<br>

        token_retransmits_before_loss_const: 10<br>

<br>

        # How long to wait for join messages in the membership protocol (ms)<br>

        join:           1000<br>

<br>

        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)<br>

        consensus:      2500<br>

<br>

        # Turn off the virtual synchrony filter<br>

        vsftype:        none<br>

<br>

        # Number of messages that may be sent by one processor on receipt of the token<br>

        max_messages:   20<br>

<br>

        # Stagger sending the node join messages by 1..send_join ms<br>

        send_join: 45<br>

<br>

        # Limit generated nodeids to 31-bits (positive signed integers)<br>

        clear_node_high_bit: yes<br>

<br>

        # Disable encryption<br>

        secauth:        off<br>

<br>

        # How many threads to use for encryption/decryption<br>

        threads:        0<br>

<br>

        # Optionally assign a fixed node id (integer)<br>

        # nodeid:         1234<br>

<br>

        interface {<br>

                ringnumber: 0<br>

<br>

                # The following values need to be set based on your environment<br>

                bindnetaddr: XXX.XXX.XXX.0 #here I put the right IP for my configuration<br>

                mcastaddr: 226.94.1.1<br>

                mcastport: 4000<br>

        }<br>

}<br>

<br>

logging {<br>

        fileline: off<br>

        to_stderr: yes<br>

        to_logfile: yes<br>

        to_syslog: yes<br>

        logfile: /tmp/corosync.log<br>

        debug: off<br>

        timestamp: on<br>

        logger_subsys {<br>

                subsys: AMF<br>

                debug: off<br>

        }<br>

}<br>

<br>

amf {<br>

        mode: disabled<br>

}<br>

<br>

##################################################<br>

<br>

<br>

<br>

_______________________________________________<br>

Pacemaker mailing list<br>

<a href="mailto:Pacemaker@oss.clusterlabs.org" target="_blank">Pacemaker@oss.clusterlabs.org</a><br>

<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

</blockquote></div><br><br clear="all"><br>-- <br>Dream with longterm vision!<br>kerdosa<br>