<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Another problem has appeared:</div><div>after the reboot of one server I often get a cluster partition, and both servers elect themselves DC.</div><div><br></div><div>Even if the partition doesn't appear just after the reboot of one server (e.g. serverA), if I try to restart corosync on the other server (e.g. serverB), the partition appears.</div><div>Then if I also restart corosync on the first server (serverA), everything works fine again.</div><div>But if I restart corosync on the second server (serverB), nothing changes and the partition appears again.</div><div><br></div><div>It seems to me that there is still something wrong with the first run of corosync just after the server reboot.</div><div><br></div><div>I didn't configure any fencing method, because I think my configuration is really simple and I don't need it.</div><div><br></div><div>Thanks again for your patience,</div><div>Giovanni</div><div><br></div><br><div><div>On Nov 17, 2009, at 12:07 PM, Giovanni Di Milia wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Disabling syslog makes the problem disappear.</div><div><br></div><div>Thank you very much,</div><div>Giovanni</div><div><br></div><div><br></div><div><br></div><div><div>On Nov 16, 2009, at 4:51 PM, hj lee wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Hi,<br><br>Please disable syslog in openais.conf, and try it again. 
It seems this issue is related to the interaction between the fork() call and syslog().<br><br>hj<br><br><div class="gmail_quote">On Fri, Nov 13, 2009 at 1:08 PM, Giovanni Di Milia <span dir="ltr"><<a href="mailto:gdimilia@cfa.harvard.edu">gdimilia@cfa.harvard.edu</a>></span> wrote:<br> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div style=""><div>Thank you very much for your response.</div><div><br></div><div>The only thing I really don't understand is: why doesn't this problem appear in any of my simulations?</div> <div>I configured at least seven pairs of virtual servers with VMware 2 and CentOS 5.3 and 5.4 (32 and 64 bit), and I never had this kind of problem!</div><div><br></div><div>The only difference in the configuration is that I used private IPs for the simulations and public IPs for the real servers, but I don't think that is important.</div> <div><br></div><div>Thanks for your patience,</div><div>Giovanni</div><div><div></div><div class="h5"><div><br></div><div><br></div><br><div><div>On Nov 13, 2009, at 1:36 PM, hj lee wrote:</div><br><blockquote type="cite"> Hi,<br><br>I have the same problem on CentOS 5.3 with pacemaker-1.0.5 and openais-0.80.5. This is an openais bug! There are two problems.<br>1. Starting the openais service sometimes seg faults. This is more likely to happen if the openais service is started before syslog.<br> 2. The seg fault handler of openais calls syslog(). syslog() is an async-signal-UNSAFE function that must not be called from a signal handler, because it is non-reentrant.<br><br>To fix this issue: get the openais source, find the sigsegv_handler function in exec/main.c, and just comment out the log_flush() call, as shown below. Then recompile and install it (make and make install). log_flush() should be removed from all signal handlers in the openais code base. 
I am still not sure where the seg fault occurs, but commenting out log_flush() prevents it.<br> <br><br>-------------------------------------------------------------------------<br>static void sigsegv_handler (int num)<br>{<br> signal (SIGSEGV, SIG_DFL);<br>// log_flush ();<br> raise (SIGSEGV);<br> }<br><br>Thanks<br>hj<br><br><div class="gmail_quote">On Thu, Nov 12, 2009 at 4:21 PM, Giovanni Di Milia <span dir="ltr"><<a href="mailto:gdimilia@cfa.harvard.edu" target="_blank">gdimilia@cfa.harvard.edu</a>></span> wrote:<br> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> I set up a cluster of two CentOS 5.4 x86_64 servers with pacemaker 1.0.6 and corosync 1.1.2.<br> <br> I only installed the x86_64 packages (yum install pacemaker also tries to install the 32-bit ones).<br> <br> I configured a shared cluster IP (it's a public IP) and a cluster website.<br> <br> Everything works fine if I stop corosync on one of the two servers (the services move from one machine to the other without problems), but if I reboot one server, when it comes back up it cannot rejoin the cluster.<br> I also noticed that there are several corosync threads, and if I kill all of them and then start corosync again, everything works fine again.<br> <br> I don't know what is happening, and I'm not able to reproduce the same situation on any virtual servers!<br> <br> Thanks,<br> Giovanni<br> <br> <br> <br> The corosync configuration is the following:<br> <br> ##############################################<br> # Please read the corosync.conf.5 manual page<br> compatibility: whitetank<br> <br> aisexec {<br> # Run as root - this is necessary to be able to manage resources with Pacemaker<br> user: root<br> group: root<br> }<br> <br> service {<br> # Load the Pacemaker Cluster Resource Manager<br> ver: 0<br> name: pacemaker<br> use_mgmtd: yes<br> use_logd: yes<br> }<br> <br> totem {<br> version: 2<br> <br> # 
How long before declaring a token lost (ms)<br> token: 5000<br> <br> # How many token retransmits before forming a new configuration<br> token_retransmits_before_loss_const: 10<br> <br> # How long to wait for join messages in the membership protocol (ms)<br> join: 1000<br> <br> # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)<br> consensus: 2500<br> <br> # Turn off the virtual synchrony filter<br> vsftype: none<br> <br> # Number of messages that may be sent by one processor on receipt of the token<br> max_messages: 20<br> <br> # Stagger sending the node join messages by 1..send_join ms<br> send_join: 45<br> <br> # Limit generated nodeids to 31-bits (positive signed integers)<br> clear_node_high_bit: yes<br> <br> # Disable encryption<br> secauth: off<br> <br> # How many threads to use for encryption/decryption<br> threads: 0<br> <br> # Optionally assign a fixed node id (integer)<br> # nodeid: 1234<br> <br> interface {<br> ringnumber: 0<br> <br> # The following values need to be set based on your environment<br> bindnetaddr: XXX.XXX.XXX.0 #here I put the right ip for my configuration<br> mcastaddr: 226.94.1.1<br> mcastport: 4000<br> }<br> }<br> <br> logging {<br> fileline: off<br> to_stderr: yes<br> to_logfile: yes<br> to_syslog: yes<br> logfile: /tmp/corosync.log<br> debug: off<br> timestamp: on<br> logger_subsys {<br> subsys: AMF<br> debug: off<br> }<br> }<br> <br> amf {<br> mode: disabled<br> }<br> <br> ##################################################<br> <br> <br> <br> _______________________________________________<br> Pacemaker mailing list<br> <a href="mailto:Pacemaker@oss.clusterlabs.org" target="_blank">Pacemaker@oss.clusterlabs.org</a><br> <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br> </blockquote></div><br><br clear="all"><br>-- <br>Dream with longterm vision!<br> kerdosa<br> 
</blockquote></div><br></div></div></div><br></blockquote></div><br><br clear="all"><br>-- <br>Dream with longterm vision!<br>kerdosa<br> </blockquote></div><br></div></blockquote></div><br></body></html>