<div dir="ltr">Hello David,<div><br></div><div>I think I am using the latest version available from Ubuntu, which is 1.1.10.</div><div>Do you think this version has a bug?</div><div>Should I compile from source?</div><div><br></div><div>Best Regards,</div><div><br></div><div><br></div><div>Ariee</div><div><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Dec 19, 2014 at 8:27 PM,  <span dir="ltr"><<a href="mailto:pacemaker-request@oss.clusterlabs.org" target="_blank">pacemaker-request@oss.clusterlabs.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Message: 2<br>

Date: Fri, 19 Dec 2014 14:21:59 -0500 (EST)<br>

From: David Vossel <<a href="mailto:dvossel@redhat.com">dvossel@redhat.com</a>><br>

To: The Pacemaker cluster resource manager<br>

        <<a href="mailto:pacemaker@oss.clusterlabs.org">pacemaker@oss.clusterlabs.org</a>><br>

Subject: Re: [Pacemaker] pacemaker error after a couple week or month<br>

Message-ID:<br>

        <<a href="mailto:102420175.739708.1419016919246.JavaMail.zimbra@redhat.com">102420175.739708.1419016919246.JavaMail.zimbra@redhat.com</a>><br>

Content-Type: text/plain; charset=utf-8<br>

<br>

<br>

<br>

----- Original Message -----<br>

> Hello,<br>

><br>

> I have two active-passive failover systems using Corosync and DRBD.<br>

> One system uses two Debian servers and the other uses two Ubuntu servers.<br>

> The Debian servers are for web server failover and the Ubuntu servers are<br>

> for database server failover.<br>

><br>

> I applied the same Pacemaker configuration to both systems. Everything works fine:<br>

> failover and file system synchronization both work, but<br>

> on the Ubuntu servers an error always appears after a couple of weeks or months.<br>

> Pacemaker on ubuntu1 then reported a different status from ubuntu2: ubuntu1 assumed<br>

> that ubuntu2 was down, while ubuntu2 assumed that something had happened to<br>

> ubuntu1 but that it was still alive, and took over the resources. As a result the DRBD<br>

> resource could not be taken over, so no failover happened and we had to<br>

> manually restart the server, because restarting Pacemaker and Corosync didn't<br>

> help. I have changed the Pacemaker configuration a couple of times, but the<br>

> problem still exists.<br>

><br>

> Has anyone experienced this? I use Ubuntu 14.04.1 LTS.<br>

><br>

> I found this error in apport.log:<br>

><br>

> ERROR: apport (pid 20361) Fri Dec 19 02:43:52 2014: executable:<br>

> /usr/lib/pacemaker/lrmd (command line "/usr/lib/pacemaker/lrmd")<br>

<br>

Wow, it looks like the lrmd is crashing on you. I haven't seen this occur<br>

in the wild before. Without a backtrace it will be nearly impossible to determine<br>

what is happening.<br>

<br>

Do you have the ability to upgrade pacemaker to a newer version?<br>

<br>

-- Vossel<br>

</blockquote></div><br></div></div></div>