Hi,
<br>
<br>I've got a pair of servers running RHEL5 x86_64 with openais-0.80
(an older install), which I want to upgrade to corosync-1.3.0 +
pacemaker-1.0.10. Downtime is not an issue. corosync 1.3.0 is needed
for UDPU, so I built it from the sources on the <a href="http://corosync.org">corosync.org</a> website, and openais 1.1.4
from the <a href="http://openais.org">openais.org</a> website.
<br>
<br>With pacemaker, we won't be using the heartbeat stack, so I built the
pacemaker package from the <a href="http://clusterlabs.org">clusterlabs.org</a> src.rpm without heartbeat
support. To be precise, I used:
<br>
<br>rpmbuild --without heartbeat --with ais --with snmp --with esmtp -ba
pacemaker-epel.spec
<br>
<br>I've tested the RPM list below on a pair of Xen VMs, and it works just
fine.
<br>
<br>cluster-glue-1.0.6-1.6.el5.x86_64.rpm
<br>cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
<br>corosync-1.3.0-1.x86_64.rpm
<br>corosynclib-1.3.0-1.x86_64.rpm
<br>libesmtp-1.0.4-5.el5.x86_64.rpm
<br>libibverbs-1.1.2-1.el5.x86_64.rpm
<br>librdmacm-1.0.8-1.el5.x86_64.rpm
<br>libtool-ltdl-1.5.22-6.1.x86_64.rpm
<br>openais-1.1.4-2.x86_64.rpm
<br>openaislib-1.1.4-2.x86_64.rpm
<br>openhpi-2.10.2-1.el5.x86_64.rpm
<br>openib-1.3.2-0.20080728.0355.3.el5.noarch.rpm
<br>pacemaker-1.0.10-1.4.x86_64.rpm
<br>pacemaker-libs-1.0.10-1.4.x86_64.rpm
<br>perl-TimeDate-1.16-5.el5.noarch.rpm
<br>resource-agents-1.0.3-2.6.el5.x86_64.rpm
<br>
<br>On the real servers running openais-0.80, however, the upgrade did not
go as smoothly. I first removed the heartbeat, heartbeat-libs and PyXML RPMs
(they caused dependency conflicts), then ran rpm -Uvh on the RPM list above.
Installation went fine. I removed the existing cib.xml and its signatures
for a fresh start, configured corosync and started it on both servers:
nothing. At first I got an error related to pacemaker mgmt, an old package
installed with the old RPMs. I removed it and tried again: still nothing.
I then removed all cluster-related RPMs, old and new, plus their
dependencies (everything except DRBD), reinstalled the list above, and
again got nothing. By "nothing" I mean:
<br>- corosync starts, but never elects a DC and never sees the other node,
or even itself for that matter.
<br>- stopping corosync via the init script never completes: it goes into an
endless phase where it just prints dots to the screen, and I have to kill
the process to make it stop.
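<br>
<br>For reference, the kind of UDPU setup described above can be sketched as a
small shell helper that prints a corosync.conf skeleton. This is only an
illustration, not the configuration from the affected servers: the function
name gen_udpu_conf, the addresses and the port are placeholder assumptions,
and the layout follows the UDPU example file shipped with corosync 1.3.

```shell
#!/bin/sh
# Hypothetical helper (not part of corosync): print a minimal corosync.conf
# skeleton for the UDPU transport.
# Usage: gen_udpu_conf BINDNETADDR MEMBER_IP [MEMBER_IP...]
gen_udpu_conf() {
    bindnet=$1
    shift
    echo "compatibility: whitetank"
    echo "totem {"
    echo "    version: 2"
    echo "    secauth: off"
    echo "    transport: udpu"
    echo "    interface {"
    echo "        ringnumber: 0"
    echo "        bindnetaddr: $bindnet"
    echo "        mcastport: 5405"
    # With UDPU every cluster node is listed explicitly as a member.
    for ip in "$@"; do
        echo "        member {"
        echo "            memberaddr: $ip"
        echo "        }"
    done
    echo "    }"
    echo "}"
    # Load Pacemaker as a corosync plugin.
    echo "service {"
    echo "    name: pacemaker"
    echo "    ver: 0"
    echo "}"
}
```

With placeholder addresses, the output can be written to a scratch file and
diffed against the real /etc/corosync/corosync.conf on each node:
gen_udpu_conf 192.168.1.0 192.168.1.10 192.168.1.11 > /tmp/corosync.conf.sketch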
<br>
<br>Troubleshooting done so far:
<br>- tested the network sockets (nc in both directions) and the firewall
rules (iptables down); communication is OK
<br>- tracked down the original RPM list, removed all remaining old RPMs, ran
ldconfig, then removed and reinstalled the new RPMs
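<br>
<br>The hunt for leftovers could be scripted roughly as follows. It is a sketch
under assumptions: both helper names are made up, the package-name patterns
are my guess at what counts as cluster-related, and /usr/libexec/lcrso is
assumed to be the directory holding the service engine plugins on these
hosts.

```shell
#!/bin/sh
# Hypothetical helper: keep only cluster-related package names from stdin.
# The pattern list is an assumption; extend it as needed.
filter_cluster_pkgs() {
    grep -E '^(openais|corosync|heartbeat|pacemaker|cluster-glue|resource-agents)' || true
}

# Hypothetical helper: print files in a directory that no installed RPM owns.
# The default directory is an assumption about where the plugins live.
list_unowned_files() {
    dir=${1:-/usr/libexec/lcrso}
    for f in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -e "$f" ] || continue
        # rpm -qf exits non-zero when no package owns the file.
        rpm -qf "$f" >/dev/null 2>&1 || echo "unowned: $f"
    done
}
```

Typical use would be "rpm -qa | filter_cluster_pkgs | sort" to compare against
the RPM list above, followed by "list_unowned_files" to spot stray files that
survived the package removal.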
<br>
<br>My guess is that leftovers from the old openais-0.80 installation are
interfering with the new one, since the same set of RPMs works fine on a
pair of Xen VMs running the same OS. However, I cannot put my finger on
the culprit on the real servers. <div><br></div><div>Logs: <a href="http://pastebin.com/i0maZM4p">http://pastebin.com/i0maZM4p</a><br>
<br>Ideas, suggestions?
<br>
<br>TIA.
<br>
<br>Regards,
<br>Dan <br clear="all"><br>-- <br>Dan Frîncu<br>CCNA, RHCE<br><br>
</div>