[Pacemaker] Upgrade from openais-0.80 failed

Dan Frincu frincu.dan at gmail.com
Wed Jan 26 04:06:32 EST 2011


Hi,

I've got a pair of servers running RHEL5 x86_64 with openais-0.80 (an older
install) which I want to upgrade to corosync-1.3.0 + pacemaker-1.0.10.
Downtime is not an issue, and corosync 1.3.0 is needed for UDPU, so I built
it from the sources on the corosync.org website, and openais 1.1.4 from the
openais.org website.


Since we won't be using the heartbeat stack with pacemaker, I built the
pacemaker package from the clusterlabs.org src.rpm without heartbeat
support. To be more precise, I used:

rpmbuild --without heartbeat --with ais --with snmp --with esmtp -ba pacemaker-epel.spec

I've tested the RPM list below on a pair of Xen VMs, and it works just
fine:

cluster-glue-1.0.6-1.6.el5.x86_64.rpm
cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
corosync-1.3.0-1.x86_64.rpm
corosynclib-1.3.0-1.x86_64.rpm
libesmtp-1.0.4-5.el5.x86_64.rpm
libibverbs-1.1.2-1.el5.x86_64.rpm
librdmacm-1.0.8-1.el5.x86_64.rpm
libtool-ltdl-1.5.22-6.1.x86_64.rpm
openais-1.1.4-2.x86_64.rpm
openaislib-1.1.4-2.x86_64.rpm
openhpi-2.10.2-1.el5.x86_64.rpm
openib-1.3.2-0.20080728.0355.3.el5.noarch.rpm
pacemaker-1.0.10-1.4.x86_64.rpm
pacemaker-libs-1.0.10-1.4.x86_64.rpm
perl-TimeDate-1.16-5.el5.noarch.rpm
resource-agents-1.0.3-2.6.el5.x86_64.rpm

However, when performing the upgrade on the servers running openais-0.80, I
first removed the heartbeat, heartbeat-libs and PyXML RPMs (conflicting
dependencies), then ran rpm -Uvh on the RPM list above. Installation went
fine; I removed the existing cib.xml and signatures for a fresh start. Then I
configured corosync and started it on both servers, and nothing. At first
I got an error related to pacemaker mgmt, which was an old package installed
with the old RPMs. I removed it and tried again. Nothing. I then removed all
cluster-related RPMs, old and new, plus their dependencies (except for DRBD),
and installed the list above again. Still nothing. What "nothing" means:
- corosync starts, but a DC is never elected, and it never sees the other
node, or itself for that matter.
- when stopping corosync via the init script, it goes into an endless phase
where it just prints dots to the screen; I have to kill the process to make it
stop.
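For comparison, a minimal corosync 1.3.0 configuration using the UDPU transport
and loading Pacemaker as a plugin might look roughly like the sketch below. It
is not the configuration from these servers: the addresses and port are
placeholders, and only the totem and service sections are shown.

```
totem {
        version: 2
        # unicast UDP transport, introduced in corosync 1.3.0
        transport: udpu
        interface {
                # one member block per node (placeholder addresses)
                member {
                        memberaddr: 192.168.1.1
                }
                member {
                        memberaddr: 192.168.1.2
                }
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastport: 5405
        }
}

service {
        # have corosync start Pacemaker via its plugin
        name: pacemaker
        ver: 0
}
```

With ver: 0 the plugin starts the Pacemaker daemons itself, so only the
corosync init script needs to be run on each node.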


Troubleshooting done so far:
- tested network sockets (nc from one side to the other) and firewall rules
(iptables down); communication is OK
- searched for the original RPMs list, removed all remaining RPMs, ran
ldconfig, removed the new RPMs, then installed the new RPMs again
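The package clean-up in the second step can be made repeatable with a small
filter over the rpm -qa output, so that anything left from the old
openais-0.80/heartbeat install stands out. This is only a sketch:
filter_cluster_pkgs is a hypothetical helper, and the name patterns are an
assumption about which packages belong to the cluster stack.

```shell
# Hypothetical helper: keep only cluster-stack package names from a package
# list on stdin, sorted. The patterns below are assumptions; extend as needed.
filter_cluster_pkgs() {
    grep -Ei '^(openais|corosync|heartbeat|pacemaker|cluster-glue|resource-agents|pyxml)' | sort
}

# Typical use on a node:
#   rpm -qa | filter_cluster_pkgs
```

Running it on both a working VM and a failing server and diffing the two
outputs would show any package that exists on only one of them.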

My guess is that there are some leftovers from the old openais-0.80
installation which interfere with the current installation, since the same
set of RPMs works fine on a pair of Xen VMs with the same OS. However, I
cannot put my finger on the culprit for the real servers' issue.
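One way to hunt for such leftovers is to list files that no installed RPM
claims. The sketch below relies on rpm -qf exiting non-zero for files not owned
by any package; list_unowned is a hypothetical helper, and the directories in
the example invocation are assumptions about where openais 0.80 and corosync 1.x
keep configuration and plugins.

```shell
# Hypothetical helper: print regular files under the given directories that
# no installed RPM owns. Missing directories are skipped. If rpm itself is
# not available, every file is reported.
list_unowned() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -type f | while IFS= read -r f; do
            if ! rpm -qf "$f" >/dev/null 2>&1; then
                echo "$f"
            fi
        done
    done
}

# Example invocation (directory list is an assumption):
#   list_unowned /etc/ais /etc/corosync /usr/libexec/lcrso
```

Hand-written files such as corosync.conf may also show up, so the output needs
a manual look rather than blind deletion.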

Logs: http://pastebin.com/i0maZM4p

Ideas, suggestions?

TIA.

Regards,
Dan

-- 
Dan Frîncu
CCNA, RHCE