[Pacemaker] Nagios3 on shared device / howto from clusterlabs wiki

Thomas ThomasCaspari at t-online.de
Thu Apr 28 08:49:45 EDT 2011


I've just come across the howto 
and found some serious problems. I am experimenting with my first cluster
configuration, which should end in a stable and reliable production environment
on a new blade server and an older existing machine. My Linux is Debian squeeze.
I am using an active/active configuration with the OCFS2 filesystem.

Problems I found:
- The current LSB start script in Debian (version 3.2.1-2) is faulty.
In addition to the pid-file fault mentioned in the wiki, it contains two
'status()' sections. This looks like a leftover from a modification made while
debugging something. I have corrected that manually. Unfortunately I cannot
post the patch here because the mailing list thinks I am top-posting.

- On a shared device, ownership is determined by the numeric uid/gid. Nagios
itself needs user:group nagios:nagios on /var/lib/nagios3/retention.dat and
/var/lib/nagios3/spool/, so the uid/gid of nagios must be the same on all nodes
running nagios3. Otherwise nagios3 will not run on at least one node! I have
solved that problem by creating uid:gid nagios:nagios identically on all nodes
before actually doing apt-get install nagios3. This works.
- When corosync starts, before starting any resource it first checks whether
all resources are properly stopped. '/etc/init.d/nagios3 status' fails
because it cannot find its config file '/etc/nagios3/nagios.cfg', which is on
the shared device. This check cannot be prevented by an order constraint, which
I believe is the correct behaviour.
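
The uid/gid point can be sketched as a small shell helper. This is only a sketch under assumptions: the numeric IDs 5001/5001 are hypothetical placeholders, the home directory and login shell are guesses at what the Debian package would otherwise choose, and the function names are made up for illustration. The first function has to run as root on every node before the package is installed; the second only reads the local account database.

```shell
#!/bin/sh
# Hypothetical fixed IDs; choose values that are unused on every node.
NAGIOS_UID=5001
NAGIOS_GID=5001

# Create the nagios group and user with fixed numeric IDs.
# Run as root on each node before 'apt-get install nagios3'.
# Home directory and shell are assumptions, not taken from the package.
create_nagios_account() {
    getent group nagios >/dev/null ||
        groupadd --gid "$NAGIOS_GID" nagios
    getent passwd nagios >/dev/null ||
        useradd --system --uid "$NAGIOS_UID" --gid nagios \
                --home-dir /var/lib/nagios --shell /bin/false nagios
}

# Return 0 if the local account has the expected uid and gid, 1 otherwise.
check_ids() {
    user=$1 want_uid=$2 want_gid=$3
    have_uid=$(id -u "$user") || return 1
    have_gid=$(id -g "$user") || return 1
    [ "$have_uid" = "$want_uid" ] && [ "$have_gid" = "$want_gid" ]
}
```

Calling `check_ids nagios "$NAGIOS_UID" "$NAGIOS_GID"` on each node afterwards gives a quick way to confirm that the IDs really match before the shared device is mounted.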

I am currently thinking about how to solve this problem most elegantly. I see
the following methods:
- patching lsb script. I think this is no good, as config check 
- creating a /mnt_shared/etc/nagios3/nagios.cfg on each node's local disk
before mounting the filesystem. Advantage: the file only has to be valid; it is
only read when nagios3 is not started, so we don't have to sync it.
Disadvantage: this can easily be forgotten, as you usually don't see this file
(it is hidden once the shared filesystem is mounted).
- linking the files in /etc/nagios3/ individually and leaving out nagios.cfg,
which then can be synced via csync2 or similar. That could be done by a script.
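
The last method could be scripted roughly as follows. This is a sketch, not a tested setup: the function name `link_shared_config` is made up, the shared path in the usage note follows the /mnt_shared example above, and it assumes that the local directory holds nothing but a regular nagios.cfg (an existing real subdirectory with the same name as a shared entry would not be replaced by a link).

```shell
#!/bin/sh
# Link every entry of the shared config directory into the local one,
# except nagios.cfg, which stays a regular local file on each node.
link_shared_config() {
    shared_dir=$1
    local_dir=$2
    for src in "$shared_dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$src" ] || continue
        name=$(basename "$src")
        # nagios.cfg is kept local so 'status' works without the mount.
        [ "$name" = "nagios.cfg" ] && continue
        # -n: replace an existing symlink instead of following it.
        ln -sfn "$src" "$local_dir/$name"
    done
}
```

It would be called once per node while the shared filesystem is mounted, for example `link_shared_config /mnt_shared/etc/nagios3 /etc/nagios3`. The links dangle while the device is unmounted, which should not matter as long as only nagios.cfg is read in that state.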

None of them is easy. I am thinking of dropping the shared config of nagios,
but I need the shared device anyway for apache.

Does anyone have ideas for solving these problems elegantly?
