[Pacemaker] Problem with failover/failback under Ubuntu 10.04 for Active/Passive OpenNMS

Dan Frincu dfrincu at streamwide.ro
Mon Jul 5 11:34:24 EDT 2010

> Hi,
> First you might want to look at the following error, see if the module
> is available on both servers.
> (fs-opennms-config:start:stderr) FATAL: Module scsi_hostadapter not
> Then try to run the resource manually:
> - go to /usr/lib/ocf/resource.d/heartbeat
> - export OCF_ROOT=/usr/lib/ocf
> - export OCF_RESKEY_device="/dev/drbd/by-res/config"
> - export OCF_RESKEY_options=rw
> - export OCF_RESKEY_fstype=xfs
> - export OCF_RESKEY_directory="/etc/opennms"
> - ./Filesystem start
> See if you encounter any errors here. Run the steps on both servers.
> Make sure to move the DRBD resource from server to server so that the
> mount works. You do that via:
> - go to the server where the DRBD device is currently mounted and in a
>   Primary state
> - umount /etc/opennms
> - drbdadm secondary config
> - move to the other server
> - drbdadm primary config
> Also, make sure that Pacemaker doesn't interfere with these operations.
> Cheers.
I get the error message about scsi_hostadapter on both nodes, but I can
mount the DRBD device just fine.


> monitoring-node-01 lrmd: [994]: info: RA output:
> (fs-opennms-config:start:stderr) /dev/drbd/by-res/config: Wrong medium
> type
> monitoring-node-01 lrmd: [994]: info: RA output:
> (fs-opennms-config:start:stderr) mount: block device /dev/drbd0 is
> write-protected, mounting read-only
> monitoring-node-01 lrmd: [994]: info: RA output:
> (fs-opennms-config:start:stderr) mount: Wrong medium type
> monitoring-node-01 Filesystem[2464]: ERROR: Couldn't mount filesystem
> /dev/drbd/by-res/config on /etc/opennms

The errors in the log file are DRBD-specific: they occur when you try to mount a resource that is still in a
Secondary state. Increase the "op start" interval for both the DRBD and Filesystem primitives to roughly 15
seconds. With a start interval of 0 (zero) seconds, the demotion of the DRBD resource from Primary to Secondary
on node2 and its promotion to Primary on node1 is not instantaneous, so Pacemaker attempts to mount the
filesystem before the DRBD resource is in a Primary state. The DRBD start then runs into that huge 300-second
timeout, but while Pacemaker waits for that one resource (DRBD) to time out, it executes the next one, the
mount, which fails with the errors shown above, for the reasons just described.
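As a sketch only (the primitive and master/slave names drbd-config, ms-drbd-config and fs-opennms-config, and
the timeout values, are assumptions, not taken from your configuration), the suggestion would look roughly like
this in the crm shell:

```
primitive drbd-config ocf:linbit:drbd \
        params drbd_resource="config" \
        op start interval="15s" timeout="240s"
primitive fs-opennms-config ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/config" directory="/etc/opennms" fstype="xfs" \
        op start interval="15s" timeout="60s"
# make sure the mount only starts after the promotion to Primary:
order ord-fs-after-drbd inf: ms-drbd-config:promote fs-opennms-config:start
```

You can apply something like this with "crm configure edit" and check the result with "crm_verify -L" before
committing.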

I'd also suggest adding an "op monitor" operation to each resource, with a reasonable interval and timeout, and a mail alert as well.
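For example (the interval/timeout values and the e-mail address are placeholders; the mail alert uses the
ocf:heartbeat:MailTo resource agent):

```
# add a recurring monitor to each primitive, e.g.:
primitive fs-opennms-config ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/config" directory="/etc/opennms" fstype="xfs" \
        op monitor interval="30s" timeout="40s"
# and a mail notification resource:
primitive mail-alert ocf:heartbeat:MailTo \
        params email="admin@example.com" subject="Cluster failover" \
        op monitor interval="60s" timeout="10s"
```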


Systems Engineer

