[Pacemaker] Nodes will not promote DRBD resources to master on failover

Wed Feb 1 17:19:25 EST 2012

On 01/25/2012 08:58 PM, Andrew Martin wrote:
> Hello,
> 
> Recently I finished configuring a two-node cluster with pacemaker 1.1.6
> and heartbeat 3.0.5 on nodes running Ubuntu 10.04. This cluster includes
> the following resources:
> - primitives for DRBD storage devices
> - primitives for mounting the filesystem on the DRBD storage
> - primitives for some mount binds
> - primitive for starting apache
> - primitives for starting samba and nfs servers (following instructions
> here <http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf>)
> - primitives for exporting nfs shares (ocf:heartbeat:exportfs)

Not enough information ... please share at least your complete cluster
configuration.
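
A minimal sketch of how that information could be gathered in one place.
The function name collect_cluster_info and the output file names are made up
for illustration. The commands themselves (crm configure show, crm_mon,
cibadmin, drbdadm dump) are the usual crm shell, Pacemaker and DRBD tools,
and each one is skipped if it is not installed:

```shell
#!/bin/sh
# Hypothetical helper: collect_cluster_info and the file names below are
# illustrative only, not part of Pacemaker or DRBD.
collect_cluster_info() {
    out_dir="${1:-./cluster-info}"
    mkdir -p "$out_dir" || return 1

    # Complete cluster configuration in crm shell syntax.
    if command -v crm >/dev/null 2>&1; then
        crm configure show > "$out_dir/crm-configure.txt" 2>&1 || true
    fi

    # One-shot status, including inactive resources and fail counts.
    if command -v crm_mon >/dev/null 2>&1; then
        crm_mon -1 -r -f > "$out_dir/crm_mon.txt" 2>&1 || true
    fi

    # Raw CIB as XML (configuration plus status section).
    if command -v cibadmin >/dev/null 2>&1; then
        cibadmin -Q > "$out_dir/cib.xml" 2>&1 || true
    fi

    # DRBD configuration as drbdadm parsed it.
    if command -v drbdadm >/dev/null 2>&1; then
        drbdadm dump all > "$out_dir/drbd-config.txt" 2>&1 || true
    fi

    # Kernel view of the DRBD connection, role and disk states.
    if [ -r /proc/drbd ]; then
        cat /proc/drbd > "$out_dir/proc-drbd.txt"
    fi

    echo "collected into $out_dir"
}
```

Running something like collect_cluster_info /tmp/cluster-info on each node and
attaching the resulting files (with any passwords removed) would probably
cover what is being asked for here.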

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Perhaps this is best described through the output of crm_mon:
> Online: [ node1 node2 ]
> 
>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged)
>      p_drbd_mount1:0     (ocf::linbit:drbd):     Started node2 (unmanaged)
>      p_drbd_mount1:1     (ocf::linbit:drbd):     Started node1 (unmanaged) FAILED
>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>      p_drbd_mount2:0       (ocf::linbit:drbd):     Master node1 (unmanaged) FAILED
>      Slaves: [ node2 ]
>  Resource Group: g_core
>      p_fs_mount1 (ocf::heartbeat:Filesystem):    Started node1
>      p_fs_mount2   (ocf::heartbeat:Filesystem):    Started node1
>      p_ip_nfs   (ocf::heartbeat:IPaddr2):       Started node1
>  Resource Group: g_apache
>      p_fs_mountbind1    (ocf::heartbeat:Filesystem):    Started node1
>      p_fs_mountbind2    (ocf::heartbeat:Filesystem):    Started node1
>      p_fs_mountbind3    (ocf::heartbeat:Filesystem):    Started node1
>      p_fs_varwww        (ocf::heartbeat:Filesystem):    Started node1
>      p_apache   (ocf::heartbeat:apache):        Started node1
>  Resource Group: g_fileservers
>      p_lsb_smb  (lsb:smbd):     Started node1
>      p_lsb_nmb  (lsb:nmbd):     Started node1
>      p_lsb_nfsserver    (lsb:nfs-kernel-server):        Started node1
>      p_exportfs_mount1   (ocf::heartbeat:exportfs):      Started node1
>      p_exportfs_mount2     (ocf::heartbeat:exportfs):      Started node1
> 
> I have read through the Pacemaker Explained
> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained>
> documentation, but I could not find a way to further debug these
> problems. First, I put node1 into standby mode to attempt failover to
> the other node (node2). Node2 appeared to start the transition to
> master, however it failed to promote the DRBD resources to master (the
> first step). I have attached a copy of this session in commands.log and
> additional excerpts from /var/log/syslog during important steps. I have
> attempted everything I can think of to try and start the DRBD resource
> (e.g. start/stop/promote/manage/cleanup under crm resource, restarting
> heartbeat) but cannot bring it out of the slave state. However, if I set
> it to unmanaged and then run drbdadm primary all in the terminal,
> pacemaker is satisfied and continues starting the rest of the resources.
> It then failed when attempting to mount the filesystem for mount2, the
> p_fs_mount2 resource. I attempted to mount the filesystem myself and was
> successful. I then unmounted it and ran cleanup on p_fs_mount2 and then
> it mounted. The rest of the resources started as expected until the
> p_exportfs_mount2 resource, which failed as follows:
> p_exportfs_mount2     (ocf::heartbeat:exportfs):      started node2 (unmanaged) FAILED
> 
> I ran cleanup on this and it started; however, when I ran this test
> earlier today, no command could successfully start this exportfs resource.
> 
> How can I configure pacemaker to better resolve these problems and be
> able to bring the node up successfully on its own? What can I check to
> determine why these failures are occurring? /var/log/syslog did not seem
> to contain very much useful information regarding why the failures occurred.
> 
> Thanks,
> 
> Andrew
> 
