[Pacemaker] Nodes will not promote DRBD resources to master on failover

Wed Jan 25 14:58:44 EST 2012

Hello, 

Recently I finished configuring a two-node cluster with Pacemaker 1.1.6 and Heartbeat 3.0.5 on nodes running Ubuntu 10.04. This cluster includes the following resources:
- primitives for DRBD storage devices
- primitives for mounting the filesystems on the DRBD storage
- primitives for some bind mounts
- a primitive for starting Apache
- primitives for starting the Samba and NFS servers (following instructions here)
- primitives for exporting NFS shares (ocf:heartbeat:exportfs)

Perhaps this is best described through the output of crm_mon: 

Online: [ node1 node2 ]

 Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged)
     p_drbd_mount1:0 (ocf::linbit:drbd): Started node2 (unmanaged)
     p_drbd_mount1:1 (ocf::linbit:drbd): Started node1 (unmanaged) FAILED
 Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
     p_drbd_mount2:0 (ocf::linbit:drbd): Master node1 (unmanaged) FAILED
     Slaves: [ node2 ]
 Resource Group: g_core
     p_fs_mount1 (ocf::heartbeat:Filesystem): Started node1
     p_fs_mount2 (ocf::heartbeat:Filesystem): Started node1
     p_ip_nfs (ocf::heartbeat:IPaddr2): Started node1
 Resource Group: g_apache
     p_fs_mountbind1 (ocf::heartbeat:Filesystem): Started node1
     p_fs_mountbind2 (ocf::heartbeat:Filesystem): Started node1
     p_fs_mountbind3 (ocf::heartbeat:Filesystem): Started node1
     p_fs_varwww (ocf::heartbeat:Filesystem): Started node1
     p_apache (ocf::heartbeat:apache): Started node1
 Resource Group: g_fileservers
     p_lsb_smb (lsb:smbd): Started node1
     p_lsb_nmb (lsb:nmbd): Started node1
     p_lsb_nfsserver (lsb:nfs-kernel-server): Started node1
     p_exportfs_mount1 (ocf::heartbeat:exportfs): Started node1
     p_exportfs_mount2 (ocf::heartbeat:exportfs): Started node1
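When comparing status snapshots taken before and after a failover, it can help to pull out just the resources flagged FAILED or unmanaged. The following is a minimal sketch, assuming crm_mon prints primitive lines in the "name (agent): State node [flags]" layout shown above; the layout can differ between Pacemaker versions, and the helper names `parse_crm_mon` and `problem_resources` are hypothetical, not part of Pacemaker.

```python
import re
from typing import List, NamedTuple


class ResourceStatus(NamedTuple):
    name: str        # e.g. "p_drbd_mount1:1"
    agent: str       # e.g. "ocf::linbit:drbd"
    state: str       # e.g. "Started", "Master"
    node: str        # node the state refers to
    unmanaged: bool  # "(unmanaged)" flag present
    failed: bool     # "FAILED" flag present


# Assumed line layout (hypothetical parser, not a Pacemaker API):
#   p_drbd_mount1:1 (ocf::linbit:drbd): Started node1 (unmanaged) FAILED
# Set/group headers ("Master/Slave Set:", "Resource Group:") and
# "Online:"/"Slaves:" lines do not match and are skipped.
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+"
    r"(?P<state>\w+)\s+(?P<node>\S+)(?P<flags>.*)$"
)


def parse_crm_mon(text: str) -> List[ResourceStatus]:
    """Parse the primitive lines of a crm_mon status dump."""
    resources = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        flags = m.group("flags")
        resources.append(ResourceStatus(
            name=m.group("name"),
            agent=m.group("agent"),
            state=m.group("state"),
            node=m.group("node"),
            unmanaged="(unmanaged)" in flags,
            failed="FAILED" in flags,
        ))
    return resources


def problem_resources(text: str) -> List[ResourceStatus]:
    """Return only the resources flagged FAILED or unmanaged."""
    return [r for r in parse_crm_mon(text) if r.failed or r.unmanaged]


# An excerpt of the status output above.
SAMPLE = """\
Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged)
    p_drbd_mount1:0 (ocf::linbit:drbd): Started node2 (unmanaged)
    p_drbd_mount1:1 (ocf::linbit:drbd): Started node1 (unmanaged) FAILED
Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
    p_drbd_mount2:0 (ocf::linbit:drbd): Master node1 (unmanaged) FAILED
    Slaves: [ node2 ]
Resource Group: g_core
    p_fs_mount1 (ocf::heartbeat:Filesystem): Started node1
"""

if __name__ == "__main__":
    for r in problem_resources(SAMPLE):
        flag = "FAILED" if r.failed else "unmanaged"
        print(f"{r.name} ({r.agent}) {r.state} on {r.node}: {flag}")
```

Matching on the "(agent):" part keeps the set and group header lines out of the result without needing a separate list of header keywords.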

I have read through the Pacemaker Explained documentation but could not find a way to debug these problems further.

First, I put node1 into standby mode to force a failover to the other node (node2). Node2 appeared to start the transition to master, but it failed to promote the DRBD resources to master (the first step). I have attached a copy of this session in commands.log, along with excerpts from /var/log/syslog covering the important steps. I have tried everything I can think of to start the DRBD resource (e.g. start/stop/promote/manage/cleanup under crm resource, and restarting heartbeat), but I cannot bring it out of the slave state. However, if I set it to unmanaged and then run drbdadm primary all in a terminal, Pacemaker is satisfied and continues starting the rest of the resources.

It then failed when attempting to mount the filesystem for mount2 (the p_fs_mount2 resource). I mounted the filesystem manually without any problem, then unmounted it and ran cleanup on p_fs_mount2, after which Pacemaker mounted it. The rest of the resources started as expected until the p_exportfs_mount2 resource, which failed as follows:
p_exportfs_mount2 (ocf::heartbeat:exportfs): started node2 (unmanaged) FAILED 

I ran cleanup on it and it started. However, when I ran this same test earlier today, no command could start this exportfs resource.

How can I configure Pacemaker to resolve these problems itself and bring the node up successfully on its own? What can I check to determine why these failures are occurring? /var/log/syslog did not seem to contain much useful information about why the failures occurred.
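One thing that could be checked on node2 when the promote fails is DRBD's own view of each device, since whether the ocf:linbit:drbd agent is willing to promote is generally tied to the local disk state. The sketch below assumes the DRBD 8.3-style /proc/drbd layout (cs:, ro:, ds: fields); the function names, the two "blocker" checks, and the sample text are hypothetical illustrations, not output from this cluster and not DRBD's complete rules for when promotion is allowed.

```python
import re
from typing import Dict, List, NamedTuple


class DrbdDevice(NamedTuple):
    minor: int
    connection: str  # cs: field, e.g. "Connected", "WFConnection"
    local_role: str  # ro: left of "/", e.g. "Secondary"
    peer_role: str   # ro: right of "/"
    local_disk: str  # ds: left of "/", e.g. "UpToDate"
    peer_disk: str   # ds: right of "/"


# Assumes DRBD 8.3-style device lines, e.g.
#  0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
DEV_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+"
    r"ro:(?P<lr>[^/\s]+)/(?P<pr>\S+)\s+"
    r"ds:(?P<ld>[^/\s]+)/(?P<pd>\S+)"
)


def parse_proc_drbd(text: str) -> Dict[int, DrbdDevice]:
    """Map each DRBD minor number to its connection, role and disk state."""
    devices = {}
    for line in text.splitlines():
        m = DEV_RE.match(line)
        if m:
            minor = int(m.group("minor"))
            devices[minor] = DrbdDevice(
                minor, m.group("cs"), m.group("lr"), m.group("pr"),
                m.group("ld"), m.group("pd"),
            )
    return devices


def promotion_blockers(dev: DrbdDevice) -> List[str]:
    """Heuristic reasons a promote may be refused (not DRBD's full rule set)."""
    reasons = []
    if dev.local_disk != "UpToDate":
        reasons.append(f"local disk is {dev.local_disk}, not UpToDate")
    if dev.peer_role == "Primary":
        # Without allow-two-primaries, a connected peer must demote first.
        reasons.append("peer is still Primary")
    return reasons


# Made-up sample text; on a real node, read /proc/drbd instead.
SAMPLE = """\
version: 8.3.7 (api:88/proto:86-91)
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:1024 dw:1024 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:WFConnection ro:Secondary/Unknown ds:Outdated/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
"""

if __name__ == "__main__":
    for minor, dev in sorted(parse_proc_drbd(SAMPLE).items()):
        problems = promotion_blockers(dev) or ["no obvious blocker"]
        print(f"drbd{minor} {dev.connection}: " + "; ".join(problems))
```

Running the same check right after a failed promote, and again after a manual drbdadm primary all succeeds, would show whether DRBD itself or the cluster's placement decision is what keeps the resource in the slave state. The drbdadm role, dstate and cstate subcommands report the same fields per resource.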

Thanks, 

Andrew 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: commands.log
Type: text/x-log
Size: 3612 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20120125/b8efb354/attachment-0008.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cleanup-ms_drbd_mount1.log
Type: text/x-log
Size: 28913 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20120125/b8efb354/attachment-0009.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cleanup-p_fs_mount2.log
Type: text/x-log
Size: 37256 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20120125/b8efb354/attachment-0010.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cleanup-p_exportfs_mount2.log
Type: text/x-log
Size: 19508 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20120125/b8efb354/attachment-0011.bin>