[Pacemaker] Nodes will not promote DRBD resources to master on failover

Andrew Martin amartin at xes-inc.com
Tue Mar 27 18:13:11 EDT 2012


Hi Andreas, 


Thanks, I've updated the colocation rule to use the correct order. I have also re-enabled the STONITH resource (it had been temporarily disabled for some additional testing). DRBD has its own network connection over the br1 interface (192.168.5.0/24), a direct crossover cable between node1 and node2: 

global { usage-count no; }
common {
    syncer { rate 110M; }
}
resource vmstore {
    protocol C;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
    handlers {
        #fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        fence-peer "/usr/local/bin/fence-peer";
        split-brain "/usr/lib/drbd/notify-split-brain.sh me at example.com";
    }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        cram-hmac-alg md5;
        shared-secret "xxxxx";
    }
    disk {
        fencing resource-only;
    }
    on node1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.5.10:7787;
        meta-disk internal;
    }
    on node2 {
        device /dev/drbd0;
        disk /dev/sdf1;
        address 192.168.5.11:7787;
        meta-disk internal;
    }
}
# and similar for mount1 and mount2 


Also, here is my ha.cf. It uses both the direct link between the nodes (br1) and the shared LAN on br0 for cluster communication: 

autojoin none 
mcast br0 239.0.0.43 694 1 0 
bcast br1 
warntime 5 
deadtime 15 
initdead 60 
keepalive 2 
node node1 
node node2 
node quorumnode 
crm respawn 
respawn hacluster /usr/lib/heartbeat/dopd 
apiauth dopd gid=haclient uid=hacluster 


I am thinking of making the following changes to the CIB (as per the official DRBD guide, http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html ) in order to add the drbd LSB init script as a resource and require that it start before the ocf:linbit:drbd resources. Does this look correct? 
primitive p_drbd-init lsb:drbd op monitor interval="30" 
colocation c_drbd_together inf: p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master ms_drbd_mount2:Master 
order drbd_init_first inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start 


This doesn't seem to require that drbd also be running on the node where the ocf:linbit:drbd resources are slaves (which it would need to be in order to act as a DRBD SyncTarget). How can I ensure that drbd is running everywhere? (clone cl_drbd p_drbd-init? See the sketch below.) 
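
Something along these lines is what I have in mind (untested): clone the primitive so the init script runs on every node, and write the order constraint so it starts before any promotion: 

primitive p_drbd-init lsb:drbd op monitor interval="30" 
clone cl_drbd p_drbd-init 
colocation c_drbd_together inf: ms_drbd_vmstore:Master ms_drbd_mount1:Master ms_drbd_mount2:Master cl_drbd 
order drbd_init_first inf: cl_drbd:start ms_drbd_vmstore:promote ms_drbd_mount1:promote ms_drbd_mount2:promote 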


Thanks, 


Andrew 

----- Original Message -----

From: "Andreas Kurz" <andreas at hastexo.com> 
To: pacemaker at oss.clusterlabs.org 
Sent: Monday, March 26, 2012 5:56:22 PM 
Subject: Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover 

On 03/24/2012 08:15 PM, Andrew Martin wrote: 
> Hi Andreas, 
> 
> My complete cluster configuration is as follows: 
> ============ 
> Last updated: Sat Mar 24 13:51:55 2012 
> Last change: Sat Mar 24 13:41:55 2012 
> Stack: Heartbeat 
> Current DC: node2 (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition 
> with quorum 
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c 
> 3 Nodes configured, unknown expected votes 
> 19 Resources configured. 
> ============ 
> 
> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE (standby) 
> Online: [ node2 node1 ] 
> 
> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] 
> Masters: [ node2 ] 
> Slaves: [ node1 ] 
> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] 
> Masters: [ node2 ] 
> Slaves: [ node1 ] 
> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] 
> Masters: [ node2 ] 
> Slaves: [ node1 ] 
> Resource Group: g_vm 
> p_fs_vmstore (ocf::heartbeat:Filesystem): Started node2 
> p_vm (ocf::heartbeat:VirtualDomain): Started node2 
> Clone Set: cl_daemons [g_daemons] 
> Started: [ node2 node1 ] 
> Stopped: [ g_daemons:2 ] 
> Clone Set: cl_sysadmin_notify [p_sysadmin_notify] 
> Started: [ node2 node1 ] 
> Stopped: [ p_sysadmin_notify:2 ] 
> stonith-node1 (stonith:external/tripplitepdu): Started node2 
> stonith-node2 (stonith:external/tripplitepdu): Started node1 
> Clone Set: cl_ping [p_ping] 
> Started: [ node2 node1 ] 
> Stopped: [ p_ping:2 ] 
> 
> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \ 
> attributes standby="off" 
> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \ 
> attributes standby="off" 
> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \ 
> attributes standby="on" 
> primitive p_drbd_mount2 ocf:linbit:drbd \ 
> params drbd_resource="mount2" \ 
> op monitor interval="15" role="Master" \ 
> op monitor interval="30" role="Slave" 
> primitive p_drbd_mount1 ocf:linbit:drbd \ 
> params drbd_resource="mount1" \ 
> op monitor interval="15" role="Master" \ 
> op monitor interval="30" role="Slave" 
> primitive p_drbd_vmstore ocf:linbit:drbd \ 
> params drbd_resource="vmstore" \ 
> op monitor interval="15" role="Master" \ 
> op monitor interval="30" role="Slave" 
> primitive p_fs_vmstore ocf:heartbeat:Filesystem \ 
> params device="/dev/drbd0" directory="/vmstore" fstype="ext4" \ 
> op start interval="0" timeout="60s" \ 
> op stop interval="0" timeout="60s" \ 
> op monitor interval="20s" timeout="40s" 
> primitive p_libvirt-bin upstart:libvirt-bin \ 
> op monitor interval="30" 
> primitive p_ping ocf:pacemaker:ping \ 
> params name="p_ping" host_list="192.168.1.10 192.168.1.11" 
> multiplier="1000" \ 
> op monitor interval="20s" 
> primitive p_sysadmin_notify ocf:heartbeat:MailTo \ 
> params email="me at example.com" \ 
> params subject="Pacemaker Change" \ 
> op start interval="0" timeout="10" \ 
> op stop interval="0" timeout="10" \ 
> op monitor interval="10" timeout="10" 
> primitive p_vm ocf:heartbeat:VirtualDomain \ 
> params config="/vmstore/config/vm.xml" \ 
> meta allow-migrate="false" \ 
> op start interval="0" timeout="120s" \ 
> op stop interval="0" timeout="120s" \ 
> op monitor interval="10" timeout="30" 
> primitive stonith-node1 stonith:external/tripplitepdu \ 
> params pdu_ipaddr="192.168.1.12" pdu_port="1" pdu_username="xxx" pdu_password="xxx" hostname_to_stonith="node1" 
> primitive stonith-node2 stonith:external/tripplitepdu \ 
> params pdu_ipaddr="192.168.1.12" pdu_port="2" pdu_username="xxx" pdu_password="xxx" hostname_to_stonith="node2" 
> group g_daemons p_libvirt-bin 
> group g_vm p_fs_vmstore p_vm 
> ms ms_drbd_mount2 p_drbd_mount2 \ 
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" 
> ms ms_drbd_mount1 p_drbd_mount1 \ 
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" 
> ms ms_drbd_vmstore p_drbd_vmstore \ 
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" 
> clone cl_daemons g_daemons 
> clone cl_ping p_ping \ 
> meta interleave="true" 
> clone cl_sysadmin_notify p_sysadmin_notify 
> location l-st-node1 stonith-node1 -inf: node1 
> location l-st-node2 stonith-node2 -inf: node2 
> location l_run_on_most_connected p_vm \ 
> rule $id="l_run_on_most_connected-rule" p_ping: defined p_ping 
> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm 

As Emmanuel already said, g_vm has to come first in this colocation 
constraint ... g_vm must be colocated with the DRBD masters. 
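
For example, something like (the same constraint, just reordered): 

colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master ms_drbd_mount1:Master ms_drbd_mount2:Master 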

> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote ms_drbd_mount2:promote cl_daemons:start g_vm:start 
> property $id="cib-bootstrap-options" \ 
> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ 
> cluster-infrastructure="Heartbeat" \ 
> stonith-enabled="false" \ 
> no-quorum-policy="stop" \ 
> last-lrm-refresh="1332539900" \ 
> cluster-recheck-interval="5m" \ 
> crmd-integration-timeout="3m" \ 
> shutdown-escalation="5m" 
> 
> The STONITH plugin is a custom plugin I wrote for the Tripp-Lite 
> PDUMH20ATNET that I'm using as the STONITH device: 
> http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf 

And why aren't you using it? ... stonith-enabled="false" 
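
Once you trust your STONITH resources you can switch it back on, e.g.: 

crm configure property stonith-enabled=true 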

> 
> As you can see, I left the DRBD service to be started by the operating 
> system (as an LSB init script at boot time); however, Pacemaker controls 
> actually bringing up/taking down the individual DRBD devices. 

Don't start drbd at system boot; give Pacemaker full control. 
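
On Ubuntu that would be something like: 

update-rc.d -f drbd remove 

so the init script no longer starts drbd at boot; the ocf:linbit:drbd agent then takes care of loading the module and bringing the devices up. 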

> The behavior I observe is as follows: I issue "crm resource migrate p_vm" on 
> node1 and failover successfully to node2. During this time, node2 fences 
> node1's DRBD devices (using dopd) and marks them as Outdated. Meanwhile 
> node2's DRBD devices are UpToDate. I then shutdown both nodes and then 
> bring them back up. They reconnect to the cluster (with quorum), and 
> node1's DRBD devices are still Outdated as expected and node2's DRBD 
> devices are still UpToDate, as expected. At this point, DRBD starts on 
> both nodes, however node2 will not set DRBD as master: 
> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE (standby) 
> Online: [ node2 node1 ] 
> 
> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] 
> Slaves: [ node1 node2 ] 
> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] 
> Slaves: [ node1 node2 ] 
> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] 
> Slaves: [ node1 node2 ] 

There should really be no interruption of the DRBD replication during a VM 
migration that would activate dopd ... does drbd have its own direct network 
connection? 

Please share your ha.cf file and your drbd configuration. Watch out for 
drbd messages in your kernel log; they should give you additional 
information on when and why the drbd connection was lost. 
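
Something like "grep -i drbd /var/log/kern.log" should show the connection state changes and when/why the peer was outdated. 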

Regards, 
Andreas 

-- 
Need help with Pacemaker? 
http://www.hastexo.com/now 

> 
> I am having trouble sorting through the logging information because 
> there is so much of it in /var/log/daemon.log, but I can't find an 
> error message printed about why it will not promote node2. At this point 
> the DRBD devices are as follows: 
> node2: cstate = WFConnection dstate=UpToDate 
> node1: cstate = StandAlone dstate=Outdated 
> 
> I don't see any reason why node2 can't become DRBD master, or am I 
> missing something? If I do "drbdadm connect all" on node1, then the 
> cstate on both nodes changes to "Connected" and node2 immediately 
> promotes the DRBD resources to master. Any ideas on why I'm observing 
> this incorrect behavior? 
> 
> Any tips on how I can better filter through the pacemaker/heartbeat logs 
> or how to get additional useful debug information? 
> 
> Thanks, 
> 
> Andrew 
> 
> ------------------------------------------------------------------------ 
> *From: *"Andreas Kurz" <andreas at hastexo.com> 
> *To: *pacemaker at oss.clusterlabs.org 
> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM 
> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to 
> master on failover 
> 
> On 01/25/2012 08:58 PM, Andrew Martin wrote: 
>> Hello, 
>> 
>> Recently I finished configuring a two-node cluster with pacemaker 1.1.6 
>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04. This cluster includes 
>> the following resources: 
>> - primitives for DRBD storage devices 
>> - primitives for mounting the filesystem on the DRBD storage 
>> - primitives for some mount binds 
>> - primitive for starting apache 
>> - primitives for starting samba and nfs servers (following instructions 
>> here <http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf>) 
>> - primitives for exporting nfs shares (ocf:heartbeat:exportfs) 
> 
> not enough information ... please share at least your complete cluster 
> configuration 
> 
> Regards, 
> Andreas 
> 
> -- 
> Need help with Pacemaker? 
> http://www.hastexo.com/now 
> 
>> 
>> Perhaps this is best described through the output of crm_mon: 
>> Online: [ node1 node2 ] 
>> 
>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged) 
>> p_drbd_mount1:0 (ocf::linbit:drbd): Started node2 (unmanaged) 
>> p_drbd_mount1:1 (ocf::linbit:drbd): Started node1 
>> (unmanaged) FAILED 
>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] 
>> p_drbd_mount2:0 (ocf::linbit:drbd): Master node1 
>> (unmanaged) FAILED 
>> Slaves: [ node2 ] 
>> Resource Group: g_core 
>> p_fs_mount1 (ocf::heartbeat:Filesystem): Started node1 
>> p_fs_mount2 (ocf::heartbeat:Filesystem): Started node1 
>> p_ip_nfs (ocf::heartbeat:IPaddr2): Started node1 
>> Resource Group: g_apache 
>> p_fs_mountbind1 (ocf::heartbeat:Filesystem): Started node1 
>> p_fs_mountbind2 (ocf::heartbeat:Filesystem): Started node1 
>> p_fs_mountbind3 (ocf::heartbeat:Filesystem): Started node1 
>> p_fs_varwww (ocf::heartbeat:Filesystem): Started node1 
>> p_apache (ocf::heartbeat:apache): Started node1 
>> Resource Group: g_fileservers 
>> p_lsb_smb (lsb:smbd): Started node1 
>> p_lsb_nmb (lsb:nmbd): Started node1 
>> p_lsb_nfsserver (lsb:nfs-kernel-server): Started node1 
>> p_exportfs_mount1 (ocf::heartbeat:exportfs): Started node1 
>> p_exportfs_mount2 (ocf::heartbeat:exportfs): Started node1 
>> 
>> I have read through the Pacemaker Explained 
>> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained> 
>> documentation, however could not find a way to further debug these 
>> problems. First, I put node1 into standby mode to attempt failover to 
>> the other node (node2). Node2 appeared to start the transition to 
>> master, however it failed to promote the DRBD resources to master (the 
>> first step). I have attached a copy of this session in commands.log and 
>> additional excerpts from /var/log/syslog during important steps. I have 
>> attempted everything I can think of to try and start the DRBD resource 
>> (e.g. start/stop/promote/manage/cleanup under crm resource, restarting 
>> heartbeat) but cannot bring it out of the slave state. However, if I set 
>> it to unmanaged and then run drbdadm primary all in the terminal, 
>> pacemaker is satisfied and continues starting the rest of the resources. 
>> It then failed when attempting to mount the filesystem for mount2, the 
>> p_fs_mount2 resource. I attempted to mount the filesystem myself and was 
>> successful. I then unmounted it and ran cleanup on p_fs_mount2 and then 
>> it mounted. The rest of the resources started as expected until the 
>> p_exportfs_mount2 resource, which failed as follows: 
>> p_exportfs_mount2 (ocf::heartbeat:exportfs): started node2 
>> (unmanaged) FAILED 
>> 
>> I ran cleanup on this and it started, however when running this test 
>> earlier today no command could successfully start this exportfs resource. 
>> 
>> How can I configure pacemaker to better resolve these problems and be 
>> able to bring the node up successfully on its own? What can I check to 
>> determine why these failures are occurring? /var/log/syslog did not seem 
>> to contain very much useful information regarding why the failures 
>> occurred. 
>> 
>> Thanks, 
>> 
>> Andrew 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 



_______________________________________________ 
Pacemaker mailing list: Pacemaker at oss.clusterlabs.org 
http://oss.clusterlabs.org/mailman/listinfo/pacemaker 

Project Home: http://www.clusterlabs.org 
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
Bugs: http://bugs.clusterlabs.org 


