[Pacemaker] Nodes will not promote DRBD resources to master on failover

Wed Mar 28 14:56:51 UTC 2012

Hi Andreas, 

I disabled the DRBD init script and then restarted the slave node (node2). After it came back up, DRBD did not start: 

Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending 
Online: [ node2 node1 ] 

Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] 
Masters: [ node1 ] 
Stopped: [ p_drbd_vmstore:1 ] 
Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
Masters: [ node1 ] 
Stopped: [ p_drbd_mount1:1 ] 
Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
Masters: [ node1 ] 
Stopped: [ p_drbd_mount2:1 ] 
... 

root@node2:~# service drbd status
drbd not loaded 

Is there something else I need to change in the CIB to ensure that DRBD is started? All of my DRBD devices are configured like this: 

primitive p_drbd_mount2 ocf:linbit:drbd \ 
params drbd_resource="mount2" \ 
op monitor interval="15" role="Master" \ 
op monitor interval="30" role="Slave" 

ms ms_drbd_mount2 p_drbd_mount2 \ 
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" 

Here is the output from the syslog (grep -i drbd /var/log/syslog): 

Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_vmstore:1_monitor_0 ) 
Mar 28 09:24:47 node2 lrmd: [3210]: info: rsc:p_drbd_vmstore:1 probe[2] (pid 3455) 
Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_mount1:1_monitor_0 ) 
Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount1:1 probe[3] (pid 3456) 
Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_mount2:1_monitor_0 ) 
Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount2:1 probe[4] (pid 3457) 
Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING: Couldn't find device [/dev/drbd0]. Expected /dev/??? to exist 
Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked: crm_attribute -N node2 -n master-p_drbd_mount2:1 -l reboot -D 
Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked: crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l reboot -D 
Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked: crm_attribute -N node2 -n master-p_drbd_mount1:1 -l reboot -D 
Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[4] on p_drbd_mount2:1 for client 3213: pid 3457 exited with return code 7 
Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[2] on p_drbd_vmstore:1 for client 3213: pid 3455 exited with return code 7 
Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=10, confirmed=true) not running 
Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[3] on p_drbd_mount1:1 for client 3213: pid 3456 exited with return code 7 
Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11, confirmed=true) not running 
Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=12, confirmed=true) not running 
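
For readability, the lrmd lines above can be condensed to one short line per operation. This is only a sketch: the function name lrmd_rc_summary is made up for illustration, and the awk field positions assume the syslog layout shown in this message. The `_monitor_0` operations are Pacemaker's initial probes, and return code 7 is the OCF "not running" code, which matches the "not running" text in the crmd lines.

```shell
# Condense lrmd "exited with return code" lines to: time resource operation rc=N
# Reads syslog-format lines on stdin; field positions assume the layout above.
lrmd_rc_summary() {
  awk '/lrmd:/ && /exited with return code/ {
    print $3, $11, $9, "rc=" $NF
  }'
}

# Usage (log path as used in this thread):
#   grep -i drbd /var/log/syslog | lrmd_rc_summary
```

A probe returning 7 on a freshly booted node only says that the resource was not active at that moment; it does not by itself explain why no start was scheduled afterwards.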

Thanks, 

Andrew 
----- Original Message -----

From: "Andreas Kurz" <andreas at hastexo.com> 
To: pacemaker at oss.clusterlabs.org 
Sent: Wednesday, March 28, 2012 9:03:06 AM 
Subject: Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover 

On 03/28/2012 03:47 PM, Andrew Martin wrote: 
> Hi Andreas, 
> 
>> hmm ... what is that fence-peer script doing? If you want to use 
>> resource-level fencing with the help of dopd, activate the 
>> drbd-peer-outdater script in the line above ... and double check if the 
>> path is correct 
> fence-peer is just a wrapper for drbd-peer-outdater that does some 
> additional logging. In my testing dopd has been working well. 

I see 

> 
>>> I am thinking of making the following changes to the CIB (as per the 
>>> official DRBD guide
>>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html) in
>>> order to add the DRBD lsb service and require that it start before the 
>>> ocf:linbit:drbd resources. Does this look correct? 
>> 
>> Where did you read that? No, deactivate the startup of DRBD on system 
>> boot and let Pacemaker manage it completely. 
>> 
>>> primitive p_drbd-init lsb:drbd op monitor interval="30" 
>>> colocation c_drbd_together inf: 
>>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master 
>>> ms_drbd_mount2:Master 
>>> order drbd_init_first inf: ms_drbd_vmstore:promote 
>>> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start 
>>> 
>>> This doesn't seem to require that drbd be also running on the node where 
>>> the ocf:linbit:drbd resources are slave (which it would need to do to be 
>>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere? 
>>> (clone cl_drbd p_drbd-init ?) 
>> 
>> This is really not needed. 
> I was following the official DRBD Users Guide: 
> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html 
> 
> If I am understanding your previous message correctly, I do not need to 
> add a lsb primitive for the drbd daemon? It will be 
> started/stopped/managed automatically by my ocf:linbit:drbd resources 
> (and I can remove the /etc/rc* symlinks)? 

Yes, you don't need that LSB script when using Pacemaker and should not 
let init start it. 
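
A minimal sketch of how this could be checked, assuming a Debian/Ubuntu system with SysV-style rc directories (the thread mentions Ubuntu 10.04 and the /etc/rc* symlinks). The helper name init_links is made up for illustration; update-rc.d is the usual Debian tool for removing the links, but verify against your distribution before relying on it.

```shell
# List boot-time symlinks for an init script (defaults: drbd under /etc).
# An empty result means init will not start the service at boot.
init_links() {
  name=${1:-drbd}
  root=${2:-/etc}
  for link in "$root"/rc*.d/[SK][0-9][0-9]"$name"; do
    # An unmatched glob stays literal, so only print paths that really exist.
    if [ -e "$link" ] || [ -L "$link" ]; then
      printf '%s\n' "$link"
    fi
  done
}

# Assumed Debian/Ubuntu tooling; run as root on both DRBD nodes:
#   update-rc.d -f drbd remove
#   init_links drbd        # should print nothing afterwards
```

The init script itself stays in /etc/init.d, so it can still be run by hand for troubleshooting while Pacemaker is not managing the resources.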

Regards, 
Andreas 

-- 
Need help with Pacemaker? 
http://www.hastexo.com/now 

> 
> Thanks, 
> 
> Andrew 
> 
> ------------------------------------------------------------------------ 
> *From: *"Andreas Kurz" <andreas at hastexo.com>
> *To: *pacemaker at oss.clusterlabs.org
> *Sent: *Wednesday, March 28, 2012 7:27:34 AM 
> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to 
> master on failover 
> 
> On 03/28/2012 12:13 AM, Andrew Martin wrote: 
>> Hi Andreas, 
>> 
>> Thanks, I've updated the colocation rule to be in the correct order. I 
>> also enabled the STONITH resource (this was temporarily disabled before 
>> for some additional testing). DRBD has its own network connection over 
>> the br1 interface (192.168.5.0/24 network), a direct crossover cable 
>> between node1 and node2: 
>> global { usage-count no; } 
>> common { 
>> syncer { rate 110M; } 
>> } 
>> resource vmstore { 
>> protocol C; 
>> startup { 
>> wfc-timeout 15; 
>> degr-wfc-timeout 60; 
>> } 
>> handlers { 
>> #fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; 
>> fence-peer "/usr/local/bin/fence-peer"; 
> 
> hmm ... what is that fence-peer script doing? If you want to use 
> resource-level fencing with the help of dopd, activate the 
> drbd-peer-outdater script in the line above ... and double check if the 
> path is correct 
> 
>> split-brain "/usr/lib/drbd/notify-split-brain.sh me@example.com";
>> } 
>> net { 
>> after-sb-0pri discard-zero-changes; 
>> after-sb-1pri discard-secondary; 
>> after-sb-2pri disconnect; 
>> cram-hmac-alg md5; 
>> shared-secret "xxxxx"; 
>> } 
>> disk { 
>> fencing resource-only; 
>> } 
>> on node1 { 
>> device /dev/drbd0; 
>> disk /dev/sdb1; 
>> address 192.168.5.10:7787; 
>> meta-disk internal; 
>> } 
>> on node2 { 
>> device /dev/drbd0; 
>> disk /dev/sdf1; 
>> address 192.168.5.11:7787; 
>> meta-disk internal; 
>> } 
>> } 
>> # and similar for mount1 and mount2 
>> 
>> Also, here is my ha.cf. It uses both the direct link between the nodes 
>> (br1) and the shared LAN network on br0 for communicating: 
>> autojoin none 
>> mcast br0 239.0.0.43 694 1 0 
>> bcast br1 
>> warntime 5 
>> deadtime 15 
>> initdead 60 
>> keepalive 2 
>> node node1 
>> node node2 
>> node quorumnode 
>> crm respawn 
>> respawn hacluster /usr/lib/heartbeat/dopd 
>> apiauth dopd gid=haclient uid=hacluster 
>> 
>> I am thinking of making the following changes to the CIB (as per the 
>> official DRBD guide
>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html) in
>> order to add the DRBD lsb service and require that it start before the 
>> ocf:linbit:drbd resources. Does this look correct? 
> 
> Where did you read that? No, deactivate the startup of DRBD on system 
> boot and let Pacemaker manage it completely. 
> 
>> primitive p_drbd-init lsb:drbd op monitor interval="30" 
>> colocation c_drbd_together inf: 
>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master 
>> ms_drbd_mount2:Master 
>> order drbd_init_first inf: ms_drbd_vmstore:promote 
>> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start 
>> 
>> This doesn't seem to require that drbd be also running on the node where 
>> the ocf:linbit:drbd resources are slave (which it would need to do to be 
>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere? 
>> (clone cl_drbd p_drbd-init ?) 
> 
> This is really not needed. 
> 
> Regards, 
> Andreas 
> 
> -- 
> Need help with Pacemaker? 
> http://www.hastexo.com/now 
> 
>> 
>> Thanks, 
>> 
>> Andrew 
>> ------------------------------------------------------------------------ 
>> *From: *"Andreas Kurz" <andreas at hastexo.com>
>> *To: *pacemaker at oss.clusterlabs.org
>> *Sent: *Monday, March 26, 2012 5:56:22 PM 
>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to 
>> master on failover 
>> 
>> On 03/24/2012 08:15 PM, Andrew Martin wrote: 
>>> Hi Andreas, 
>>> 
>>> My complete cluster configuration is as follows: 
>>> ============ 
>>> Last updated: Sat Mar 24 13:51:55 2012 
>>> Last change: Sat Mar 24 13:41:55 2012 
>>> Stack: Heartbeat 
>>> Current DC: node2 (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition 
>>> with quorum 
>>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c 
>>> 3 Nodes configured, unknown expected votes 
>>> 19 Resources configured. 
>>> ============ 
>>> 
>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE (standby) 
>>> Online: [ node2 node1 ] 
>>> 
>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] 
>>> Masters: [ node2 ] 
>>> Slaves: [ node1 ] 
>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] 
>>> Masters: [ node2 ] 
>>> Slaves: [ node1 ] 
>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] 
>>> Masters: [ node2 ] 
>>> Slaves: [ node1 ] 
>>> Resource Group: g_vm 
>>> p_fs_vmstore(ocf::heartbeat:Filesystem):Started node2 
>>> p_vm(ocf::heartbeat:VirtualDomain):Started node2 
>>> Clone Set: cl_daemons [g_daemons] 
>>> Started: [ node2 node1 ] 
>>> Stopped: [ g_daemons:2 ] 
>>> Clone Set: cl_sysadmin_notify [p_sysadmin_notify] 
>>> Started: [ node2 node1 ] 
>>> Stopped: [ p_sysadmin_notify:2 ] 
>>> stonith-node1(stonith:external/tripplitepdu):Started node2 
>>> stonith-node2(stonith:external/tripplitepdu):Started node1 
>>> Clone Set: cl_ping [p_ping] 
>>> Started: [ node2 node1 ] 
>>> Stopped: [ p_ping:2 ] 
>>> 
>>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \ 
>>> attributes standby="off" 
>>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \ 
>>> attributes standby="off" 
>>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \ 
>>> attributes standby="on" 
>>> primitive p_drbd_mount2 ocf:linbit:drbd \ 
>>> params drbd_resource="mount2" \ 
>>> op monitor interval="15" role="Master" \ 
>>> op monitor interval="30" role="Slave" 
>>> primitive p_drbd_mount1 ocf:linbit:drbd \ 
>>> params drbd_resource="mount1" \ 
>>> op monitor interval="15" role="Master" \ 
>>> op monitor interval="30" role="Slave" 
>>> primitive p_drbd_vmstore ocf:linbit:drbd \ 
>>> params drbd_resource="vmstore" \ 
>>> op monitor interval="15" role="Master" \ 
>>> op monitor interval="30" role="Slave" 
>>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \ 
>>> params device="/dev/drbd0" directory="/vmstore" fstype="ext4" \ 
>>> op start interval="0" timeout="60s" \ 
>>> op stop interval="0" timeout="60s" \ 
>>> op monitor interval="20s" timeout="40s" 
>>> primitive p_libvirt-bin upstart:libvirt-bin \ 
>>> op monitor interval="30" 
>>> primitive p_ping ocf:pacemaker:ping \ 
>>> params name="p_ping" host_list="192.168.1.10 192.168.1.11" 
>>> multiplier="1000" \ 
>>> op monitor interval="20s" 
>>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \ 
>>> params email="me@example.com" \
>>> params subject="Pacemaker Change" \ 
>>> op start interval="0" timeout="10" \ 
>>> op stop interval="0" timeout="10" \ 
>>> op monitor interval="10" timeout="10" 
>>> primitive p_vm ocf:heartbeat:VirtualDomain \ 
>>> params config="/vmstore/config/vm.xml" \ 
>>> meta allow-migrate="false" \ 
>>> op start interval="0" timeout="120s" \ 
>>> op stop interval="0" timeout="120s" \ 
>>> op monitor interval="10" timeout="30" 
>>> primitive stonith-node1 stonith:external/tripplitepdu \ 
>>> params pdu_ipaddr="192.168.1.12" pdu_port="1" pdu_username="xxx" 
>>> pdu_password="xxx" hostname_to_stonith="node1" 
>>> primitive stonith-node2 stonith:external/tripplitepdu \ 
>>> params pdu_ipaddr="192.168.1.12" pdu_port="2" pdu_username="xxx" 
>>> pdu_password="xxx" hostname_to_stonith="node2" 
>>> group g_daemons p_libvirt-bin 
>>> group g_vm p_fs_vmstore p_vm 
>>> ms ms_drbd_mount2 p_drbd_mount2 \ 
>>> meta master-max="1" master-node-max="1" clone-max="2" 
>>> clone-node-max="1" notify="true" 
>>> ms ms_drbd_mount1 p_drbd_mount1 \ 
>>> meta master-max="1" master-node-max="1" clone-max="2" 
>>> clone-node-max="1" notify="true" 
>>> ms ms_drbd_vmstore p_drbd_vmstore \ 
>>> meta master-max="1" master-node-max="1" clone-max="2" 
>>> clone-node-max="1" notify="true" 
>>> clone cl_daemons g_daemons 
>>> clone cl_ping p_ping \ 
>>> meta interleave="true" 
>>> clone cl_sysadmin_notify p_sysadmin_notify 
>>> location l-st-node1 stonith-node1 -inf: node1 
>>> location l-st-node2 stonith-node2 -inf: node2 
>>> location l_run_on_most_connected p_vm \ 
>>> rule $id="l_run_on_most_connected-rule" p_ping: defined p_ping 
>>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master 
>>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm 
>> 
>> As Emmanuel already said, g_vm has to come first in this colocation
>> constraint: g_vm must be colocated with the DRBD masters.
>> 
>>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote 
>>> ms_drbd_mount2:promote cl_daemons:start g_vm:start 
>>> property $id="cib-bootstrap-options" \ 
>>> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ 
>>> cluster-infrastructure="Heartbeat" \ 
>>> stonith-enabled="false" \ 
>>> no-quorum-policy="stop" \ 
>>> last-lrm-refresh="1332539900" \ 
>>> cluster-recheck-interval="5m" \ 
>>> crmd-integration-timeout="3m" \ 
>>> shutdown-escalation="5m" 
>>> 
>>> The STONITH plugin is a custom plugin I wrote for the Tripp-Lite 
>>> PDUMH20ATNET that I'm using as the STONITH device: 
>>> http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf 
>> 
>> And why aren't you using it? ... stonith-enabled="false"
>> 
>>> 
>>> As you can see, I left the DRBD service to be started by the operating 
>>> system (as an lsb script at boot time) however Pacemaker controls 
>>> actually bringing up/taking down the individual DRBD devices. 
>> 
>> Don't start drbd on system boot, give Pacemaker the full control. 
>> 
>>> The behavior I observe is as follows: I issue "crm resource migrate p_vm" on
>>> node1 and failover successfully to node2. During this time, node2 fences 
>>> node1's DRBD devices (using dopd) and marks them as Outdated. Meanwhile 
>>> node2's DRBD devices are UpToDate. I then shutdown both nodes and then 
>>> bring them back up. They reconnect to the cluster (with quorum), and 
>>> node1's DRBD devices are still Outdated as expected and node2's DRBD 
>>> devices are still UpToDate, as expected. At this point, DRBD starts on 
>>> both nodes, however node2 will not set DRBD as master: 
>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE (standby) 
>>> Online: [ node2 node1 ] 
>>> 
>>> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore] 
>>> Slaves: [ node1 node2 ] 
>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] 
>>> Slaves: [ node1 node2 ]
>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] 
>>> Slaves: [ node1 node2 ] 
>> 
>> A VM migration should not interrupt DRBD replication in a way that
>> activates dopd ... does DRBD have its own direct network connection?
>> 
>> Please share your ha.cf file and your drbd configuration. Watch out for 
>> drbd messages in your kernel log file, that should give you additional 
>> information when/why the drbd connection was lost. 
>> 
>> Regards, 
>> Andreas 
>> 
>> -- 
>> Need help with Pacemaker? 
>> http://www.hastexo.com/now 
>> 
>>> 
>>> I am having trouble sorting through the logging information because 
>>> there is so much of it in /var/log/daemon.log, but I can't find an 
>>> error message printed about why it will not promote node2. At this point 
>>> the DRBD devices are as follows: 
>>> node2: cstate = WFConnection dstate=UpToDate 
>>> node1: cstate = StandAlone dstate=Outdated 
>>> 
>>> I don't see any reason why node2 can't become DRBD master, or am I 
>>> missing something? If I do "drbdadm connect all" on node1, then the 
>>> cstate on both nodes changes to "Connected" and node2 immediately 
>>> promotes the DRBD resources to master. Any ideas on why I'm observing 
>>> this incorrect behavior? 
>>> 
>>> Any tips on how I can better filter through the pacemaker/heartbeat logs 
>>> or how to get additional useful debug information? 
>>> 
>>> Thanks, 
>>> 
>>> Andrew 
>>> 
>>> ------------------------------------------------------------------------ 
>>> *From: *"Andreas Kurz" <andreas at hastexo.com>
>>> *To: *pacemaker at oss.clusterlabs.org
>>> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM 
>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to 
>>> master on failover 
>>> 
>>> On 01/25/2012 08:58 PM, Andrew Martin wrote: 
>>>> Hello, 
>>>> 
>>>> Recently I finished configuring a two-node cluster with pacemaker 1.1.6 
>>>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04. This cluster includes 
>>>> the following resources: 
>>>> - primitives for DRBD storage devices 
>>>> - primitives for mounting the filesystem on the DRBD storage 
>>>> - primitives for some mount binds 
>>>> - primitive for starting apache 
>>>> - primitives for starting samba and nfs servers (following instructions 
>>>> here <http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf>) 
>>>> - primitives for exporting nfs shares (ocf:heartbeat:exportfs) 
>>> 
>>> not enough information ... please share at least your complete cluster 
>>> configuration 
>>> 
>>> Regards, 
>>> Andreas 
>>> 
>>> -- 
>>> Need help with Pacemaker? 
>>> http://www.hastexo.com/now 
>>> 
>>>> 
>>>> Perhaps this is best described through the output of crm_mon: 
>>>> Online: [ node1 node2 ] 
>>>> 
>>>> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged) 
>>>> p_drbd_mount1:0 (ocf::linbit:drbd): Started node2 (unmanaged)
>>>> p_drbd_mount1:1 (ocf::linbit:drbd): Started node1 (unmanaged) FAILED
>>>> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>>>> p_drbd_mount2:0 (ocf::linbit:drbd): Master node1 (unmanaged) FAILED
>>>> Slaves: [ node2 ] 
>>>> Resource Group: g_core 
>>>> p_fs_mount1 (ocf::heartbeat:Filesystem): Started node1 
>>>> p_fs_mount2 (ocf::heartbeat:Filesystem): Started node1 
>>>> p_ip_nfs (ocf::heartbeat:IPaddr2): Started node1 
>>>> Resource Group: g_apache 
>>>> p_fs_mountbind1 (ocf::heartbeat:Filesystem): Started node1 
>>>> p_fs_mountbind2 (ocf::heartbeat:Filesystem): Started node1 
>>>> p_fs_mountbind3 (ocf::heartbeat:Filesystem): Started node1 
>>>> p_fs_varwww (ocf::heartbeat:Filesystem): Started node1 
>>>> p_apache (ocf::heartbeat:apache): Started node1 
>>>> Resource Group: g_fileservers 
>>>> p_lsb_smb (lsb:smbd): Started node1 
>>>> p_lsb_nmb (lsb:nmbd): Started node1 
>>>> p_lsb_nfsserver (lsb:nfs-kernel-server): Started node1 
>>>> p_exportfs_mount1 (ocf::heartbeat:exportfs): Started node1 
>>>> p_exportfs_mount2 (ocf::heartbeat:exportfs): Started node1 
>>>> 
>>>> I have read through the Pacemaker Explained
>>>> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained>
>>>> documentation, however could not find a way to further debug these
>>>> problems. First, I put node1 into standby mode to attempt failover to 
>>>> the other node (node2). Node2 appeared to start the transition to 
>>>> master, however it failed to promote the DRBD resources to master (the 
>>>> first step). I have attached a copy of this session in commands.log and 
>>>> additional excerpts from /var/log/syslog during important steps. I have 
>>>> attempted everything I can think of to try and start the DRBD resource 
>>>> (e.g. start/stop/promote/manage/cleanup under crm resource, restarting 
>>>> heartbeat) but cannot bring it out of the slave state. However, if I set 
>>>> it to unmanaged and then run drbdadm primary all in the terminal, 
>>>> pacemaker is satisfied and continues starting the rest of the resources. 
>>>> It then failed when attempting to mount the filesystem for mount2, the 
>>>> p_fs_mount2 resource. I attempted to mount the filesystem myself and was 
>>>> successful. I then unmounted it and ran cleanup on p_fs_mount2 and then 
>>>> it mounted. The rest of the resources started as expected until the 
>>>> p_exportfs_mount2 resource, which failed as follows: 
>>>> p_exportfs_mount2 (ocf::heartbeat:exportfs): started node2 
>>>> (unmanaged) FAILED 
>>>> 
>>>> I ran cleanup on this and it started, however when running this test
>>>> earlier today no command could successfully start this exportfs
>>>> resource.
>>>>
>>>> How can I configure pacemaker to better resolve these problems and be
>>>> able to bring the node up successfully on its own? What can I check to
>>>> determine why these failures are occurring? /var/log/syslog did not seem
>>>> to contain very much useful information regarding why the failures
>>>> occurred.
>>>> 
>>>> Thanks, 
>>>> 
>>>> Andrew 
>>>> 
>>> _______________________________________________ 
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker 
>>> 
>>> Project Home: http://www.clusterlabs.org 
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>> Bugs: http://bugs.clusterlabs.org 
