<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: Times New Roman; font-size: 12pt; color: #000000'><font size="3">Hi Andreas,</font><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; "><br></div><div style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; ">I disabled the DRBD init script and then restarted the slave node (node2). After it came back up, DRBD did not start:</div><div><div>Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending</div><div>Online: [ node2 node1 ]</div><div><br></div><div> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]</div><div>     Masters: [ node1 ]</div><div>     Stopped: [ p_drbd_vmstore:1 ]</div><div> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]</div><div>     Masters: [ node1 ]</div><div>     Stopped: [ p_drbd_mount1:1 ]</div><div> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]</div><div>     Masters: [ node1 ]</div><div>     Stopped: [ p_drbd_mount2:1 ]</div><div>...</div><div><br></div><div><div>root@node2:~# service drbd status</div><div>drbd not loaded</div></div><div><br></div><div>Is there something else I need to change in the CIB to ensure that DRBD is started? 
All of my DRBD devices are configured like this:</div><div><div>primitive p_drbd_mount2 ocf:linbit:drbd \</div><div>        params drbd_resource="mount2" \</div><div>        op monitor interval="15" role="Master" \</div><div>        op monitor interval="30" role="Slave"</div></div><div><div>ms ms_drbd_mount2 p_drbd_mount2 \</div><div>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"</div></div><div><br></div><div>Here is the output from the syslog (grep -i drbd /var/log/syslog):<br><div>Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_vmstore:1_monitor_0 )</div><div>Mar 28 09:24:47 node2 lrmd: [3210]: info: rsc:p_drbd_vmstore:1 probe[2] (pid 3455)</div><div>Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_mount1:1_monitor_0 )</div><div>Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount1:1 probe[3] (pid 3456)</div><div>Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_mount2:1_monitor_0 )</div><div>Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount2:1 probe[4] (pid 3457)</div><div>Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING: Couldn't find device [/dev/drbd0]. Expected /dev/??? 
to exist</div><div>Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked: crm_attribute -N node2 -n master-p_drbd_mount2:1 -l reboot -D</div><div>Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked: crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l reboot -D</div><div>Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked: crm_attribute -N node2 -n master-p_drbd_mount1:1 -l reboot -D</div><div>Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[4] on p_drbd_mount2:1 for client 3213: pid 3457 exited with return code 7</div><div>Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[2] on p_drbd_vmstore:1 for client 3213: pid 3455 exited with return code 7</div><div>Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=10, confirmed=true) not running</div><div>Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[3] on p_drbd_mount1:1 for client 3213: pid 3456 exited with return code 7</div><div>Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11, confirmed=true) not running</div><div>Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=12, confirmed=true) not running</div></div><div><br></div><div>Thanks,</div><div><br></div><div>Andrew</div><br><hr id="zwchr" style="color: rgb(0, 0, 0); font-family: 'Times New Roman'; font-size: 12pt; "><div style="color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica, Arial, sans-serif; font-size: 12pt; "><b>From: </b>"Andreas Kurz" <andreas@hastexo.com><br><b>To: </b>pacemaker@oss.clusterlabs.org<br><b>Sent: </b>Wednesday, March 28, 2012 9:03:06 AM<br><b>Subject: </b>Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover<br><br>On 03/28/2012 03:47 PM, Andrew Martin wrote:<br>> Hi Andreas,<br>> 
<br>>> hmm ... what is that fence-peer script doing? If you want to use<br>>> resource-level fencing with the help of dopd, activate the<br>>> drbd-peer-outdater script in the line above ... and double check if the<br>>> path is correct<br>> fence-peer is just a wrapper for drbd-peer-outdater that does some<br>> additional logging. In my testing dopd has been working well.<br><br>I see<br><br>> <br>>>> I am thinking of making the following changes to the CIB (as per the<br>>>> official DRBD<br>>>> guide<br>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html) in<br>>>> order to add the DRBD lsb service and require that it start before the<br>>>> ocf:linbit:drbd resources. Does this look correct?<br>>><br>>> Where did you read that? No, deactivate the startup of DRBD on system<br>>> boot and let Pacemaker manage it completely.<br>>><br>>>> primitive p_drbd-init lsb:drbd op monitor interval="30"<br>>>> colocation c_drbd_together inf:<br>>>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master<br>>>> ms_drbd_mount2:Master<br>>>> order drbd_init_first inf: ms_drbd_vmstore:promote<br>>>> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start<br>>>><br>>>> This doesn't seem to require that drbd be also running on the node where<br>>>> the ocf:linbit:drbd resources are slave (which it would need to do to be<br>>>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere?<br>>>> (clone cl_drbd p_drbd-init ?)<br>>><br>>> This is really not needed.<br>> I was following the official DRBD Users Guide:<br>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html<br>> <br>> If I am understanding your previous message correctly, I do not need to<br>> add a lsb primitive for the drbd daemon? 
It will be<br>> started/stopped/managed automatically by my ocf:linbit:drbd resources<br>> (and I can remove the /etc/rc* symlinks)?<br><br>Yes, you don't need that LSB script when using Pacemaker and should not<br>let init start it.<br><br>Regards,<br>Andreas<br><br>-- <br>Need help with Pacemaker?<br>http://www.hastexo.com/now<br><br>> <br>> Thanks,<br>> <br>> Andrew<br>> <br>> ------------------------------------------------------------------------<br>> *From: *"Andreas Kurz" <andreas@hastexo.com <mailto:andreas@hastexo.com>><br>> *To: *pacemaker@oss.clusterlabs.org <mailto:pacemaker@oss.clusterlabs.org><br>> *Sent: *Wednesday, March 28, 2012 7:27:34 AM<br>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to<br>> master on failover<br>> <br>> On 03/28/2012 12:13 AM, Andrew Martin wrote:<br>>> Hi Andreas,<br>>><br>>> Thanks, I've updated the colocation rule to be in the correct order. I<br>>> also enabled the STONITH resource (this was temporarily disabled before<br>>> for some additional testing). DRBD has its own network connection over<br>>> the br1 interface (192.168.5.0/24 network), a direct crossover cable<br>>> between node1 and node2:<br>>> global { usage-count no; }<br>>> common {<br>>>         syncer { rate 110M; }<br>>> }<br>>> resource vmstore {<br>>>         protocol C;<br>>>         startup {<br>>>                 wfc-timeout  15;<br>>>                 degr-wfc-timeout 60;<br>>>         }<br>>>         handlers {<br>>>                 #fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";<br>>>                 fence-peer "/usr/local/bin/fence-peer";<br>> <br>> hmm ... what is that fence-peer script doing? If you want to use<br>> resource-level fencing with the help of dopd, activate the<br>> drbd-peer-outdater script in the line above ... 
and double check if the<br>> path is correct<br>> <br>>>                 split-brain "/usr/lib/drbd/notify-split-brain.sh<br>>> me@example.com <mailto:me@example.com>";<br>>>         }<br>>>         net {<br>>>                 after-sb-0pri discard-zero-changes;<br>>>                 after-sb-1pri discard-secondary;<br>>>                 after-sb-2pri disconnect;<br>>>                 cram-hmac-alg md5;<br>>>                 shared-secret "xxxxx";<br>>>         }<br>>>         disk {<br>>>                 fencing resource-only;<br>>>         }<br>>>         on node1 {<br>>>                 device /dev/drbd0;<br>>>                 disk /dev/sdb1;<br>>>                 address 192.168.5.10:7787;<br>>>                 meta-disk internal;<br>>>         }<br>>>         on node2 {<br>>>                 device /dev/drbd0;<br>>>                 disk /dev/sdf1;<br>>>                 address 192.168.5.11:7787;<br>>>                 meta-disk internal;<br>>>         }<br>>> }<br>>> # and similar for mount1 and mount2<br>>><br>>> Also, here is my ha.cf. It uses both the direct link between the nodes<br>>> (br1) and the shared LAN network on br0 for communicating:<br>>> autojoin none<br>>> mcast br0 239.0.0.43 694 1 0<br>>> bcast br1<br>>> warntime 5<br>>> deadtime 15<br>>> initdead 60<br>>> keepalive 2<br>>> node node1<br>>> node node2<br>>> node quorumnode<br>>> crm respawn<br>>> respawn hacluster /usr/lib/heartbeat/dopd<br>>> apiauth dopd gid=haclient uid=hacluster<br>>><br>>> I am thinking of making the following changes to the CIB (as per the<br>>> official DRBD<br>>> guide<br>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html) in<br>>> order to add the DRBD lsb service and require that it start before the<br>>> ocf:linbit:drbd resources. Does this look correct?<br>> <br>> Where did you read that? 
No, deactivate the startup of DRBD on system<br>> boot and let Pacemaker manage it completely.<br>> <br>>> primitive p_drbd-init lsb:drbd op monitor interval="30"<br>>> colocation c_drbd_together inf:<br>>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master<br>>> ms_drbd_mount2:Master<br>>> order drbd_init_first inf: ms_drbd_vmstore:promote<br>>> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start<br>>><br>>> This doesn't seem to require that drbd be also running on the node where<br>>> the ocf:linbit:drbd resources are slave (which it would need to do to be<br>>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere?<br>>> (clone cl_drbd p_drbd-init ?)<br>> <br>> This is really not needed.<br>> <br>> Regards,<br>> Andreas<br>> <br>> -- <br>> Need help with Pacemaker?<br>> http://www.hastexo.com/now<br>> <br>>><br>>> Thanks,<br>>><br>>> Andrew<br>>> ------------------------------------------------------------------------<br>>> *From: *"Andreas Kurz" <andreas@hastexo.com <mailto:andreas@hastexo.com>><br>>> *To: *pacemaker@oss.clusterlabs.org <mailto:*pacemaker@oss.clusterlabs.org><br>>> *Sent: *Monday, March 26, 2012 5:56:22 PM<br>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to<br>>> master on failover<br>>><br>>> On 03/24/2012 08:15 PM, Andrew Martin wrote:<br>>>> Hi Andreas,<br>>>><br>>>> My complete cluster configuration is as follows:<br>>>> ============<br>>>> Last updated: Sat Mar 24 13:51:55 2012<br>>>> Last change: Sat Mar 24 13:41:55 2012<br>>>> Stack: Heartbeat<br>>>> Current DC: node2 (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition<br>>>> with quorum<br>>>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c<br>>>> 3 Nodes configured, unknown expected votes<br>>>> 19 Resources configured.<br>>>> ============<br>>>><br>>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE (standby)<br>>>> Online: [ node2 node1 ]<br>>>><br>>>>  Master/Slave Set: ms_drbd_vmstore 
[p_drbd_vmstore]<br>>>>      Masters: [ node2 ]<br>>>>      Slaves: [ node1 ]<br>>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]<br>>>>      Masters: [ node2 ]<br>>>>      Slaves: [ node1 ]<br>>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]<br>>>>      Masters: [ node2 ]<br>>>>      Slaves: [ node1 ]<br>>>>  Resource Group: g_vm<br>>>>      p_fs_vmstore(ocf::heartbeat:Filesystem):Started node2<br>>>>      p_vm(ocf::heartbeat:VirtualDomain):Started node2<br>>>>  Clone Set: cl_daemons [g_daemons]<br>>>>      Started: [ node2 node1 ]<br>>>>      Stopped: [ g_daemons:2 ]<br>>>>  Clone Set: cl_sysadmin_notify [p_sysadmin_notify]<br>>>>      Started: [ node2 node1 ]<br>>>>      Stopped: [ p_sysadmin_notify:2 ]<br>>>>  stonith-node1(stonith:external/tripplitepdu):Started node2<br>>>>  stonith-node2(stonith:external/tripplitepdu):Started node1<br>>>>  Clone Set: cl_ping [p_ping]<br>>>>      Started: [ node2 node1 ]<br>>>>      Stopped: [ p_ping:2 ]<br>>>><br>>>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \<br>>>>         attributes standby="off"<br>>>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \<br>>>>         attributes standby="off"<br>>>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \<br>>>>         attributes standby="on"<br>>>> primitive p_drbd_mount2 ocf:linbit:drbd \<br>>>>         params drbd_resource="mount2" \<br>>>>         op monitor interval="15" role="Master" \<br>>>>         op monitor interval="30" role="Slave"<br>>>> primitive p_drbd_mount1 ocf:linbit:drbd \<br>>>>         params drbd_resource="mount1" \<br>>>>         op monitor interval="15" role="Master" \<br>>>>         op monitor interval="30" role="Slave"<br>>>> primitive p_drbd_vmstore ocf:linbit:drbd \<br>>>>         params drbd_resource="vmstore" \<br>>>>         op monitor interval="15" role="Master" \<br>>>>         op monitor interval="30" role="Slave"<br>>>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \<br>>>>         params 
device="/dev/drbd0" directory="/vmstore" fstype="ext4" \<br>>>>         op start interval="0" timeout="60s" \<br>>>>         op stop interval="0" timeout="60s" \<br>>>>         op monitor interval="20s" timeout="40s"<br>>>> primitive p_libvirt-bin upstart:libvirt-bin \<br>>>>         op monitor interval="30"<br>>>> primitive p_ping ocf:pacemaker:ping \<br>>>>         params name="p_ping" host_list="192.168.1.10 192.168.1.11"<br>>>> multiplier="1000" \<br>>>>         op monitor interval="20s"<br>>>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \<br>>>>         params email="me@example.com <mailto:me@example.com>" \<br>>>>         params subject="Pacemaker Change" \<br>>>>         op start interval="0" timeout="10" \<br>>>>         op stop interval="0" timeout="10" \<br>>>>         op monitor interval="10" timeout="10"<br>>>> primitive p_vm ocf:heartbeat:VirtualDomain \<br>>>>         params config="/vmstore/config/vm.xml" \<br>>>>         meta allow-migrate="false" \<br>>>>         op start interval="0" timeout="120s" \<br>>>>         op stop interval="0" timeout="120s" \<br>>>>         op monitor interval="10" timeout="30"<br>>>> primitive stonith-node1 stonith:external/tripplitepdu \<br>>>>         params pdu_ipaddr="192.168.1.12" pdu_port="1" pdu_username="xxx"<br>>>> pdu_password="xxx" hostname_to_stonith="node1"<br>>>> primitive stonith-node2 stonith:external/tripplitepdu \<br>>>>         params pdu_ipaddr="192.168.1.12" pdu_port="2" pdu_username="xxx"<br>>>> pdu_password="xxx" hostname_to_stonith="node2"<br>>>> group g_daemons p_libvirt-bin<br>>>> group g_vm p_fs_vmstore p_vm<br>>>> ms ms_drbd_mount2 p_drbd_mount2 \<br>>>>         meta master-max="1" master-node-max="1" clone-max="2"<br>>>> clone-node-max="1" notify="true"<br>>>> ms ms_drbd_mount1 p_drbd_mount1 \<br>>>>         meta master-max="1" master-node-max="1" clone-max="2"<br>>>> clone-node-max="1" notify="true"<br>>>> ms ms_drbd_vmstore p_drbd_vmstore \<br>>>>         meta master-max="1" 
master-node-max="1" clone-max="2"<br>>>> clone-node-max="1" notify="true"<br>>>> clone cl_daemons g_daemons<br>>>> clone cl_ping p_ping \<br>>>>         meta interleave="true"<br>>>> clone cl_sysadmin_notify p_sysadmin_notify<br>>>> location l-st-node1 stonith-node1 -inf: node1<br>>>> location l-st-node2 stonith-node2 -inf: node2<br>>>> location l_run_on_most_connected p_vm \<br>>>>         rule $id="l_run_on_most_connected-rule" p_ping: defined p_ping<br>>>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master<br>>>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm<br>>><br>>> As Emmanuel already said, g_vm has to come first in this<br>>> colocation constraint ... g_vm must be colocated with the DRBD masters.<br>>><br>>>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_mount1:promote<br>>>> ms_drbd_mount2:promote cl_daemons:start g_vm:start<br>>>> property $id="cib-bootstrap-options" \<br>>>>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \<br>>>>         cluster-infrastructure="Heartbeat" \<br>>>>         stonith-enabled="false" \<br>>>>         no-quorum-policy="stop" \<br>>>>         last-lrm-refresh="1332539900" \<br>>>>         cluster-recheck-interval="5m" \<br>>>>         crmd-integration-timeout="3m" \<br>>>>         shutdown-escalation="5m"<br>>>><br>>>> The STONITH plugin is a custom plugin I wrote for the Tripp-Lite<br>>>> PDUMH20ATNET that I'm using as the STONITH device:<br>>>> http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf<br>>><br>>> And why aren't you using it? ... 
stonith-enabled="false"<br>>><br>>>><br>>>> As you can see, I left the DRBD service to be started by the operating<br>>>> system (as an lsb script at boot time); however, Pacemaker controls<br>>>> actually bringing up/taking down the individual DRBD devices.<br>>><br>>> Don't start DRBD on system boot; give Pacemaker full control.<br>>><br>>>> The behavior I observe is as follows: I issue "crm resource migrate p_vm" on<br>>>> node1 and fail over successfully to node2. During this time, node2 fences<br>>>> node1's DRBD devices (using dopd) and marks them as Outdated. Meanwhile<br>>>> node2's DRBD devices are UpToDate. I then shut down both nodes and then<br>>>> bring them back up. They reconnect to the cluster (with quorum), and<br>>>> node1's DRBD devices are still Outdated and node2's DRBD<br>>>> devices are still UpToDate, as expected. At this point, DRBD starts on<br>>>> both nodes; however, node2 will not set DRBD as master:<br>>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE (standby)<br>>>> Online: [ node2 node1 ]<br>>>><br>>>>  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]<br>>>>      Slaves: [ node1 node2 ]<br>>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]<br>>>>      Slaves: [ node1 node2 ]<br>>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]<br>>>>      Slaves: [ node1 node2 ]<br>>><br>>> There should really be no interruption of the DRBD replication on VM<br>>> migration that activates dopd ... DRBD has its own direct network<br>>> connection?<br>>><br>>> Please share your ha.cf file and your DRBD configuration. 
Watch out for<br>>> DRBD messages in your kernel log file; they should give you additional<br>>> information about when and why the DRBD connection was lost.<br>>><br>>> Regards,<br>>> Andreas<br>>><br>>> --<br>>> Need help with Pacemaker?<br>>> http://www.hastexo.com/now<br>>><br>>>><br>>>> I am having trouble sorting through the logging information because<br>>>> there is so much of it in /var/log/daemon.log, but I can't find an<br>>>> error message printed about why it will not promote node2. At this point<br>>>> the DRBD devices are as follows:<br>>>> node2: cstate = WFConnection dstate=UpToDate<br>>>> node1: cstate = StandAlone dstate=Outdated<br>>>><br>>>> I don't see any reason why node2 can't become DRBD master, or am I<br>>>> missing something? If I do "drbdadm connect all" on node1, then the<br>>>> cstate on both nodes changes to "Connected" and node2 immediately<br>>>> promotes the DRBD resources to master. Any ideas on why I'm observing<br>>>> this incorrect behavior?<br>>>><br>>>> Any tips on how I can better filter through the pacemaker/heartbeat logs<br>>>> or how to get additional useful debug information?<br>>>><br>>>> Thanks,<br>>>><br>>>> Andrew<br>>>><br>>>> ------------------------------------------------------------------------<br>>>> *From: *"Andreas Kurz" &lt;andreas@hastexo.com&gt;<br>>>> *To: *pacemaker@oss.clusterlabs.org<br>>>> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM<br>>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to<br>>>> master on failover<br>>>><br>>>> On 01/25/2012 08:58 PM, Andrew Martin wrote:<br>>>>> Hello,<br>>>>><br>>>>> Recently I finished configuring a two-node cluster with pacemaker 1.1.6<br>>>>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04. 
This cluster includes<br>>>>> the following resources:<br>>>>> - primitives for DRBD storage devices<br>>>>> - primitives for mounting the filesystem on the DRBD storage<br>>>>> - primitives for some mount binds<br>>>>> - primitive for starting apache<br>>>>> - primitives for starting samba and nfs servers (following instructions<br>>>>> here <http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf>)<br>>>>> - primitives for exporting nfs shares (ocf:heartbeat:exportfs)<br>>>><br>>>> not enough information ... please share at least your complete cluster<br>>>> configuration<br>>>><br>>>> Regards,<br>>>> Andreas<br>>>><br>>>> --<br>>>> Need help with Pacemaker?<br>>>> http://www.hastexo.com/now<br>>>><br>>>>><br>>>>> Perhaps this is best described through the output of crm_mon:<br>>>>> Online: [ node1 node2 ]<br>>>>><br>>>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged)<br>>>>>      p_drbd_mount1:0     (ocf::linbit:drbd):     Started node2<br>>> (unmanaged)<br>>>>>      p_drbd_mount1:1     (ocf::linbit:drbd):     Started node1<br>>>>> (unmanaged) FAILED<br>>>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]<br>>>>>      p_drbd_mount2:0       (ocf::linbit:drbd):     Master node1<br>>>>> (unmanaged) FAILED<br>>>>>      Slaves: [ node2 ]<br>>>>>  Resource Group: g_core<br>>>>>      p_fs_mount1 (ocf::heartbeat:Filesystem):    Started node1<br>>>>>      p_fs_mount2   (ocf::heartbeat:Filesystem):    Started node1<br>>>>>      p_ip_nfs   (ocf::heartbeat:IPaddr2):       Started node1<br>>>>>  Resource Group: g_apache<br>>>>>      p_fs_mountbind1    (ocf::heartbeat:Filesystem):    Started node1<br>>>>>      p_fs_mountbind2    (ocf::heartbeat:Filesystem):    Started node1<br>>>>>      p_fs_mountbind3    (ocf::heartbeat:Filesystem):    Started node1<br>>>>>      p_fs_varwww        (ocf::heartbeat:Filesystem):    Started node1<br>>>>>      p_apache   (ocf::heartbeat:apache):        Started node1<br>>>>>  Resource Group: g_fileservers<br>>>>>      p_lsb_smb  
(lsb:smbd):     Started node1<br>>>>>      p_lsb_nmb  (lsb:nmbd):     Started node1<br>>>>>      p_lsb_nfsserver    (lsb:nfs-kernel-server):        Started node1<br>>>>>      p_exportfs_mount1   (ocf::heartbeat:exportfs):      Started node1<br>>>>>      p_exportfs_mount2     (ocf::heartbeat:exportfs):      Started node1<br>>>>><br>>>>> I have read through the Pacemaker Explained<br>>>>><br>>>><br>>> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained><br>>>>> documentation, however could not find a way to further debug these<br>>>>> problems. First, I put node1 into standby mode to attempt failover to<br>>>>> the other node (node2). Node2 appeared to start the transition to<br>>>>> master, however it failed to promote the DRBD resources to master (the<br>>>>> first step). I have attached a copy of this session in commands.log and<br>>>>> additional excerpts from /var/log/syslog during important steps. I have<br>>>>> attempted everything I can think of to try and start the DRBD resource<br>>>>> (e.g. start/stop/promote/manage/cleanup under crm resource, restarting<br>>>>> heartbeat) but cannot bring it out of the slave state. However, if I set<br>>>>> it to unmanaged and then run drbdadm primary all in the terminal,<br>>>>> pacemaker is satisfied and continues starting the rest of the resources.<br>>>>> It then failed when attempting to mount the filesystem for mount2, the<br>>>>> p_fs_mount2 resource. I attempted to mount the filesystem myself and was<br>>>>> successful. I then unmounted it and ran cleanup on p_fs_mount2 and then<br>>>>> it mounted. 
The rest of the resources started as expected until the<br>>>>> p_exportfs_mount2 resource, which failed as follows:<br>>>>> p_exportfs_mount2     (ocf::heartbeat:exportfs):      started node2<br>>>>> (unmanaged) FAILED<br>>>>><br>>>>> I ran cleanup on this and it started, however when running this test<br>>>>> earlier today no command could successfully start this exportfs<br>>>>> resource.<br>>>>><br>>>>> How can I configure pacemaker to better resolve these problems and be<br>>>>> able to bring the node up successfully on its own? What can I check to<br>>>>> determine why these failures are occurring? /var/log/syslog did not seem<br>>>>> to contain very much useful information regarding why the failures<br>>>>> occurred.<br>>>>><br>>>>> Thanks,<br>>>>><br>>>>> Andrew<br>>>><br>>>> _______________________________________________<br>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org<br>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker<br>>>><br>>>> Project Home: http://www.clusterlabs.org<br>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf<br>>>> Bugs: http://bugs.clusterlabs.org<br></div><br></div></div></body></html>