[Pacemaker] Nodes will not promote DRBD resources to master on failover

emmanuel segura emi2fast at gmail.com
Fri Mar 30 11:26:48 EDT 2012


I think this colocation constraint is wrong:
==================================================
colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
===================================================

Change it to:
======================================================
colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master
ms_drbd_mount1:Master ms_drbd_mount2:Master
=======================================================
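
The dependent resource goes first in a colocation constraint:
"colocation c inf: A B" places A where B is running, not the other way
around. A tiny sketch with made-up resource names:

colocation c_web_with_ip inf: p_webserver p_virtual_ip

Here p_webserver follows p_virtual_ip. The same idea applies to your
set: g_vm has to come first so that it follows the DRBD masters.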

On 30 March 2012 17:16, Andrew Martin <amartin at xes-inc.com> wrote:

> Hi Emmanuel,
>
> Here is the output of crm configure show:
> http://pastebin.com/NA1fZ8dL
>
> Thanks,
>
> Andrew
>
> ------------------------------
> *From: *"emmanuel segura" <emi2fast at gmail.com>
> *To: *"The Pacemaker cluster resource manager" <
> pacemaker at oss.clusterlabs.org>
> *Sent: *Friday, March 30, 2012 9:43:45 AM
> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> > master on failover
>
> can you show me?
>
> crm configure show
>
> On 30 March 2012 16:10, Andrew Martin <amartin at xes-inc.com> wrote:
>
>> Hi Andreas,
>>
>> Here is a copy of my complete CIB:
>> http://pastebin.com/v5wHVFuy
>>
>> I'll work on generating a report using crm_report as well.
>>
>> Thanks,
>>
>> Andrew
>>
>> ------------------------------
>> *From: *"Andreas Kurz" <andreas at hastexo.com>
>> *To: *pacemaker at oss.clusterlabs.org
>> *Sent: *Friday, March 30, 2012 4:41:16 AM
>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>> master on failover
>>
>> On 03/28/2012 04:56 PM, Andrew Martin wrote:
>> > Hi Andreas,
>> >
>> > I disabled the DRBD init script and then restarted the slave node
>> > (node2). After it came back up, DRBD did not start:
>> > Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending
>> > Online: [ node2 node1 ]
>> >
>> >  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>> >      Masters: [ node1 ]
>> >      Stopped: [ p_drbd_vmstore:1 ]
>> >  Master/Slave Set: ms_drbd_mount1 [p_drbd_tools]
>> >      Masters: [ node1 ]
>> >      Stopped: [ p_drbd_mount1:1 ]
>> >  Master/Slave Set: ms_drbd_mount2 [p_drbdmount2]
>> >      Masters: [ node1 ]
>> >      Stopped: [ p_drbd_mount2:1 ]
>> > ...
>> >
>> > root at node2:~# service drbd status
>> > drbd not loaded
>>
>> Yes, that's expected unless Pacemaker starts DRBD.
>>
>> >
>> > Is there something else I need to change in the CIB to ensure that DRBD
>> > is started? All of my DRBD devices are configured like this:
>> > primitive p_drbd_mount2 ocf:linbit:drbd \
>> >         params drbd_resource="mount2" \
>> >         op monitor interval="15" role="Master" \
>> >         op monitor interval="30" role="Slave"
>> > ms ms_drbd_mount2 p_drbd_mount2 \
>> >         meta master-max="1" master-node-max="1" clone-max="2"
>> > clone-node-max="1" notify="true"
>>
>> That should be enough ... unable to say more without seeing the complete
>> configuration ... too many fragments of information ;-)
>>
>> Please provide (e.g. pastebin) your complete cib (cibadmin -Q) while the
>> cluster is in that state ... or even better, create a crm_report archive.
>>
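>> Something along these lines should do it (a sketch; the time window and
>> destination name are just examples, adjust them to cover the failed
>> failover):
>>
>>   crm_report -f "2012-03-28 09:00" -t "2012-03-28 11:00" /tmp/drbd-report
>>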
>> >
>> > Here is the output from the syslog (grep -i drbd /var/log/syslog):
>> > Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
>> > key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
>> > op=p_drbd_vmstore:1_monitor_0 )
>> > Mar 28 09:24:47 node2 lrmd: [3210]: info: rsc:p_drbd_vmstore:1 probe[2]
>> > (pid 3455)
>> > Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
>> > key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
>> > op=p_drbd_mount1:1_monitor_0 )
>> > Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount1:1 probe[3]
>> > (pid 3456)
>> > Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
>> > key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
>> > op=p_drbd_mount2:1_monitor_0 )
>> > Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount2:1 probe[4]
>> > (pid 3457)
>> > Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING: Couldn't find
>> > device [/dev/drbd0]. Expected /dev/??? to exist
>> > Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked:
>> > crm_attribute -N node2 -n master-p_drbd_mount2:1 -l reboot -D
>> > Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked:
>> > crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l reboot -D
>> > Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked:
>> > crm_attribute -N node2 -n master-p_drbd_mount1:1 -l reboot -D
>> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[4] on
>> > p_drbd_mount2:1 for client 3213: pid 3457 exited with return code 7
>> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[2] on
>> > p_drbd_vmstore:1 for client 3213: pid 3455 exited with return code 7
>> > Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
>> > operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=10,
>> > confirmed=true) not running
>> > Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[3] on
>> > p_drbd_mount1:1 for client 3213: pid 3456 exited with return code 7
>> > Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
>> > operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11,
>> > confirmed=true) not running
>> > Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
>> > operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=12,
>> > confirmed=true) not running
>>
>> No errors, just probing ... so for some reason Pacemaker does not want to
>> start it ... use crm_simulate to find out why ... or provide the
>> information requested above.
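>>
>> For example (a rough sketch, run on the current DC):
>>
>>   crm_simulate -L -s    # live CIB, show allocation scores
>>   crm_simulate -L -S    # simulate the next transition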
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>> >
>> > Thanks,
>> >
>> > Andrew
>> >
>> > ------------------------------------------------------------------------
>> > *From: *"Andreas Kurz" <andreas at hastexo.com>
>> > *To: *pacemaker at oss.clusterlabs.org
>> > *Sent: *Wednesday, March 28, 2012 9:03:06 AM
>> > *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>> > master on failover
>> >
>> > On 03/28/2012 03:47 PM, Andrew Martin wrote:
>> >> Hi Andreas,
>> >>
>> >>> hmm ... what is that fence-peer script doing? If you want to use
>> >>> resource-level fencing with the help of dopd, activate the
>> >>> drbd-peer-outdater script in the line above ... and double check if
>> the
>> >>> path is correct
>> >> fence-peer is just a wrapper for drbd-peer-outdater that does some
>> >> additional logging. In my testing dopd has been working well.
>> >
>> > I see
>> >
>> >>
>> >>>> I am thinking of making the following changes to the CIB (as per the
>> >>>> official DRBD
>> >>>> guide
>> >>
>> >
>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html)
>> in
>> >>>> order to add the DRBD lsb service and require that it start before
>> the
>> >>>> ocf:linbit:drbd resources. Does this look correct?
>> >>>
>> >>> Where did you read that? No, deactivate the startup of DRBD on system
>> >>> boot and let Pacemaker manage it completely.
>> >>>
>> >>>> primitive p_drbd-init lsb:drbd op monitor interval="30"
>> >>>> colocation c_drbd_together inf:
>> >>>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
>> >>>> ms_drbd_mount2:Master
>> >>>> order drbd_init_first inf: ms_drbd_vmstore:promote
>> >>>> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start
>> >>>>
>> >>>> This doesn't seem to require that drbd be also running on the node
>> where
>> >>>> the ocf:linbit:drbd resources are slave (which it would need to do
>> to be
>> >>>> a DRBD SyncTarget) - how can I ensure that drbd is running
>> everywhere?
>> >>>> (clone cl_drbd p_drbd-init ?)
>> >>>
>> >>> This is really not needed.
>> >> I was following the official DRBD Users Guide:
>> >>
>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html
>> >>
>> >> If I am understanding your previous message correctly, I do not need to
>> >> add a lsb primitive for the drbd daemon? It will be
>> >> started/stopped/managed automatically by my ocf:linbit:drbd resources
>> >> (and I can remove the /etc/rc* symlinks)?
>> >
>> > Yes, you don't need that LSB script when using Pacemaker and should not
>> > let init start it.
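>> >
>> > On Ubuntu that is roughly (adjust for your distribution):
>> >
>> >   update-rc.d -f drbd remove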
>> >
>> > Regards,
>> > Andreas
>> >
>> > --
>> > Need help with Pacemaker?
>> > http://www.hastexo.com/now
>> >
>> >>
>> >> Thanks,
>> >>
>> >> Andrew
>> >>
>> >>
>> ------------------------------------------------------------------------
>> >> *From: *"Andreas Kurz" <andreas at hastexo.com <mailto:
>> andreas at hastexo.com>>
>> >> *To: *pacemaker at oss.clusterlabs.org <mailto:
>> pacemaker at oss.clusterlabs.org>
>> >> *Sent: *Wednesday, March 28, 2012 7:27:34 AM
>> >> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>> >> master on failover
>> >>
>> >> On 03/28/2012 12:13 AM, Andrew Martin wrote:
>> >>> Hi Andreas,
>> >>>
>> >>> Thanks, I've updated the colocation rule to be in the correct order. I
>> >>> also enabled the STONITH resource (this was temporarily disabled
>> before
>> >>> for some additional testing). DRBD has its own network connection over
>> >>> the br1 interface (192.168.5.0/24 network), a direct crossover cable
>> >>> between node1 and node2:
>> >>> global { usage-count no; }
>> >>> common {
>> >>>         syncer { rate 110M; }
>> >>> }
>> >>> resource vmstore {
>> >>>         protocol C;
>> >>>         startup {
>> >>>                 wfc-timeout  15;
>> >>>                 degr-wfc-timeout 60;
>> >>>         }
>> >>>         handlers {
>> >>>                 #fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t
>> 5";
>> >>>                 fence-peer "/usr/local/bin/fence-peer";
>> >>
>> >> hmm ... what is that fence-peer script doing? If you want to use
>> >> resource-level fencing with the help of dopd, activate the
>> >> drbd-peer-outdater script in the line above ... and double check if the
>> >> path is correct
>> >>
>> >>>                 split-brain "/usr/lib/drbd/notify-split-brain.sh
>> >>> me at example.com";
>> >>>         }
>> >>>         net {
>> >>>                 after-sb-0pri discard-zero-changes;
>> >>>                 after-sb-1pri discard-secondary;
>> >>>                 after-sb-2pri disconnect;
>> >>>                 cram-hmac-alg md5;
>> >>>                 shared-secret "xxxxx";
>> >>>         }
>> >>>         disk {
>> >>>                 fencing resource-only;
>> >>>         }
>> >>>         on node1 {
>> >>>                 device /dev/drbd0;
>> >>>                 disk /dev/sdb1;
>> >>>                 address 192.168.5.10:7787;
>> >>>                 meta-disk internal;
>> >>>         }
>> >>>         on node2 {
>> >>>                 device /dev/drbd0;
>> >>>                 disk /dev/sdf1;
>> >>>                 address 192.168.5.11:7787;
>> >>>                 meta-disk internal;
>> >>>         }
>> >>> }
>> >>> # and similar for mount1 and mount2
>> >>>
>> >>> Also, here is my ha.cf. It uses both the direct link between the
>> nodes
>> >>> (br1) and the shared LAN network on br0 for communicating:
>> >>> autojoin none
>> >>> mcast br0 239.0.0.43 694 1 0
>> >>> bcast br1
>> >>> warntime 5
>> >>> deadtime 15
>> >>> initdead 60
>> >>> keepalive 2
>> >>> node node1
>> >>> node node2
>> >>> node quorumnode
>> >>> crm respawn
>> >>> respawn hacluster /usr/lib/heartbeat/dopd
>> >>> apiauth dopd gid=haclient uid=hacluster
>> >>>
>> >>> I am thinking of making the following changes to the CIB (as per the
>> >>> official DRBD
>> >>> guide
>> >>
>> >
>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html)
>> in
>> >>> order to add the DRBD lsb service and require that it start before the
>> >>> ocf:linbit:drbd resources. Does this look correct?
>> >>
>> >> Where did you read that? No, deactivate the startup of DRBD on system
>> >> boot and let Pacemaker manage it completely.
>> >>
>> >>> primitive p_drbd-init lsb:drbd op monitor interval="30"
>> >>> colocation c_drbd_together inf:
>> >>> p_drbd-init ms_drbd_vmstore:Master ms_drbd_mount1:Master
>> >>> ms_drbd_mount2:Master
>> >>> order drbd_init_first inf: ms_drbd_vmstore:promote
>> >>> ms_drbd_mount1:promote ms_drbd_mount2:promote p_drbd-init:start
>> >>>
>> >>> This doesn't seem to require that drbd be also running on the node
>> where
>> >>> the ocf:linbit:drbd resources are slave (which it would need to do to
>> be
>> >>> a DRBD SyncTarget) - how can I ensure that drbd is running everywhere?
>> >>> (clone cl_drbd p_drbd-init ?)
>> >>
>> >> This is really not needed.
>> >>
>> >> Regards,
>> >> Andreas
>> >>
>> >> --
>> >> Need help with Pacemaker?
>> >> http://www.hastexo.com/now
>> >>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Andrew
>> >>>
>> ------------------------------------------------------------------------
>> >>> *From: *"Andreas Kurz" <andreas at hastexo.com <mailto:
>> andreas at hastexo.com>>
>> >>> *To: *pacemaker at oss.clusterlabs.org
>> > <mailto:*pacemaker at oss.clusterlabs.org>
>> >>> *Sent: *Monday, March 26, 2012 5:56:22 PM
>> >>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>> >>> master on failover
>> >>>
>> >>> On 03/24/2012 08:15 PM, Andrew Martin wrote:
>> >>>> Hi Andreas,
>> >>>>
>> >>>> My complete cluster configuration is as follows:
>> >>>> ============
>> >>>> Last updated: Sat Mar 24 13:51:55 2012
>> >>>> Last change: Sat Mar 24 13:41:55 2012
>> >>>> Stack: Heartbeat
>> >>>> Current DC: node2 (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18) - partition
>> >>>> with quorum
>> >>>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>> >>>> 3 Nodes configured, unknown expected votes
>> >>>> 19 Resources configured.
>> >>>> ============
>> >>>>
>> >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE
>> > (standby)
>> >>>> Online: [ node2 node1 ]
>> >>>>
>> >>>>  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>> >>>>      Masters: [ node2 ]
>> >>>>      Slaves: [ node1 ]
>> >>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>> >>>>      Masters: [ node2 ]
>> >>>>      Slaves: [ node1 ]
>> >>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>> >>>>      Masters: [ node2 ]
>> >>>>      Slaves: [ node1 ]
>> >>>>  Resource Group: g_vm
>> >>>>      p_fs_vmstore   (ocf::heartbeat:Filesystem):    Started node2
>> >>>>      p_vm           (ocf::heartbeat:VirtualDomain): Started node2
>> >>>>  Clone Set: cl_daemons [g_daemons]
>> >>>>      Started: [ node2 node1 ]
>> >>>>      Stopped: [ g_daemons:2 ]
>> >>>>  Clone Set: cl_sysadmin_notify [p_sysadmin_notify]
>> >>>>      Started: [ node2 node1 ]
>> >>>>      Stopped: [ p_sysadmin_notify:2 ]
>> >>>>  stonith-node1  (stonith:external/tripplitepdu):  Started node2
>> >>>>  stonith-node2  (stonith:external/tripplitepdu):  Started node1
>> >>>>  Clone Set: cl_ping [p_ping]
>> >>>>      Started: [ node2 node1 ]
>> >>>>      Stopped: [ p_ping:2 ]
>> >>>>
>> >>>> node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \
>> >>>>         attributes standby="off"
>> >>>> node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \
>> >>>>         attributes standby="off"
>> >>>> node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4" quorumnode \
>> >>>>         attributes standby="on"
>> >>>> primitive p_drbd_mount2 ocf:linbit:drbd \
>> >>>>         params drbd_resource="mount2" \
>> >>>>         op monitor interval="15" role="Master" \
>> >>>>         op monitor interval="30" role="Slave"
>> >>>> primitive p_drbd_mount1 ocf:linbit:drbd \
>> >>>>         params drbd_resource="mount1" \
>> >>>>         op monitor interval="15" role="Master" \
>> >>>>         op monitor interval="30" role="Slave"
>> >>>> primitive p_drbd_vmstore ocf:linbit:drbd \
>> >>>>         params drbd_resource="vmstore" \
>> >>>>         op monitor interval="15" role="Master" \
>> >>>>         op monitor interval="30" role="Slave"
>> >>>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \
>> >>>>         params device="/dev/drbd0" directory="/vmstore"
>> fstype="ext4" \
>> >>>>         op start interval="0" timeout="60s" \
>> >>>>         op stop interval="0" timeout="60s" \
>> >>>>         op monitor interval="20s" timeout="40s"
>> >>>> primitive p_libvirt-bin upstart:libvirt-bin \
>> >>>>         op monitor interval="30"
>> >>>> primitive p_ping ocf:pacemaker:ping \
>> >>>>         params name="p_ping" host_list="192.168.1.10 192.168.1.11"
>> >>>> multiplier="1000" \
>> >>>>         op monitor interval="20s"
>> >>>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \
>> >>>>         params email="me at example.com" \
>> >>>>         params subject="Pacemaker Change" \
>> >>>>         op start interval="0" timeout="10" \
>> >>>>         op stop interval="0" timeout="10" \
>> >>>>         op monitor interval="10" timeout="10"
>> >>>> primitive p_vm ocf:heartbeat:VirtualDomain \
>> >>>>         params config="/vmstore/config/vm.xml" \
>> >>>>         meta allow-migrate="false" \
>> >>>>         op start interval="0" timeout="120s" \
>> >>>>         op stop interval="0" timeout="120s" \
>> >>>>         op monitor interval="10" timeout="30"
>> >>>> primitive stonith-node1 stonith:external/tripplitepdu \
>> >>>>         params pdu_ipaddr="192.168.1.12" pdu_port="1"
>> pdu_username="xxx"
>> >>>> pdu_password="xxx" hostname_to_stonith="node1"
>> >>>> primitive stonith-node2 stonith:external/tripplitepdu \
>> >>>>         params pdu_ipaddr="192.168.1.12" pdu_port="2"
>> pdu_username="xxx"
>> >>>> pdu_password="xxx" hostname_to_stonith="node2"
>> >>>> group g_daemons p_libvirt-bin
>> >>>> group g_vm p_fs_vmstore p_vm
>> >>>> ms ms_drbd_mount2 p_drbd_mount2 \
>> >>>>         meta master-max="1" master-node-max="1" clone-max="2"
>> >>>> clone-node-max="1" notify="true"
>> >>>> ms ms_drbd_mount1 p_drbd_mount1 \
>> >>>>         meta master-max="1" master-node-max="1" clone-max="2"
>> >>>> clone-node-max="1" notify="true"
>> >>>> ms ms_drbd_vmstore p_drbd_vmstore \
>> >>>>         meta master-max="1" master-node-max="1" clone-max="2"
>> >>>> clone-node-max="1" notify="true"
>> >>>> clone cl_daemons g_daemons
>> >>>> clone cl_ping p_ping \
>> >>>>         meta interleave="true"
>> >>>> clone cl_sysadmin_notify p_sysadmin_notify
>> >>>> location l-st-node1 stonith-node1 -inf: node1
>> >>>> location l-st-node2 stonith-node2 -inf: node2
>> >>>> location l_run_on_most_connected p_vm \
>> >>>>         rule $id="l_run_on_most_connected-rule" p_ping: defined
>> p_ping
>> >>>> colocation c_drbd_libvirt_vm inf: ms_drbd_vmstore:Master
>> >>>> ms_drbd_mount1:Master ms_drbd_mount2:Master g_vm
>> >>>
>> >>> As Emmanuel already said, g_vm has to come first in this colocation
>> >>> constraint .... g_vm must be colocated with the drbd masters.
>> >>>
>> >>>> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote
>> ms_drbd_mount1:promote
>> >>>> ms_drbd_mount2:promote cl_daemons:start g_vm:start
>> >>>> property $id="cib-bootstrap-options" \
>> >>>>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>> >>>>         cluster-infrastructure="Heartbeat" \
>> >>>>         stonith-enabled="false" \
>> >>>>         no-quorum-policy="stop" \
>> >>>>         last-lrm-refresh="1332539900" \
>> >>>>         cluster-recheck-interval="5m" \
>> >>>>         crmd-integration-timeout="3m" \
>> >>>>         shutdown-escalation="5m"
>> >>>>
>> >>>> The STONITH plugin is a custom plugin I wrote for the Tripp-Lite
>> >>>> PDUMH20ATNET that I'm using as the STONITH device:
>> >>>> http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf
>> >>>
>> >>> And why aren't you using it? .... stonith-enabled="false"
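>> >>>
>> >>> Once you have verified the stonith resources, re-enabling is a
>> >>> one-liner:
>> >>>
>> >>>   crm configure property stonith-enabled=true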
>> >>>
>> >>>>
>> >>>> As you can see, I left the DRBD service to be started by the
>> operating
>> >>>> system (as an lsb script at boot time) however Pacemaker controls
>> >>>> actually bringing up/taking down the individual DRBD devices.
>> >>>
>> >>> Don't start drbd on system boot; give Pacemaker full control.
>> >>>
>> >>> The
>> >>>> behavior I observe is as follows: I issue "crm resource migrate
>> p_vm" on
>> >>>> node1 and failover successfully to node2. During this time, node2
>> fences
>> >>>> node1's DRBD devices (using dopd) and marks them as Outdated.
>> Meanwhile
>> >>>> node2's DRBD devices are UpToDate. I then shutdown both nodes and
>> then
>> >>>> bring them back up. They reconnect to the cluster (with quorum), and
>> >>>> node1's DRBD devices are still Outdated as expected and node2's DRBD
>> >>>> devices are still UpToDate, as expected. At this point, DRBD starts
>> on
>> >>>> both nodes, however node2 will not set DRBD as master:
>> >>>> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): OFFLINE
>> > (standby)
>> >>>> Online: [ node2 node1 ]
>> >>>>
>> >>>>  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>> >>>>      Slaves: [ node1 node2 ]
>> >>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
>> >>>>      Slaves: [ node1 node2 ]
>> >>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>> >>>>      Slaves: [ node1 node2 ]
>> >>>
>> >>> There should really be no interruption of the drbd replication on vm
>> >>> migration that activates the dopd ... drbd has its own direct network
>> >>> connection?
>> >>>
>> >>> Please share your ha.cf file and your drbd configuration. Watch out
>> for
>> >>> drbd messages in your kernel log file, that should give you additional
>> >>> information when/why the drbd connection was lost.
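>> >>>
>> >>> For example, something like:
>> >>>
>> >>>   grep -i drbd /var/log/kern.log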
>> >>>
>> >>> Regards,
>> >>> Andreas
>> >>>
>> >>> --
>> >>> Need help with Pacemaker?
>> >>> http://www.hastexo.com/now
>> >>>
>> >>>>
>> >>>> I am having trouble sorting through the logging information because
>> >>>> there is so much of it in /var/log/daemon.log, but I can't  find an
>> >>>> error message printed about why it will not promote node2. At this
>> point
>> >>>> the DRBD devices are as follows:
>> >>>> node2: cstate = WFConnection dstate=UpToDate
>> >>>> node1: cstate = StandAlone dstate=Outdated
>> >>>>
>> >>>> I don't see any reason why node2 can't become DRBD master, or am I
>> >>>> missing something? If I do "drbdadm connect all" on node1, then the
>> >>>> cstate on both nodes changes to "Connected" and node2 immediately
>> >>>> promotes the DRBD resources to master. Any ideas on why I'm observing
>> >>>> this incorrect behavior?
>> >>>>
>> >>>> Any tips on how I can better filter through the pacemaker/heartbeat
>> logs
>> >>>> or how to get additional useful debug information?
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Andrew
>> >>>>
>> >>>>
>> ------------------------------------------------------------------------
>> >>>> *From: *"Andreas Kurz" <andreas at hastexo.com
>> > <mailto:andreas at hastexo.com>>
>> >>>> *To: *pacemaker at oss.clusterlabs.org
>> >> <mailto:*pacemaker at oss.clusterlabs.org>
>> >>>> *Sent: *Wednesday, 1 February, 2012 4:19:25 PM
>> >>>> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
>> >>>> master on failover
>> >>>>
>> >>>> On 01/25/2012 08:58 PM, Andrew Martin wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> Recently I finished configuring a two-node cluster with pacemaker
>> 1.1.6
>> >>>>> and heartbeat 3.0.5 on nodes running Ubuntu 10.04. This cluster
>> > includes
>> >>>>> the following resources:
>> >>>>> - primitives for DRBD storage devices
>> >>>>> - primitives for mounting the filesystem on the DRBD storage
>> >>>>> - primitives for some mount binds
>> >>>>> - primitive for starting apache
>> >>>>> - primitives for starting samba and nfs servers (following
>> instructions
>> >>>>> here <http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf>)
>> >>>>> - primitives for exporting nfs shares (ocf:heartbeat:exportfs)
>> >>>>
>> >>>> not enough information ... please share at least your complete
>> cluster
>> >>>> configuration
>> >>>>
>> >>>> Regards,
>> >>>> Andreas
>> >>>>
>> >>>> --
>> >>>> Need help with Pacemaker?
>> >>>> http://www.hastexo.com/now
>> >>>>
>> >>>>>
>> >>>>> Perhaps this is best described through the output of crm_mon:
>> >>>>> Online: [ node1 node2 ]
>> >>>>>
>> >>>>>  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] (unmanaged)
>> >>>>>      p_drbd_mount1:0     (ocf::linbit:drbd):     Started node2
>> >>> (unmanaged)
>> >>>>>      p_drbd_mount1:1     (ocf::linbit:drbd):     Started node1
>> >>>>> (unmanaged) FAILED
>> >>>>>  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]
>> >>>>>      p_drbd_mount2:0       (ocf::linbit:drbd):     Master node1
>> >>>>> (unmanaged) FAILED
>> >>>>>      Slaves: [ node2 ]
>> >>>>>  Resource Group: g_core
>> >>>>>      p_fs_mount1 (ocf::heartbeat:Filesystem):    Started node1
>> >>>>>      p_fs_mount2   (ocf::heartbeat:Filesystem):    Started node1
>> >>>>>      p_ip_nfs   (ocf::heartbeat:IPaddr2):       Started node1
>> >>>>>  Resource Group: g_apache
>> >>>>>      p_fs_mountbind1    (ocf::heartbeat:Filesystem):    Started
>> node1
>> >>>>>      p_fs_mountbind2    (ocf::heartbeat:Filesystem):    Started
>> node1
>> >>>>>      p_fs_mountbind3    (ocf::heartbeat:Filesystem):    Started
>> node1
>> >>>>>      p_fs_varwww        (ocf::heartbeat:Filesystem):    Started
>> node1
>> >>>>>      p_apache   (ocf::heartbeat:apache):        Started node1
>> >>>>>  Resource Group: g_fileservers
>> >>>>>      p_lsb_smb  (lsb:smbd):     Started node1
>> >>>>>      p_lsb_nmb  (lsb:nmbd):     Started node1
>> >>>>>      p_lsb_nfsserver    (lsb:nfs-kernel-server):        Started
>> node1
>> >>>>>      p_exportfs_mount1   (ocf::heartbeat:exportfs):      Started
>> node1
>> >>>>>      p_exportfs_mount2     (ocf::heartbeat:exportfs):      Started
>> > node1
>> >>>>>
>> >>>>> I have read through the Pacemaker Explained
>> >>>>>
>> >>>>
>> >>>
>> > <
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained
>> >
>> >>>>> documentation, however could not find a way to further debug these
>> >>>>> problems. First, I put node1 into standby mode to attempt failover
>> to
>> >>>>> the other node (node2). Node2 appeared to start the transition to
>> >>>>> master, however it failed to promote the DRBD resources to master
>> (the
>> >>>>> first step). I have attached a copy of this session in commands.log
>> and
>> >>>>> additional excerpts from /var/log/syslog during important steps. I
>> have
>> >>>>> attempted everything I can think of to try and start the DRBD
>> resource
>> >>>>> (e.g. start/stop/promote/manage/cleanup under crm resource,
>> restarting
>> >>>>> heartbeat) but cannot bring it out of the slave state. However, if
>> > I set
>> >>>>> it to unmanaged and then run drbdadm primary all in the terminal,
>> >>>>> pacemaker is satisfied and continues starting the rest of the
>> > resources.
>> >>>>> It then failed when attempting to mount the filesystem for mount2,
>> the
>> >>>>> p_fs_mount2 resource. I attempted to mount the filesystem myself
>> > and was
>> >>>>> successful. I then unmounted it and ran cleanup on p_fs_mount2 and
>> then
>> >>>>> it mounted. The rest of the resources started as expected until the
>> >>>>> p_exportfs_mount2 resource, which failed as follows:
>> >>>>> p_exportfs_mount2     (ocf::heartbeat:exportfs):      started node2
>> >>>>> (unmanaged) FAILED
>> >>>>>
>> >>>>> I ran cleanup on this and it started, however when running this test
>> >>>>> earlier today no command could successfully start this exportfs
>> >> resource.
>> >>>>>
>> >>>>> How can I configure pacemaker to better resolve these problems and
>> be
>> >>>>> able to bring the node up successfully on its own? What can I check
>> to
>> >>>>> determine why these failures are occurring? /var/log/syslog did not
>> seem
>> >>>>> to contain very much useful information regarding why the failures
>> >>>> occurred.
>> >>>>>
>> >>>>> Thanks,
>> >>>>>
>> >>>>> Andrew
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> <mailto:Pacemaker at oss.clusterlabs.org>
>> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>>>
>> >>>> Project Home: http://www.clusterlabs.org
>> >>>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >>>> Bugs: http://bugs.clusterlabs.org
>> >>>>
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> <mailto:Pacemaker at oss.clusterlabs.org>
>> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>>>
>> >>>> Project Home: http://www.clusterlabs.org
>> >>>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >>>> Bugs: http://bugs.clusterlabs.org
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> <mailto:Pacemaker at oss.clusterlabs.org>
>> >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>>
>> >>> Project Home: http://www.clusterlabs.org
>> >>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >>> Bugs: http://bugs.clusterlabs.org
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> <mailto:Pacemaker at oss.clusterlabs.org>
>> >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>>
>> >>> Project Home: http://www.clusterlabs.org
>> >>> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >>> Bugs: http://bugs.clusterlabs.org
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> <mailto:Pacemaker at oss.clusterlabs.org>
>> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>
>> >> Project Home: http://www.clusterlabs.org
>> >> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> Bugs: http://bugs.clusterlabs.org
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>
>> >> Project Home: http://www.clusterlabs.org
>> >> Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> Bugs: http://bugs.clusterlabs.org
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>> >
>> >
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>>
>
>
> --
> this is my life and I live it as long as God wills
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>


-- 
this is my life and I live it as long as God wills