<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

  <title></title>

</head>

<body bgcolor="#ffffff" text="#000000">

Pavlos Parissis wrote:

<blockquote

 cite="mid:AANLkTikmxP+xbxLyGY_1VEtGL+ykGrcm5YKCY7+dSJgf@mail.gmail.com"

 type="cite">

  <pre wrap="">On 13 October 2010 09:48, Dan Frincu <a class="moz-txt-link-rfc2396E" href="mailto:dfrincu@streamwide.ro">&lt;dfrincu@streamwide.ro&gt;</a> wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">Hi,

I've noticed the same type of behavior, though in a different context. My

setup includes 3 drbd devices and a group of resources; all have to run on

the same node and move together to other nodes. My issue was with the first

resource that required access to a drbd device, which was the

ocf:heartbeat:Filesystem RA trying to do a mount and failing.

The reason: it was trying to mount the drbd device before the drbd

device had finished migrating to primary state. Same as you, I introduced a

start-delay, but on the start action. This proved to be of no use, as the

behavior persisted even with an increased start-delay. However, it only

happened when performing a fail-back operation: during fail-over everything

was ok, during fail-back, error.

The fix I've made was to remove any start-delay and to add group collocation

constraints to all ms_drbd resources. Before that I only had one collocation

constraint, for the drbd device being promoted last.

I hope this helps.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

I am glad that somebody else experienced the same issue :)

In my mail I was talking about the monitor action which was failing,

but the behavior you described also happened on my system with the same

setup, drbd and fs resource. It also happened on the application

resource: the start was too fast and the FS was not mounted (yet) when

the start action fired for the application resource. A delay in the start

function of the application's resource agent fixed my issue.

In my setup I have all the necessary constraints to avoid this, or at

least I believe so :-)

Cheers,

Pavlos

  </pre>

</blockquote>

From what I see, you have a dual-primary setup with failover on the

third node. Basically, if you have one drbd resource for which you have

both ordering and collocation, I don't think you need to "improve" it;

if it ain't broke, don't fix it :)<br>
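
<br>

For comparison, the fix described in the quoted mail (colocating the whole group with every ms_drbd resource instead of only the last one promoted) would look roughly like the sketch below. The names grp_services, ms_drbd_a, ms_drbd_b and ms_drbd_c are placeholders made up for illustration and are not taken from either configuration; the syntax follows the colocation lines in the configuration quoted further down.<br>

<pre>
# Sketch only: placeholder resource names, adapt to the actual setup.
# One colocation constraint per DRBD master/slave resource, so the group
# may only run where all three devices are in the Master role.
colocation grp-on-drbd_a inf: grp_services ms_drbd_a:Master
colocation grp-on-drbd_b inf: grp_services ms_drbd_b:Master
colocation grp-on-drbd_c inf: grp_services ms_drbd_c:Master
</pre>

Whether matching order constraints (promote before group start) are also needed depends on the setup; the quoted mail only mentions adding the collocation constraints.<br>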

<br>

Regards,<br>

<br>

Dan<br>

<blockquote

 cite="mid:AANLkTikmxP+xbxLyGY_1VEtGL+ykGrcm5YKCY7+dSJgf@mail.gmail.com"

 type="cite">

  <pre wrap="">

[root@node-01 sysconfig]# crm configure show

node $id="059313ce-c6aa-4bd5-a4fb-4b781de6d98f" node-03

node $id="d791b1f5-9522-4c84-a66f-cd3d4e476b38" node-02

node $id="e388e797-21f4-4bbe-a588-93d12964b4d7" node-01

primitive drbd_01 ocf:linbit:drbd \

        params drbd_resource="drbd_pbx_service_1" \

        op monitor interval="30s" \

        op start interval="0" timeout="240s" \

        op stop interval="0" timeout="120s"

primitive drbd_02 ocf:linbit:drbd \

        params drbd_resource="drbd_pbx_service_2" \

        op monitor interval="30s" \

        op start interval="0" timeout="240s" \

        op stop interval="0" timeout="120s"

primitive fs_01 ocf:heartbeat:Filesystem \

        params device="/dev/drbd1" directory="/pbx_service_01" fstype="ext3" \

        meta migration-threshold="3" failure-timeout="60" \

        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \

        op start interval="0" timeout="60s" \

        op stop interval="0" timeout="60s"

primitive fs_02 ocf:heartbeat:Filesystem \

        params device="/dev/drbd2" directory="/pbx_service_02" fstype="ext3" \

        meta migration-threshold="3" failure-timeout="60" \

        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \

        op start interval="0" timeout="60s" \

        op stop interval="0" timeout="60s"

primitive ip_01 ocf:heartbeat:IPaddr2 \

        params ip="192.168.78.10" cidr_netmask="24" broadcast="192.168.78.255" \

        meta failure-timeout="120" migration-threshold="3" \

        op monitor interval="5s"

primitive ip_02 ocf:heartbeat:IPaddr2 \

        meta failure-timeout="120" migration-threshold="3" \

        params ip="192.168.78.20" cidr_netmask="24" broadcast="192.168.78.255" \

        op monitor interval="5s"

primitive pbx_01 lsb:znd-pbx_01 \

        meta migration-threshold="3" failure-timeout="60" target-role="Started" \

        op monitor interval="20s" timeout="20s" \

        op start interval="0" timeout="60s" \

        op stop interval="0" timeout="60s"

primitive pbx_02 lsb:znd-pbx_02 \

        meta migration-threshold="3" failure-timeout="60" \

        op monitor interval="20s" timeout="20s" \

        op start interval="0" timeout="60s" \

        op stop interval="0" timeout="60s"

primitive sshd_01 lsb:znd-sshd-pbx_01 \

        meta target-role="Started" is-managed="true" \

        op monitor on-fail="stop" interval="10m" \

        op start interval="0" timeout="60s" on-fail="stop" \

        op stop interval="0" timeout="60s" on-fail="stop"

primitive sshd_02 lsb:znd-sshd-pbx_02 \

        meta target-role="Started" \

        op monitor on-fail="stop" interval="10m" \

        op start interval="0" timeout="60s" on-fail="stop" \

        op stop interval="0" timeout="60s" on-fail="stop"

group pbx_service_01 ip_01 fs_01 pbx_01 sshd_01 \

        meta target-role="Started"

group pbx_service_02 ip_02 fs_02 pbx_02 sshd_02

ms ms-drbd_01 drbd_01 \

        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"

ms ms-drbd_02 drbd_02 \

        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"

location PrimaryNode-drbd_01 ms-drbd_01 100: node-01

location PrimaryNode-drbd_02 ms-drbd_02 100: node-02

location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01

location PrimaryNode-pbx_service_02 pbx_service_02 200: node-02

location SecondaryNode-drbd_01 ms-drbd_01 0: node-03

location SecondaryNode-drbd_02 ms-drbd_02 0: node-03

location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03

location SecondaryNode-pbx_service_02 pbx_service_02 10: node-03

colocation fs_01-on-drbd_01 inf: fs_01 ms-drbd_01:Master

colocation fs_02-on-drbd_02 inf: fs_02 ms-drbd_02:Master

order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start

order pbx_service_02-after-drbd_02 inf: ms-drbd_02:promote pbx_service_02:start

property $id="cib-bootstrap-options" \

        dc-version="1.1.3-9c2342c0378140df9bed7d192f2b9ed157908007" \

        cluster-infrastructure="Heartbeat" \

        symmetric-cluster="false" \

        stonith-enabled="false" \

        last-lrm-refresh="1286895296"

rsc_defaults $id="rsc-options" \

        resource-stickiness="1000"

_______________________________________________

Pacemaker mailing list: <a class="moz-txt-link-abbreviated" href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a>

<a class="moz-txt-link-freetext" href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a>

Project Home: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org">http://www.clusterlabs.org</a>

Getting started: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a>

Bugs: <a class="moz-txt-link-freetext" href="http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker">http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker</a>

  </pre>

</blockquote>

<br>

<pre class="moz-signature" cols="72">-- 

Dan FRINCU

Systems Engineer

CCNA, RHCE

Streamwide Romania

</pre>

</body>

</html>