[Pacemaker] Failover constraint problem

Mon Apr 19 08:01:19 UTC 2010

Hi Sandor,

> 1.  If I migrate apache-group resource to another node then nfs_client
> won't release the /mnt mount point (I know according to this config it
> should not).
Referring to the time-out message below, is it possible that stopping 
nfs_client takes more than 20 sec?
Perhaps you should try giving your resources explicit start/stop operation timeouts: 
        op start interval="0" timeout="1m" \
        op stop interval="0" timeout="1m" \
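
Applied to the nfs_client primitive from the configuration quoted below, that
could look roughly like this. It is an untested sketch: the 1m values are only
a starting point, and everything else is assumed to stay as posted. The same
two op lines could go on fs0, whose stop also timed out.

        primitive nfs_client ocf:heartbeat:Filesystem \
                params fstype="nfs" directory="/mnt/" device="192.168.1.40:/mnt/" options="hard,intr,noatime,rw,nolock,tcp,timeo=50" \
                op start interval="0" timeout="1m" \
                op stop interval="0" timeout="1m" \
                meta target-role="Stopped"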

> 2. If I shut down node1 (suppose that node0 is the master at the moment and
> runs apache-group) then nothing happens, as expected, but when node1 comes
> online again apache-group starts to migrate to node1. I don't
> understand why, because there is a constraint that makes
> apache-group run on the node holding the primary drbd resource, and in this
> situation that is node0.

I had this phenomenon when an OCF resource agent did not return the right 
error codes for some reason, e.g. the apache monitor throws an error because 
of errors in thttpd.conf.
Perhaps it helps if you try to start/stop apache manually on that node?

HTH,
Martin

Sandor Feher <sfeher at bluesystem.hu> wrote on 17.04.2010 10:29:31:

> Hi,
> 
> First of all, my goal is to set up a two-node cluster with pacemaker to
> serve our webhosting service.
> This config sits on two vmware virtual machines for testing purposes
> now. Both of them run Debian Lenny.
> 
> Here are the basic rules I set up:
> 
> node0  has
> 
> virtual ip
> drbd primary filesystem mounted under /mnt
> nfs server offers /mnt mount point to node1
> 
> node1 has
> 
> drbd secondary node
> nfs_client mounts node0's /mnt dir and it should be rw for both nodes
> 
> If node0 fails then node1 will act as primary drbd node, take over the
> virtual ip and mount the drbd partition under /mnt, and will not start the
> nfs_client resource because it makes no sense (nfs_client should be taken
> down before the drbd partition gets mounted under /mnt).
> If node1 fails then nothing should happen, because nfs_client only runs on
> the node which has the secondary drbd partition.
> 
> So my problems are the following.
> 
> 1.  If I migrate apache-group resource to another node then nfs_client
> won't release the /mnt mount point (I know according to this config it
> should not).
>       I think I need some clever constraint to achieve this.
> 
> 2. If I shut down node1 (suppose that node0 is the master at the moment and
> runs apache-group) then nothing happens, as expected, but when node1 comes
> online again apache-group starts to migrate to node1. I don't
> understand why, because there is a constraint that makes
> apache-group run on the node holding the primary drbd resource, and in this
> situation that is node0.
> 
> 
> crm configure show
> 
> node node0 \
>          attributes standby="off"
> node node1 \
>          attributes standby="off"
> primitive drbd0 ocf:heartbeat:drbd \
>          params drbd_resource="r0" \
>          op monitor interval="59s" role="Master" timeout="30s" \
>          op monitor interval="60s" role="Slave" timeout="30s"
> primitive fs0 ocf:heartbeat:Filesystem \
>          params fstype="ext3" directory="/mnt" device="/dev/drbd0" \
>          meta target-role="Started"
> primitive nfs_client ocf:heartbeat:Filesystem \
>          params fstype="nfs" directory="/mnt/" device="192.168.1.40:/mnt/" options="hard,intr,noatime,rw,nolock,tcp,timeo=50" \
>          meta target-role="Stopped"
> primitive nfs_server lsb:nfs-kernel-server \
>          op monitor interval="1min"
> primitive virtual-ip ocf:heartbeat:IPaddr2 \
>          params ip="192.168.1.40" broadcast="192.168.1.255" nic="eth0" cidr_netmask="24" \
>          op monitor interval="21s" timeout="5s" target-role="Started"
> group apache-group fs0 virtual-ip nfs_server \
>          meta target-role="Started"
> ms ms-drbd0 drbd0 \
>          meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
> location cli-prefer-apache-group apache-group \
>          rule $id="cli-prefer-rule-apache-group" inf: #uname eq node0
> colocation apache-group-on-ms-drbd0 inf: apache-group ms-drbd0:Master
> colocation co_nfs_client inf: nfs_client ms-drbd0:Slave
> order ms-drbd0-before-apache-group inf: ms-drbd0:promote apache-group:start
> order ms-drbd0-before-nfs_client inf: ms-drbd0:promote nfs_client:start
> property $id="cib-bootstrap-options" \
>          dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
>          cluster-infrastructure="openais" \
>          stonith-enabled="false" \
>          no-quorum-policy="ignore" \
>          expected-quorum-votes="2" \
>          last-lrm-refresh="1271453094"
> 
> node1:~# crm_mon -1
> ============
> Last updated: Fri Apr 16 23:49:30 2010
> Stack: openais
> Current DC: node0 - partition with quorum
> Version: 1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
> 
> Online: [ node0 node1 ]
> 
>   Resource Group: apache-group
>       fs0        (ocf::heartbeat:Filesystem):    Started node1 (unmanaged) FAILED
>       virtual-ip (ocf::heartbeat:IPaddr2):       Stopped
>       nfs_server (lsb:nfs-kernel-server):        Stopped
>   Master/Slave Set: ms-drbd0
>       Masters: [ node0 ]
>       Slaves: [ node1 ]
>   nfs_client     (ocf::heartbeat:Filesystem):    Started node1 (unmanaged) FAILED
> 
> Failed actions:
>      nfs_client_start_0 (node=node0, call=98, rc=1, status=complete): unknown error
>      fs0_stop_0 (node=node1, call=9, rc=-2, status=Timed Out): unknown exec error
>      nfs_client_stop_0 (node=node1, call=7, rc=-2, status=Timed Out): unknown exec error
> 
> 
> I really appreciate any idea. Thank you in advance.
> 
> Regards,   Sandor