[Pacemaker] Known problem with IPaddr(2)

Lars Ellenberg lars.ellenberg at linbit.com
Tue Apr 13 14:28:09 EDT 2010


On Tue, Apr 13, 2010 at 12:10:18PM +0200, Dejan Muhamedagic wrote:
> Hi,
> 
> On Mon, Apr 12, 2010 at 05:26:19PM +0200, Markus M. wrote:
> > Markus M. wrote:
> > >is there a known problem with IPaddr(2) when defining many (in my
> > >case: 11) ip resources which are started/stopped concurrently?
> 
> Don't remember any problems.
> 
> > Well... some further investigation revealed that it seems to be a
> > problem with the way the ip addresses are assigned.
> > 
> > When looking at the output of "ip addr", the first ip address added
> > to the interface gets the scope "global", and all further aliases get
> > the scope "global secondary".
> > 
> > If the first ip address is then removed before the secondaries
> > (because the scripts run concurrently), ALL secondaries are
> > removed at the same time by the "ip" command. Every subsequent
> > attempt to remove one of the other ip addresses then fails, because
> > they are already gone.
> > 
> > I am not sure how "ip" decides on the "secondary" scope, maybe
> > because the other ip addresses are in the same subnet as the first
> > one.
> 
> That sounds bad. Instances should be independent of each other.
> Can you please open a bugzilla and attach a hb_report.

Oh, that is perfectly expected, given the way he describes it.
The assumption has always been that there is at least one
"normal", not managed by crm, address on the interface,
so no one will have noticed before.
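A toy model of the behaviour described in this thread may make the failure easier to see. It is a simplification, not the kernel's logic: it assumes a single subnet, that the first address added becomes the primary and every later one a secondary, and that deleting the primary drops all of its secondaries. The helpers add_addr and del_addr are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model only (assumption, not kernel code): one subnet, addrs[0] is
# the primary, every other entry is a secondary of it.
# add_addr / del_addr are hypothetical helpers for illustration.

addrs=()

add_addr() {
    addrs+=("$1")
}

del_addr() {
    local target=$1 i
    for i in "${!addrs[@]}"; do
        if [ "${addrs[$i]}" = "$target" ]; then
            if [ "$i" -eq 0 ]; then
                # primary removed: its secondaries disappear with it
                addrs=()
            else
                # secondary removed: only this entry goes away
                unset "addrs[$i]"
                addrs=("${addrs[@]}")   # re-pack so index 0 stays primary
            fi
            return 0
        fi
    done
    echo "del_addr: $target: not present" >&2
    return 1
}
```

With three managed addresses and no unmanaged one, deleting 10.0.0.1 first empties the list, and a later del_addr 10.0.0.2 fails. If an unmanaged address such as 10.0.0.10 is added first, it stays the primary and the managed addresses can be removed in any order, which matches why the problem went unnoticed.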

I suggest the following patch,
basically doing one retry.

For the described scenario,
the second try will find the IP already "non-existent",
and exit $OCF_SUCCESS.
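The patch below also turns every "exit" inside ip_stop into "return": an exit inside the function would end the whole agent before the caller could try again. The following sketch shows the retry-once shape of the patched "stop)" branch. Here ip_stop is a hypothetical stand-in, not the real IPaddr2 function, and the OCF return codes are defined locally instead of being sourced from the OCF shell functions.

```bash
#!/usr/bin/env bash
# Sketch only: ip_stop below is a hypothetical stand-in for the real
# IPaddr2 function. It fails on the first call (the address vanished
# together with the primary) and succeeds on the second (nothing left).

OCF_SUCCESS=0       # defined locally for this sketch
OCF_ERR_GENERIC=1

attempts=0

ip_stop() {
    attempts=$((attempts + 1))
    if [ "$attempts" -eq 1 ]; then
        # delete failed: the kernel already removed the address
        return $OCF_ERR_GENERIC
    fi
    # status check now reports the address as not in use
    return $OCF_SUCCESS
}

stop_with_retry() {
    # same shape as the patched "stop)" branch: one retry on failure
    ip_stop || ip_stop
}
```

Calling stop_with_retry returns 0 after two attempts. Had ip_stop used "exit $OCF_ERR_GENERIC" instead of "return", the first call would have terminated the script with status 1 and the right-hand side of "||" would never run.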

diff -r e39d40853f09 heartbeat/IPaddr2
--- a/heartbeat/IPaddr2	Tue Apr 13 19:23:05 2010 +0200
+++ b/heartbeat/IPaddr2	Tue Apr 13 20:27:06 2010 +0200
@@ -684,12 +684,12 @@
 
 	if [ $ip_status = "no" ]; then
 		: Requested interface not in use
-		exit $OCF_SUCCESS
+		return $OCF_SUCCESS
 	fi
 
 	if [ -n "$IP_CIP" ] && [ $ip_status != "partial2" ]; then
 		if [ $ip_status = "partial" ]; then
-			exit $OCF_SUCCESS
+			return $OCF_SUCCESS
 		fi
 		echo "-$IP_INC_NO" >$IP_CIP_FILE
 		if [ "x$(cat $IP_CIP_FILE)" = "x" ]; then
@@ -713,7 +713,7 @@
 	if [ "$ip_del_if" = "yes" ]; then
 		delete_interface $BASEIP $NIC $NETMASK
 		if [ $? -ne 0 ]; then
-			exit $OCF_ERR_GENERIC
+			return $OCF_ERR_GENERIC
 		fi
 	
 		if [ "$LVS_SUPPORT" = 1 ]; then
@@ -721,7 +721,7 @@
 		fi
 	fi
 
-	exit $OCF_SUCCESS
+	return $OCF_SUCCESS
 }
 
 ip_monitor() {
@@ -828,7 +828,12 @@
 case $__OCF_ACTION in
 start)		ip_start
 		;;
-stop)		ip_stop
+stop)	
+		# do one retry
+		ip_stop || ip_stop
+		# neither explicit exit nor explicit $? needed.
+		# but for good measure and readability:
+		exit $?
 		;;
 status)		ip_status=`ip_served`
 		if [ $ip_status = "ok" ]; then
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



