[ClusterLabs] CRM managing ADSL connection; failure not handled

Tom Yates madhatter at teaparty.net
Mon Aug 24 05:35:37 EDT 2015


I've got a failover firewall pair where the external interface is ADSL;
that is, PPPoE.  I've defined the service thus:

primitive ExternalIP lsb:hb-adsl-helper \
         op monitor interval="60s"

and in addition written a noddy script /etc/init.d/hb-adsl-helper, thus:

#!/bin/bash
RETVAL=0
start() {
         /sbin/pppoe-start
}
stop() {
         /sbin/pppoe-stop
}
case "$1" in
   start)
         start
         ;;
   stop)
         stop
         ;;
   status)
         /sbin/ifconfig ppp0 >& /dev/null && exit 0
         exit 1
         ;;
   *)
         echo $"Usage: $0 {start|stop|status}"
         exit 3
esac
exit $?
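
For comparison, a minimal sketch of the same helper using the exit codes
that the LSB init-script conventions expect: status returns 3 (not 1) when
the service is not running, start succeeds if the link is already up, and
stop reports success whenever the link ends up down.  The variables
PPPOE_START, PPPOE_STOP and PPP_IF, the function names, and the choice to
judge stop by the state of the interface rather than by pppoe-stop's own
exit code are all assumptions for illustration, not something taken from
the setup above.

```shell
#!/bin/bash
# Sketch only: variable and function names are illustrative assumptions.
PPPOE_START="${PPPOE_START:-/sbin/pppoe-start}"
PPPOE_STOP="${PPPOE_STOP:-/sbin/pppoe-stop}"
PPP_IF="${PPP_IF:-ppp0}"

# True (0) if the PPP interface currently exists.
link_up() {
    /sbin/ifconfig "$PPP_IF" >/dev/null 2>&1
}

adsl_start() {
    link_up && return 0      # already running counts as success
    "$PPPOE_START"
}

adsl_stop() {
    "$PPPOE_STOP"
    # Assumption: ignore pppoe-stop's exit code and report success if the
    # interface is gone.  A real script may need to wait briefly here.
    ! link_up
}

adsl_status() {
    link_up && return 0      # 0 = running
    return 3                 # 3 = not running
}

adsl_main() {
    case "$1" in
        start)  adsl_start ;;
        stop)   adsl_stop ;;
        status) adsl_status ;;
        *)      echo "Usage: $0 {start|stop|status}" >&2
                return 3 ;;
    esac
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    adsl_main "$@"
    exit $?
fi
```

Whether pppoe-stop's exit code can safely be ignored in this way is
untested here.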

The problem is that sometimes the ADSL connection falls over, as they do,
e.g.:

Aug 20 11:42:10 positron pppd[2469]: LCP terminated by peer
Aug 20 11:42:10 positron pppd[2469]: Connect time 8619.4 minutes.
Aug 20 11:42:10 positron pppd[2469]: Sent 1342528799 bytes, received 164420300 bytes.
Aug 20 11:42:13 positron pppd[2469]: Connection terminated.
Aug 20 11:42:13 positron pppd[2469]: Modem hangup
Aug 20 11:42:13 positron pppoe[2470]: read (asyncReadFromPPP): Session 1735: Input/output error
Aug 20 11:42:13 positron pppoe[2470]: Sent PADT
Aug 20 11:42:13 positron pppd[2469]: Exit.
Aug 20 11:42:13 positron pppoe-connect: PPPoE connection lost; attempting re-connection.

CRMd then logs a bunch of stuff, followed by:

Aug 20 11:42:18 positron lrmd: [1760]: info: rsc:ExternalIP:8: stop
Aug 20 11:42:18 positron lrmd: [28357]: WARN: For LSB init script, no additional parameters are needed.
[...]
Aug 20 11:42:18 positron pppoe-stop: Killing pppd
Aug 20 11:42:18 positron pppoe-stop: Killing pppoe-connect
Aug 20 11:42:18 positron lrmd: [1760]: WARN: Managed ExternalIP:stop process 28357 exited with return code 1.


At this point, the PPPoE connection is down, and stays down.  CRMd doesn't 
fail the group which contains both internal and external interfaces over 
to the other node, but nor does it try to restart the service.  I'm fairly 
sure this is because I've done something boneheaded, but I can't get my 
bone head around what it might be.
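
Since the lrmd log above shows the stop action exiting with return code 1,
one possible check is to step the helper through its actions by hand and
compare each exit code with what the LSB conventions expect: 0 from start
and stop (including when the service is already in that state), 0 from
status when running, and 3 from status when stopped.  A rough sketch,
where lsb_expect and lsb_check are hypothetical helper names and the
sequence is an assumption about what is worth testing:

```shell
#!/bin/bash
# Sketch only: lsb_expect and lsb_check are hypothetical helper names.

# Run "<script> <action>" and compare its exit code with the expected one.
lsb_expect() {
    local script=$1 action=$2 want=$3 got=0
    "$script" "$action" >/dev/null 2>&1 || got=$?
    if [ "$got" -eq "$want" ]; then
        echo "ok:   $action returned $got"
    else
        echo "FAIL: $action returned $got, expected $want"
        return 1
    fi
}

# Walk the service through start and stop, checking every exit code.
# NOTE: this really starts and stops the service it is pointed at.
lsb_check() {
    local script=$1 bad=0
    lsb_expect "$script" start  0 || bad=1
    lsb_expect "$script" status 0 || bad=1
    lsb_expect "$script" start  0 || bad=1   # start while already running
    lsb_expect "$script" stop   0 || bad=1
    lsb_expect "$script" status 3 || bad=1   # stopped should report 3
    lsb_expect "$script" stop   0 || bad=1   # stop while already stopped
    return "$bad"
}
```

It would be run as "lsb_check /etc/init.d/hb-adsl-helper" with the
resource unmanaged or the cluster stopped on that node, since it takes the
connection up and down.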

Any light anyone can shed is much appreciated.


-- 

       Tom Yates  -  http://www.teaparty.net



