[ClusterLabs] Re: CRM managing ADSL connection; failure not handled

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Aug 25 04:34:59 EDT 2015


Why not start with writing a real OCF RA?
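
The log you quote shows the stop operation exiting with return code 1.
Pacemaker treats a failed stop as a hard error: unless the node can be
fenced, the resource is left blocked, neither restarted nor moved, which
matches what you describe. A stop on something that is already stopped
has to return success. Note also that LSB expects "status" to return 3,
not 1, when the service is not running.

Below is a minimal sketch of what an OCF agent for this might look like.
It is an illustration under assumptions, not a tested agent: the name
"adsl", the function names, and the ADSL_IFACE, PPPOE_START and
PPPOE_STOP variables are made up for this example, and a real agent
would source ocf-shellfuncs rather than define the return codes itself.

```shell
#!/bin/sh
# Hypothetical OCF resource agent sketch for a PPPoE link.
# A real agent would source ocf-shellfuncs; the standard OCF return
# codes are defined here only to keep the sketch self-contained.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_ERR_UNIMPLEMENTED:=3}"
: "${OCF_NOT_RUNNING:=7}"

# Assumed defaults, taken from the init script in the original post.
: "${ADSL_IFACE:=ppp0}"
: "${PPPOE_START:=/sbin/pppoe-start}"
: "${PPPOE_STOP:=/sbin/pppoe-stop}"

# Same test the init script used: does the interface exist?
adsl_is_up() {
    /sbin/ifconfig "$ADSL_IFACE" >/dev/null 2>&1
}

# monitor: 0 when the link is up, 7 (not running) when it is cleanly down.
adsl_monitor() {
    if adsl_is_up; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_NOT_RUNNING"
}

# start: succeed immediately if already up, otherwise bring the link up.
adsl_start() {
    adsl_is_up && return "$OCF_SUCCESS"
    "$PPPOE_START" || return "$OCF_ERR_GENERIC"
    adsl_is_up || return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}

# stop: ignore the helper's exit status, which may be non-zero when the
# link has already dropped; judge success by the resulting state instead.
adsl_stop() {
    "$PPPOE_STOP" >/dev/null 2>&1 || true
    if adsl_is_up; then
        return "$OCF_ERR_GENERIC"
    fi
    return "$OCF_SUCCESS"
}

# meta-data: minimal description of the agent and its actions.
adsl_meta_data() {
    cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="adsl" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">Sketch of an agent managing a PPPoE link.</longdesc>
  <shortdesc lang="en">PPPoE link</shortdesc>
  <parameters/>
  <actions>
    <action name="start" timeout="60s"/>
    <action name="stop" timeout="60s"/>
    <action name="monitor" timeout="20s" interval="60s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
EOF
}

adsl_main() {
    case "$1" in
        start)     adsl_start ;;
        stop)      adsl_stop ;;
        monitor)   adsl_monitor ;;
        meta-data) adsl_meta_data ;;
        *)         return "$OCF_ERR_UNIMPLEMENTED" ;;
    esac
}

# Dispatch only when the cluster (or a user) passed an action.
if [ "$#" -gt 0 ]; then
    adsl_main "$1"
    exit $?
fi
```

Installed under a provider directory of your choosing below
/usr/lib/ocf/resource.d/ (say, "custom", again only an example name),
the primitive would then reference ocf:custom:adsl instead of
lsb:hb-adsl-helper. With monitor returning 7 when the link has dropped
and stop succeeding on an already-dead link, the cluster can go on to
restart the resource or fail the group over. The interface may linger
for a moment after pppd is killed, so a real stop would probably need
to wait briefly before checking.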

>>> Tom Yates <madhatter at teaparty.net> wrote on 24.08.2015 at 11:35 in message
<alpine.LFD.2.20.1508240951170.22953 at risby.home.teaparty.net>:
> I've got a failover firewall pair where the external interface is ADSL;
> that is, PPPoE.  I've defined the service thus:
> 
> primitive ExternalIP lsb:hb-adsl-helper \
>          op monitor interval="60s"
> 
> and in addition written a noddy script /etc/init.d/hb-adsl-helper, thus:
> 
> #!/bin/bash
> RETVAL=0
> start() {
>          /sbin/pppoe-start
> }
> stop() {
>          /sbin/pppoe-stop
> }
> case "$1" in
>    start)
>          start
>          ;;
>    stop)
>          stop
>          ;;
>    status)
>          /sbin/ifconfig ppp0 >& /dev/null && exit 0
>          exit 1
>          ;;
>    *)
>          echo $"Usage: $0 {start|stop|status}"
>          exit 3
> esac
> exit $?
> 
> The problem is that sometimes the ADSL connection falls over, as they do,
> e.g.:
> 
> Aug 20 11:42:10 positron pppd[2469]: LCP terminated by peer
> Aug 20 11:42:10 positron pppd[2469]: Connect time 8619.4 minutes.
> Aug 20 11:42:10 positron pppd[2469]: Sent 1342528799 bytes, received 164420300 bytes.
> Aug 20 11:42:13 positron pppd[2469]: Connection terminated.
> Aug 20 11:42:13 positron pppd[2469]: Modem hangup
> Aug 20 11:42:13 positron pppoe[2470]: read (asyncReadFromPPP): Session 1735: Input/output error
> Aug 20 11:42:13 positron pppoe[2470]: Sent PADT
> Aug 20 11:42:13 positron pppd[2469]: Exit.
> Aug 20 11:42:13 positron pppoe-connect: PPPoE connection lost; attempting re-connection.
> 
> CRMd then logs a bunch of stuff, followed by
> 
> Aug 20 11:42:18 positron lrmd: [1760]: info: rsc:ExternalIP:8: stop
> Aug 20 11:42:18 positron lrmd: [28357]: WARN: For LSB init script, no additional parameters are needed.
> [...]
> Aug 20 11:42:18 positron pppoe-stop: Killing pppd
> Aug 20 11:42:18 positron pppoe-stop: Killing pppoe-connect
> Aug 20 11:42:18 positron lrmd: [1760]: WARN: Managed ExternalIP:stop process 28357 exited with return code 1.
> 
> 
> At this point, the PPPoE connection is down, and stays down.  CRMd doesn't 
> fail the group which contains both internal and external interfaces over 
> to the other node, but nor does it try to restart the service.  I'm fairly 
> sure this is because I've done something boneheaded, but I can't get my 
> bone head around what it might be.
> 
> Any light anyone can shed is much appreciated.
> 
> 
> -- 
> 
>        Tom Yates  -  http://www.teaparty.net 
> 