[ClusterLabs] CRM managing ADSL connection; failure not handled

Andrei Borzenkov arvidjaar at gmail.com
Mon Aug 24 09:52:25 UTC 2015


24.08.2015 12:35, Tom Yates wrote:
> I've got a failover firewall pair where the external interface is ADSL;
> that is, PPPoE.  i've defined the service thus:
>
> primitive ExternalIP lsb:hb-adsl-helper \
>          op monitor interval="60s"
>
> and in addition written a noddy script /etc/init.d/hb-adsl-helper, thus:
>
> #!/bin/bash
> RETVAL=0
> start() {
>          /sbin/pppoe-start
> }
> stop() {
>          /sbin/pppoe-stop
> }
> case "$1" in
>    start)
>          start
>          ;;
>    stop)
>          stop
>          ;;
>    status)
>          /sbin/ifconfig ppp0 >& /dev/null && exit 0
>          exit 1
>          ;;
>    *)
>          echo $"Usage: $0 {start|stop|status}"
>          exit 3
> esac
> exit $?
>
> The problem is that sometimes the ADSL connection falls over, as they
> do, eg:
>
> Aug 20 11:42:10 positron pppd[2469]: LCP terminated by peer
> Aug 20 11:42:10 positron pppd[2469]: Connect time 8619.4 minutes.
> Aug 20 11:42:10 positron pppd[2469]: Sent 1342528799 bytes, received 164420300 bytes.
> Aug 20 11:42:13 positron pppd[2469]: Connection terminated.
> Aug 20 11:42:13 positron pppd[2469]: Modem hangup
> Aug 20 11:42:13 positron pppoe[2470]: read (asyncReadFromPPP): Session 1735: Input/output error
> Aug 20 11:42:13 positron pppoe[2470]: Sent PADT
> Aug 20 11:42:13 positron pppd[2469]: Exit.
> Aug 20 11:42:13 positron pppoe-connect: PPPoE connection lost; attempting re-connection.
>
> CRMd then logs a bunch of stuff, followed by
>
> Aug 20 11:42:18 positron lrmd: [1760]: info: rsc:ExternalIP:8: stop
> Aug 20 11:42:18 positron lrmd: [28357]: WARN: For LSB init script, no additional parameters are needed.
> [...]
> Aug 20 11:42:18 positron pppoe-stop: Killing pppd
> Aug 20 11:42:18 positron pppoe-stop: Killing pppoe-connect
> Aug 20 11:42:18 positron lrmd: [1760]: WARN: Managed ExternalIP:stop process 28357 exited with return code 1.
>
>
> At this point, the PPPoE connection is down, and stays down.  CRMd
> doesn't fail the group which contains both internal and external
> interfaces over to the other node, but nor does it try to restart the
> service.  I'm fairly sure this is because I've done something
> boneheaded, but I can't get my bone head around what it might be.
>
> Any light anyone can shed is much appreciated.
>
>

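If the stop operation fails, the resource state is undefined, and pacemaker
won't do anything more with this resource. That is what happened here: your
log shows ExternalIP:stop exiting with return code 1. Either make sure the
script returns success when appropriate (for example, when the connection is
already down), or the only other option is to have pacemaker fence the node
where the resource was active.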
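
To illustrate the first option, here is a rough sketch of how the helper
could report LSB-style results: stop succeeds when the link is already down,
and status returns 3 for "not running". It is untested against a real ADSL
line and makes some assumptions: "up" means the ppp interface exists (the
same check your status action uses), the exit code of pppoe-stop is ignored
in favour of that check, and the variables PPP_IF, IF_CHECK, PPPOE_START,
PPPOE_STOP and STOP_WAIT are names I made up so the commands can be swapped
out.

```shell
#!/bin/bash
# Sketch of /etc/init.d/hb-adsl-helper with LSB-style exit codes.
# Assumption: the link counts as "up" when the ppp interface exists.

# Hypothetical, overridable settings (not part of rp-pppoe or pacemaker).
PPP_IF="${PPP_IF:-ppp0}"
IF_CHECK="${IF_CHECK:-/sbin/ifconfig}"
PPPOE_START="${PPPOE_START:-/sbin/pppoe-start}"
PPPOE_STOP="${PPPOE_STOP:-/sbin/pppoe-stop}"
STOP_WAIT="${STOP_WAIT:-5}"    # seconds to wait for the interface to go away

is_up() {
    "$IF_CHECK" "$PPP_IF" >/dev/null 2>&1
}

adsl_start() {
    is_up && return 0          # already running counts as success
    "$PPPOE_START"
}

adsl_stop() {
    is_up || return 0          # already stopped counts as success
    "$PPPOE_STOP" || true      # assumption: judge by interface state instead
    local i=0
    while is_up; do
        [ "$i" -ge "$STOP_WAIT" ] && return 1   # still up: a real stop failure
        sleep 1
        i=$((i + 1))
    done
    return 0
}

adsl_status() {
    is_up && return 0
    return 3                   # LSB status code for "not running"
}

main() {
    case "$1" in
        start)  adsl_start ;;
        stop)   adsl_stop ;;
        status) adsl_status ;;
        *)      echo "Usage: $0 {start|stop|status}" >&2; return 3 ;;
    esac
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    main "$@"
    exit $?
fi
```

With something like this, a stop issued after the ADSL link has already
dropped returns 0, so pacemaker knows the resource is really stopped and can
recover it instead of leaving it alone.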




