<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 03/14/2013 03:36 PM, Arnold Krille wrote:<br>

    <span style="white-space: pre;">> On Thu, 14 Mar 2013 14:06:36

      +0000 Owen Le Blanc <a class="moz-txt-link-rfc2396E" href="mailto:LeBlanc@man.ac.uk">&lt;LeBlanc@man.ac.uk&gt;</a><br>

      > wrote:<br>

      >> I have a number of pacemaker managed clusters. We use an

      independent<br>

      >> heartbeat network for corosync, and we use another

      network for the<br>

      >> managed services. The heartbeat network is routed using

      different<br>

      >> hardware from the service network. We have two machine

      rooms, and<br>

      >> our normal pacemaker clusters have one node in each

      machine room.<br>

      >><br>

      >> In the past I've used ocf:pacemaker:ping as part of our<br>

      >> configurations, but we had problems, since our network is

      busy, and<br>

      >> many of the routers (the most reliable things to ping)

      are configured<br>

      >> to ignore pings when they have too much to do otherwise.

      In this way<br>

      >> we often had false connectivity failures in the past, and

      services<br>

      >> would flop from one side to the other.<br>

      >><br>

      >> Recently we had a power failure which affected all of the

      switches on<br>

      >> our service network in one machine room. This meant that

      all<br>

      >> services in that machine room were unavailable. Our

      pacemaker<br>

      >> clusters unfortunately saw this as no problem, since

      without a ping<br>

      >> test, they couldn't tell that the network was down.<br>

      >><br>

      >> Has anyone done any work to measure network connectivity

      in<br>

      >> connection with pacemaker without using ping? I can see a

      couple of<br>

      >> potential ways to avoid it, but I hate to reinvent

      wheels.<br>

      ><br>

      > I have seen a commercial (but pacemaker-based) solution that

      seemed to<br>

      > use link-detection on the hw-level to suicide the local node

      when both<br>

      > links (one to the outside and one to the peer) went down.<br>

      ><br>

      > But I don't even know if this was done inside pacemaker, nor

      did I have<br>

      > time to think about something similar for our cluster.<br>

      ><br>

      > I just trust that four links using two switches with

      independent power<br>

      > will be safe enough...</span><br>

    I've implemented node suicide when the link goes away by looking at

    /sys/class/net/&lt;interface&gt;/carrier<br>

    <br>

    for example, cat /sys/class/net/eth0/carrier and see what it looks

    like...<br>

    <br>

    It's 1 when the link is up, and 0 when it's down.  You could

    presumably write a script that uses that to set node attributes

    too...<br>

    <br>
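    Here is a rough sketch of what such a script might look like. It is
    only an illustration under some assumptions: the interface name
    eth0, the attribute name link_eth0 and the helper function names are
    made up for the example, and it assumes attrd_updater accepts -n
    (name) and -U (update value), as in recent Pacemaker releases. Check
    the options against your installed version.<br>
    <br>

```python
import subprocess
from pathlib import Path


def link_is_up(interface, sysfs_root="/sys/class/net"):
    """Return True if the carrier file for the interface reads 1."""
    carrier = Path(sysfs_root) / interface / "carrier"
    try:
        return carrier.read_text().strip() == "1"
    except OSError:
        # The interface does not exist, or it is administratively down
        # (the kernel then refuses the read). Treat both as "no link".
        return False


def attribute_command(interface, up, prefix="link_"):
    """Build the attrd_updater call. The attribute name is hypothetical."""
    value = "1" if up else "0"
    return ["attrd_updater", "-n", prefix + interface, "-U", value]


def update_node_attribute(interface):
    """Publish the current link state as a node attribute (assumed CLI)."""
    command = attribute_command(interface, link_is_up(interface))
    subprocess.run(command, check=True)
```

    Calling update_node_attribute("eth0") from cron or a small loop would
    keep the attribute current, and a location constraint could then key
    off link_eth0 in much the same way as with the ping agent's
    attribute.<br>
    <br>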

    -- <br>

        Alan Robertson <a class="moz-txt-link-rfc2396E" href="mailto:alanr@unix.sh">&lt;alanr@unix.sh&gt;</a> - @OSSAlanR<br>

    <br>

    "Openness is the foundation and preservative of friendship...  Let

    me claim from you at all times your undisguised opinions." - William

    Wilberforce<br>

    <br>

  </body>

</html>