<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hi!<br>
    <br>
    I have moved my cluster from heartbeat to corosync.<br>
    Here is the content of corosync.conf:<br>
    <br>
    compatibility: whitetank<br>
    <br>
    totem {<br>
            version: 2<br>
            token: 500<br>
            downcheck: 500<br>
            secauth: off<br>
            threads: 0<br>
            interface {<br>
                    ringnumber: 0<br>
                    bindnetaddr: 10.10.1.0<br>
                    mcastaddr: 226.94.1.1<br>
                    mcastport: 5405<br>
            }<br>
    }<br>
    <br>
    logging {<br>
            fileline: off<br>
            to_stderr: no<br>
            to_logfile: yes<br>
            to_syslog: yes<br>
            logfile: /var/log/corosync.log<br>
            debug: on<br>
            timestamp: on<br>
            logger_subsys {<br>
                    subsys: AMF<br>
                    debug: off<br>
            }<br>
    }<br>
    <br>
    amf {<br>
            mode: disabled<br>
    }<br>
    <br>
    quorum {<br>
            provider: corosync_votequorum<br>
            expected_votes: 1<br>
    }<br>
    <br>
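    (As a side note: I assume the ring and resource state after the switch can be
    checked with the standard corosync/Pacemaker command-line tools, e.g.:<br>
    <br>
    # corosync-cfgtool -s<br>
    # crm_mon -1<br>
    <br>
    I mention this only as a sanity check; my question below is about failover
    timing.)<br>
    <br>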
    The Pacemaker configuration is not changed.<br>
    <br>
    After the first node crashed, in its corosync.log I can see that monitoring
    stopped at 15:53:24 (i.e. the node crashed at 15:53:24):<br>
    <br>
    Jul 19 15:53:22 freeswitch1 lrmd: [24569]: debug: rsc:FailoverIP2:12: monitor<br>
    Jul 19 15:53:22 freeswitch1 lrmd: [24569]: debug: rsc:FailoverIP1:10: monitor<br>
    Jul 19 15:53:23 freeswitch1 lrmd: [24569]: debug: rsc:FailoverIP3:14: monitor<br>
    Jul 19 15:53:23 freeswitch1 lrmd: [24569]: debug: rsc:fs:16: monitor<br>
    Jul 19 15:53:23 freeswitch1 lrmd: [24569]: debug: RA output: (fs:monitor:stdout) OK<br>
    Jul 19 15:53:23 freeswitch1 lrmd: [24569]: debug: rsc:FailoverIP2:12: monitor<br>
    Jul 19 15:53:23 freeswitch1 lrmd: [24569]: debug: rsc:FailoverIP1:10: monitor<br>
    Jul 19 <b>15:53:24</b> freeswitch1 lrmd: [24569]: debug: rsc:FailoverIP3:14: monitor<br>
    Jul 19 15:55:00 corosync [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.<br>
    Jul 19 15:55:00 corosync [MAIN  ] Corosync built-in features: nss rdma<br>
    <br>
    On the second node, in corosync.log:<br>
    <br>
    Jul 19 <b>15:53:27</b> corosync [TOTEM ] The token was lost in the OPERATIONAL state.<br>
    Jul 19 15:53:27 corosync [TOTEM ] A processor failed, forming new configuration.<br>
    Jul 19 15:53:27 corosync [TOTEM ] Receive multicast socket recv buffer size (262142 bytes).<br>
    Jul 19 15:53:27 corosync [TOTEM ] Transmit multicast socket send buffer size (262142 bytes).<br>
    Jul 19 15:53:27 corosync [TOTEM ] entering GATHER state from 2.<br>
    Jul 19 15:53:28 corosync [TOTEM ] entering GATHER state from 0.<br>
    <br>
    That is, the second node detected the crash only 3 seconds later.<br>
    <br>
    Is there any way to reduce this amount of time? <br>
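    <br>
    My understanding (this is only my guess at the relevant knobs) is that failure
    detection is driven mainly by the totem token and consensus timeouts, so I was
    thinking of something along these lines in the totem section:<br>
    <br>
    totem {<br>
            ...<br>
            # token loss timeout in milliseconds (already 500 in my config above)<br>
            token: 500<br>
            # consensus timeout in milliseconds (my assumption: about 1.2 * token)<br>
            consensus: 600<br>
    }<br>
    <br>
    But token is already 500 ms in my config and detection still takes about 3
    seconds, so I am not sure these are the right settings to change.<br>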
    <br>
    Thanks in advance for all your hints.<br>
    <br>
    <div class="moz-cite-prefix">12.07.2012 10:47, Виталий Давудов
      wrote:<br>
    </div>
    <blockquote cite="mid:4FFE72EA.8090504@vts24.ru" type="cite">David,
      thanks for your answer!
      <br>
      <br>
      I'll try to migrate to corosync.
      <br>
      <br>
      11.07.2012 22:40, David Vossel wrote:
      <br>
      <blockquote type="cite">
        <br>
        ----- Original Message -----
        <br>
        <blockquote type="cite">From: "Виталий Давудов"
          <a class="moz-txt-link-rfc2396E" href="mailto:vitaliy.davudov@vts24.ru"><vitaliy.davudov@vts24.ru></a>
          <br>
          To: <a class="moz-txt-link-abbreviated" href="mailto:pacemaker@oss.clusterlabs.org">pacemaker@oss.clusterlabs.org</a>
          <br>
          Sent: Wednesday, July 11, 2012 7:34:08 AM
          <br>
          Subject: [Pacemaker] Pengine behavior
          <br>
          <br>
          <br>
          Hi, list!
          <br>
          <br>
          I have configured cluster for voip application.
          <br>
          Here my configuration:
          <br>
          <br>
          # crm configure show
          <br>
          node $id="552f91eb-e70a-40a5-ac43-cb16e063fdba" freeswitch1 \
          <br>
          attributes standby="off"
          <br>
        </blockquote>
        Ah... right here is your problem. You are using freeswitch
        instead of Asterisk :P
        <br>
        <br>
        <blockquote type="cite">node
          $id="c86ab64d-26c4-4595-aa32-bf9d18f714e7" freeswitch2 \
          <br>
          attributes standby="off"
          <br>
          primitive FailoverIP1 ocf:heartbeat:IPaddr2 \
          <br>
          params iflabel="FoIP1" ip="91.211.219.142" cidr_netmask="30"
          <br>
          nic="eth1.50" \
          <br>
          op monitor interval="1s"
          <br>
          primitive FailoverIP2 ocf:heartbeat:IPaddr2 \
          <br>
          params iflabel="FoIP2" ip="172.30.0.1" cidr_netmask="16"
          <br>
          nic="eth1.554" \
          <br>
          op monitor interval="1s"
          <br>
          primitive FailoverIP3 ocf:heartbeat:IPaddr2 \
          <br>
          params iflabel="FoIP3" ip="10.18.1.1" cidr_netmask="24"
          <br>
          nic="eth1.552" \
          <br>
          op monitor interval="1s"
          <br>
          primitive fs lsb:FSSofia \
          <br>
          op monitor interval="1s" enabled="false" timeout="2s"
          <br>
          on-fail="standby" \
          <br>
          meta target-role="Started"
          <br>
          group HAServices FailoverIP1 FailoverIP2 FailoverIP3 \
          <br>
          meta target-role="Started"
          <br>
          order FS-after-IP inf: HAServices fs
          <br>
          property $id="cib-bootstrap-options" \
          <br>
          dc-version="1.0.12-unknown" \
          <br>
          cluster-infrastructure="Heartbeat" \
          <br>
          stonith-enabled="false" \
          <br>
          expected-quorum-votes="1" \
          <br>
          no-quorum-policy="ignore" \
          <br>
          last-lrm-refresh="1299964019"
          <br>
          rsc_defaults $id="rsc-options" \
          <br>
          resource-stickiness="100"
          <br>
          <br>
          When the 1st node crashed, the 2nd node became active. During
          this process I found the following lines in the ha-debug file:
          <br>
          <br>
          ...
          <br>
          Jul 06 17:16:42 freeswitch1 crmd: [3385]: info:
          start_subsystem:
          <br>
          Starting sub-system "pengine"
          <br>
          Jul 06 17:16:42 freeswitch1 pengine: [3675]: info: Invoked:
          <br>
          /usr/lib64/heartbeat/pengine
          <br>
          Jul 06 17:16:42 freeswitch1 pengine: [3675]: info: main:
          Starting
          <br>
          pengine
          <br>
          Jul 06 17:16:46 freeswitch1 crmd: [3385]: info:
          do_dc_takeover:
          <br>
          Taking over DC status for this partition
          <br>
          Jul 06 17:16:46 freeswitch1 cib: [3381]: info:
          cib_process_readwrite:
          <br>
          We are now in R/W mode
          <br>
          Jul 06 17:16:46 freeswitch1 cib: [3381]: info:
          cib_process_request:
          <br>
          Operation complete: op cib_master for section 'all'
          <br>
          (origin=local/crmd/11, version=0.391.20): ok (
          <br>
          rc=0)
          <br>
          Jul 06 17:16:46 freeswitch1 cib: [3381]: info:
          cib_process_request:
          <br>
          Operation complete: op cib_modify for section cib
          <br>
          (origin=local/crmd/12, version=0.391.20): ok (rc
          <br>
          =0)
          <br>
          Jul 06 17:16:46 freeswitch1 cib: [3381]: info:
          cib_process_request:
          <br>
          Operation complete: op cib_modify for section crm_config
          <br>
          (origin=local/crmd/14, version=0.391.20):
          <br>
          ok (rc=0)
          <br>
          ...
          <br>
          <br>
          After "Starting pengine", only thru 4 seconds occured next
          action.
          <br>
          What happens at this time? Is it possible to reduce this time?
          <br>
        </blockquote>
        I seem to remember seeing something related to this in the code
        at one point.  I believe it is limited only to the use of
        heartbeat as the messaging layer.  After starting the pengine,
        the crmd sleeps waiting for the pengine to start before
        contacting it.  The sleep is just a guess at how long it will
        take before the pengine will be up and ready to accept a
        connection though.  That's why it is so long... so the gap will
        hopefully be large enough that no one will ever run into any
        problems with it (I am not a big fan of this type of logic at
        all). I'd recommend moving to corosync and seeing if this delay
        goes away.
        <br>
        <br>
        -- Vossel
        <br>
        <br>
        <blockquote type="cite">Thanks in advance.
          <br>
          --
          <br>
          Best regards,
          <br>
          Vitaly
          <br>
        </blockquote>
      </blockquote>
      <br>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Best regards,
Vitaly
</pre>
  </body>
</html>