<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    It turns out that if I wait, the node whose resources were already
    started when quorum is lost does stop them, but only after 15
    minutes. I repeated the test and saw the same 15-minute delay.<br>
    <br>
    cluster-recheck-interval is set to 15 minutes by default, so I
    dropped it to 1 minute with:<br>
    <br>
    pcs property set cluster-recheck-interval="60"<br>
    <br>
    This successfully reduced the delay to 1 minute.<br>
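    <br>
    For reference, something like the following should confirm the
    current value (I'm assuming the pcs 0.9 / crm_attribute syntax that
    ships with these packages, so treat this as a sketch):<br>
    <br>
    # Show the property, including its default if it isn't set explicitly:<br>
    pcs property list --all | grep cluster-recheck-interval<br>
    <br>
    # Or query the CIB directly:<br>
    crm_attribute --type crm_config --name cluster-recheck-interval --query<br>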
    <br>
    Is it normal for Pacemaker to wait for cluster-recheck-interval
    before shutting down resources that were already running at the time
    quorum was lost?<br>
    <br>
    Thanks,<br>
    <br>
    Matt<br>
    <br>
    <div class="moz-cite-prefix">On 5/28/15 11:39 AM, Matt Rideout
      wrote:<br>
    </div>
    <blockquote cite="mid:556736AF.2070308@windserve.com" type="cite">
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      I'm attempting to upgrade a two-node cluster with no quorum
      requirement to a three-node cluster with a two-member quorum
      requirement. Each node is running CentOS 7, Pacemaker 1.1.12-22
      and Corosync 2.3.4-4.<br>
      <br>
      If a node that's running resources loses quorum, I want it to stop
      all of its resources. I partially accomplished this by setting the
      following in corosync.conf:<br>
      <br>
      quorum {<br>
        provider: corosync_votequorum<br>
        two_node: 1<br>
      }<br>
      <br>
      ...and updating Pacemaker's configuration with:<br>
      <br>
      pcs property set no-quorum-policy=stop<br>
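      <br>
      (As a quick sanity check, something like the following should show
      whether both settings took effect; I'm assuming the standard
      corosync/pcs tooling here, so treat it as a sketch:)<br>
      <br>
      # Show expected votes, total votes, and whether this partition is quorate:<br>
      corosync-quorumtool -s<br>
      <br>
      # Confirm the Pacemaker cluster property:<br>
      pcs property<br>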
      <br>
      With the above configuration, two failure scenarios work as I
      would expect:<br>
      <br>
      1. If I power up a single node, it sees that there is no quorum,
      and refuses to start any resources until it sees a second node
      come up.<br>
      <br>
      2. If there are two nodes running, and I power down a node that's
      running resources, the other node sees that it lost quorum, and
      refuses to start any resources.<br>
      <br>
      However, a third failure scenario does not work as I would expect:<br>
      <br>
      3. If there are two nodes running, and I power down a node that's
      not running resources, the node that is running resources notes in
      its log that it lost quorum, but does not actually shut down any of
      its running services.<br>
      <br>
      Any ideas on what the problem may be would be greatly appreciated.
      In case it helps, I've included the output of "pcs status" and
      "pcs config show", the contents of corosync.conf, and the
      Pacemaker and Corosync logs from the period during which the
      resources were not stopped.<br>
      <br>
      <b>"pcs status" shows the resources still running after quorum is
        lost:</b><br>
      <br>
      Cluster name:<br>
      Last updated: Thu May 28 10:27:47 2015<br>
      Last change: Thu May 28 10:03:05 2015<br>
      Stack: corosync<br>
      Current DC: node1 (1) - partition WITHOUT quorum<br>
      Version: 1.1.12-a14efad<br>
      3 Nodes configured<br>
      12 Resources configured<br>
      <br>
      <br>
      Node node3 (3): OFFLINE (standby)<br>
      Online: [ node1 ]<br>
      OFFLINE: [ node2 ]<br>
      <br>
      Full list of resources:<br>
      <br>
       Resource Group: primary<br>
           virtual_ip_primary    (ocf::heartbeat:IPaddr2):    Started
      node1<br>
           GreenArrowFS    (ocf::heartbeat:Filesystem):    Started node1<br>
           GreenArrow    (ocf::drh:greenarrow):    Started node1<br>
           virtual_ip_1    (ocf::heartbeat:IPaddr2):    Started node1<br>
           virtual_ip_2    (ocf::heartbeat:IPaddr2):    Started node1<br>
       Resource Group: secondary<br>
           virtual_ip_secondary    (ocf::heartbeat:IPaddr2):    Stopped<br>
           GreenArrow-Secondary    (ocf::drh:greenarrow-secondary):   
      Stopped<br>
       Clone Set: ping-clone [ping]<br>
           Started: [ node1 ]<br>
           Stopped: [ node2 node3 ]<br>
       Master/Slave Set: GreenArrowDataClone [GreenArrowData]<br>
           Masters: [ node1 ]<br>
           Stopped: [ node2 node3 ]<br>
      <br>
      PCSD Status:<br>
        node1: Online<br>
        node2: Offline<br>
        node3: Offline<br>
      <br>
      Daemon Status:<br>
        corosync: active/enabled<br>
        pacemaker: active/enabled<br>
        pcsd: active/enabled<br>
      <br>
      <b>"pcs config show"</b><b> shows that the "no-quorum-policy:
        stop" setting is in place:</b><br>
      <br>
      Cluster Name:<br>
      Corosync Nodes:<br>
       node1 node2 node3<br>
      Pacemaker Nodes:<br>
       node1 node2 node3<br>
      <br>
      Resources:<br>
       Group: primary<br>
        Resource: virtual_ip_primary (class=ocf provider=heartbeat
      type=IPaddr2)<br>
         Attributes: ip=10.10.10.1 cidr_netmask=32<br>
         Operations: start interval=0s timeout=20s
      (virtual_ip_primary-start-timeout-20s)<br>
                     stop interval=0s timeout=20s
      (virtual_ip_primary-stop-timeout-20s)<br>
                     monitor interval=30s
      (virtual_ip_primary-monitor-interval-30s)<br>
        Resource: GreenArrowFS (class=ocf provider=heartbeat
      type=Filesystem)<br>
         Attributes: device=/dev/drbd1 directory=/media/drbd1 fstype=xfs
      options=noatime,discard<br>
         Operations: start interval=0s timeout=60
      (GreenArrowFS-start-timeout-60)<br>
                     stop interval=0s timeout=60
      (GreenArrowFS-stop-timeout-60)<br>
                     monitor interval=20 timeout=40
      (GreenArrowFS-monitor-interval-20)<br>
        Resource: GreenArrow (class=ocf provider=drh type=greenarrow)<br>
         Operations: start interval=0s timeout=30
      (GreenArrow-start-timeout-30)<br>
                     stop interval=0s timeout=240
      (GreenArrow-stop-timeout-240)<br>
                     monitor interval=10 timeout=20
      (GreenArrow-monitor-interval-10)<br>
        Resource: virtual_ip_1 (class=ocf provider=heartbeat
      type=IPaddr2)<br>
         Attributes: ip=64.21.76.51 cidr_netmask=32<br>
         Operations: start interval=0s timeout=20s
      (virtual_ip_1-start-timeout-20s)<br>
                     stop interval=0s timeout=20s
      (virtual_ip_1-stop-timeout-20s)<br>
                     monitor interval=30s
      (virtual_ip_1-monitor-interval-30s)<br>
        Resource: virtual_ip_2 (class=ocf provider=heartbeat
      type=IPaddr2)<br>
         Attributes: ip=64.21.76.63 cidr_netmask=32<br>
         Operations: start interval=0s timeout=20s
      (virtual_ip_2-start-timeout-20s)<br>
                     stop interval=0s timeout=20s
      (virtual_ip_2-stop-timeout-20s)<br>
                     monitor interval=30s
      (virtual_ip_2-monitor-interval-30s)<br>
       Group: secondary<br>
        Resource: virtual_ip_secondary (class=ocf provider=heartbeat
      type=IPaddr2)<br>
         Attributes: ip=10.10.10.4 cidr_netmask=32<br>
         Operations: start interval=0s timeout=20s
      (virtual_ip_secondary-start-timeout-20s)<br>
                     stop interval=0s timeout=20s
      (virtual_ip_secondary-stop-timeout-20s)<br>
                     monitor interval=30s
      (virtual_ip_secondary-monitor-interval-30s)<br>
        Resource: GreenArrow-Secondary (class=ocf provider=drh
      type=greenarrow-secondary)<br>
         Operations: start interval=0s timeout=30
      (GreenArrow-Secondary-start-timeout-30)<br>
                     stop interval=0s timeout=240
      (GreenArrow-Secondary-stop-timeout-240)<br>
                     monitor interval=10 timeout=20
      (GreenArrow-Secondary-monitor-interval-10)<br>
       Clone: ping-clone<br>
        Resource: ping (class=ocf provider=pacemaker type=ping)<br>
         Attributes: dampen=30s multiplier=1000 host_list=64.21.76.1<br>
         Operations: start interval=0s timeout=60
      (ping-start-timeout-60)<br>
                     stop interval=0s timeout=20 (ping-stop-timeout-20)<br>
                     monitor interval=10 timeout=60
      (ping-monitor-interval-10)<br>
       Master: GreenArrowDataClone<br>
        Meta Attrs: master-max=1 master-node-max=1 clone-max=2
      clone-node-max=1 notify=true<br>
        Resource: GreenArrowData (class=ocf provider=linbit type=drbd)<br>
         Attributes: drbd_resource=r0<br>
         Operations: start interval=0s timeout=240
      (GreenArrowData-start-timeout-240)<br>
                     promote interval=0s timeout=90
      (GreenArrowData-promote-timeout-90)<br>
                     demote interval=0s timeout=90
      (GreenArrowData-demote-timeout-90)<br>
                     stop interval=0s timeout=100
      (GreenArrowData-stop-timeout-100)<br>
                     monitor interval=60s
      (GreenArrowData-monitor-interval-60s)<br>
      <br>
      Stonith Devices:<br>
      Fencing Levels:<br>
      <br>
      Location Constraints:<br>
        Resource: primary<br>
          Enabled on: node1 (score:INFINITY)
      (id:location-primary-node1-INFINITY)<br>
          Constraint: location-primary<br>
            Rule: score=-INFINITY boolean-op=or 
      (id:location-primary-rule)<br>
              Expression: pingd lt 1  (id:location-primary-rule-expr)<br>
              Expression: not_defined pingd 
      (id:location-primary-rule-expr-1)<br>
      Ordering Constraints:<br>
        promote GreenArrowDataClone then start GreenArrowFS
      (kind:Mandatory)
      (id:order-GreenArrowDataClone-GreenArrowFS-mandatory)<br>
        stop GreenArrowFS then demote GreenArrowDataClone
      (kind:Mandatory)
      (id:order-GreenArrowFS-GreenArrowDataClone-mandatory)<br>
      Colocation Constraints:<br>
        GreenArrowFS with GreenArrowDataClone (score:INFINITY)
      (with-rsc-role:Master)
      (id:colocation-GreenArrowFS-GreenArrowDataClone-INFINITY)<br>
        virtual_ip_secondary with GreenArrowDataClone (score:INFINITY)
      (with-rsc-role:Slave)
      (id:colocation-virtual_ip_secondary-GreenArrowDataClone-INFINITY)<br>
        virtual_ip_primary with GreenArrowDataClone (score:INFINITY)
      (with-rsc-role:Master)
      (id:colocation-virtual_ip_primary-GreenArrowDataClone-INFINITY)<br>
      <br>
      Cluster Properties:<br>
       cluster-infrastructure: corosync<br>
       cluster-name: cluster_greenarrow<br>
       dc-version: 1.1.12-a14efad<br>
       have-watchdog: false<br>
       no-quorum-policy: stop<br>
       stonith-enabled: false<br>
      Node Attributes:<br>
       node3: standby=on<br>
      <br>
      <b>Here's what was logged:</b><br>
      <br>
      May 28 10:19:51 node1 pengine[1296]: notice: stage6: Scheduling
      Node node3 for shutdown<br>
      May 28 10:19:51 node1 pengine[1296]: notice: process_pe_message:
      Calculated Transition 7:
      /var/lib/pacemaker/pengine/pe-input-992.bz2<br>
      May 28 10:19:51 node1 crmd[1297]: notice: run_graph: Transition 7
      (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
      Source=/var/lib/pacemaker/pengine/pe-input-992.bz2): Complete<br>
      May 28 10:19:51 node1 crmd[1297]: notice: do_state_transition:
      State transition S_TRANSITION_ENGINE -> S_IDLE [
      input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]<br>
      May 28 10:19:51 node1 crmd[1297]: notice: peer_update_callback:
      do_shutdown of node3 (op 64) is complete<br>
      May 28 10:19:51 node1 attrd[1295]: notice: crm_update_peer_state:
      attrd_peer_change_cb: Node node3[3] - state is now lost (was
      member)<br>
      May 28 10:19:51 node1 attrd[1295]: notice: attrd_peer_remove:
      Removing all node3 attributes for attrd_peer_change_cb<br>
      May 28 10:19:51 node1 attrd[1295]: notice: attrd_peer_change_cb:
      Lost attribute writer node3<br>
      May 28 10:19:51 node1 corosync[1040]: [TOTEM ] Membership left
      list contains incorrect address. This is sign of misconfiguration
      between nodes!<br>
      May 28 10:19:51 node1 corosync[1040]: [TOTEM ] A new membership
      (64.21.76.61:25740) was formed. Members left: 3<br>
      May 28 10:19:51 node1 corosync[1040]: [QUORUM] This node is within
      the non-primary component and will NOT provide any services.<br>
      May 28 10:19:51 node1 corosync[1040]: [QUORUM] Members[1]: 1<br>
      May 28 10:19:51 node1 corosync[1040]: [MAIN  ] Completed service
      synchronization, ready to provide service.<br>
      May 28 10:19:51 node1 crmd[1297]: notice:
      pcmk_quorum_notification: Membership 25740: quorum lost (1)<br>
      May 28 10:19:51 node1 crmd[1297]: notice: crm_update_peer_state:
      pcmk_quorum_notification: Node node3[3] - state is now lost (was
      member)<br>
      May 28 10:19:51 node1 crmd[1297]: notice: peer_update_callback:
      do_shutdown of node3 (op 64) is complete<br>
      May 28 10:19:51 node1 pacemakerd[1254]: notice:
      pcmk_quorum_notification: Membership 25740: quorum lost (1)<br>
      May 28 10:19:51 node1 pacemakerd[1254]: notice:
      crm_update_peer_state: pcmk_quorum_notification: Node node3[3] -
      state is now lost (was member)<br>
      May 28 10:19:52 node1 corosync[1040]: [TOTEM ] Automatically
      recovered ring 1<br>
      <br>
      <b>Here's corosync.conf:</b><br>
      <br>
      totem {<br>
        version: 2<br>
        secauth: off<br>
        cluster_name: cluster_greenarrow<br>
        rrp_mode: passive<br>
        transport: udpu<br>
      }<br>
      <br>
      nodelist {<br>
        node {<br>
          ring0_addr: node1<br>
          ring1_addr: 10.10.10.2<br>
          nodeid: 1<br>
        }<br>
        node {<br>
          ring0_addr: node2<br>
          ring1_addr: 10.10.10.3<br>
          nodeid: 2<br>
        }<br>
        node {<br>
          ring0_addr: node3<br>
          nodeid: 3<br>
        }<br>
      }<br>
      <br>
      quorum {<br>
        provider: corosync_votequorum<br>
        two_node: 0<br>
      }<br>
      <br>
      logging {<br>
        to_syslog: yes<br>
      }<br>
      <br>
      Thanks,<br>
      <br>
      Matt<br>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
Users mailing list: <a class="moz-txt-link-abbreviated" href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a>
<a class="moz-txt-link-freetext" href="http://clusterlabs.org/mailman/listinfo/users">http://clusterlabs.org/mailman/listinfo/users</a>

Project Home: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org">http://www.clusterlabs.org</a>
Getting started: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a>
Bugs: <a class="moz-txt-link-freetext" href="http://bugs.clusterlabs.org">http://bugs.clusterlabs.org</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>