<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    It turns out that if I wait, the node whose resources were already
    started when quorum is lost does stop them, but only after 15
    minutes. I repeated the test and saw the same 15-minute delay.<br>
    <br>
    cluster-recheck-interval is set to 15 minutes by default, so I
    dropped it to 1 minute with:<br>
    <br>
    pcs property set cluster-recheck-interval="60"<br>
    <br>
    This successfully reduced the delay to 1 minute.<br>
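    <br>
    For reference, something like the following should confirm the
    current value (I'm assuming the pcs 0.9 / crm_attribute syntax that
    ships with these packages, so treat this as a sketch):<br>
    <br>
    # Show the property, including its default if it isn't set explicitly:<br>
    pcs property list --all | grep cluster-recheck-interval<br>
    <br>
    # Or query the CIB directly:<br>
    crm_attribute --type crm_config --name cluster-recheck-interval --query<br>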
    <br>
    Is it normal for Pacemaker to wait for cluster-recheck-interval
    before shutting down resources that were already running at the time
    quorum was lost?<br>
    <br>
    Thanks,<br>
    <br>
    Matt<br>
    <br>
    <div class="moz-cite-prefix">On 5/28/15 11:39 AM, Matt Rideout
      wrote:<br>
    </div>
    <blockquote cite="mid:556736AF.2070308@windserve.com" type="cite">
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      I'm attempting to upgrade a two-node cluster with no quorum
      requirement to a three-node cluster with a two-member quorum
      requirement. Each node is running CentOS 7, Pacemaker 1.1.12-22
      and Corosync 2.3.4-4.<br>
      <br>
      If a node that's running resources loses quorum, I want it to stop
      all of its resources. I partially accomplished this by setting the
      following in corosync.conf:<br>
      <br>
      quorum {<br>
        provider: corosync_votequorum<br>
        two_node: 1<br>
      }<br>
      <br>
      ...and updating Pacemaker's configuration with:<br>
      <br>
      pcs property set no-quorum-policy=stop<br>
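      <br>
      (As a quick sanity check, something like the following should show
      whether both settings took effect; I'm assuming the standard
      corosync/pcs tooling here, so treat it as a sketch:)<br>
      <br>
      # Show expected votes, total votes, and whether this partition is quorate:<br>
      corosync-quorumtool -s<br>
      <br>
      # Confirm the Pacemaker cluster property:<br>
      pcs property<br>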
      <br>
      With the above configuration, two failure scenarios work as I
      would expect:<br>
      <br>
      1. If I power up a single node, it sees that there is no quorum,
      and refuses to start any resources until it sees a second node
      come up.<br>
      <br>
      2. If there are two nodes running, and I power down a node that's
      running resources, the other node sees that it lost quorum, and
      refuses to start any resources.<br>
      <br>
      However, a third failure scenario does not work as I would expect:<br>
      <br>
      3. If there are two nodes running, and I power down a node that's
      not running resources, the node that is running resources notes in
      its log that it lost quorum, but does not actually shut down any of
      its running services.<br>
      <br>
      Any ideas on what the problem may be would be greatly appreciated.
      In case it helps, I've included the output of "pcs status" and
      "pcs config show", the contents of corosync.conf, and the
      Pacemaker and Corosync logs from the period during which the
      resources were not stopped.<br>
      <br>
      <b>"pcs status" shows the resources still running after quorum is
        lost:</b><br>
      <br>
      Cluster name:<br>
      Last updated: Thu May 28 10:27:47 2015<br>
      Last change: Thu May 28 10:03:05 2015<br>
      Stack: corosync<br>
      Current DC: node1 (1) - partition WITHOUT quorum<br>
      Version: 1.1.12-a14efad<br>
      3 Nodes configured<br>
      12 Resources configured<br>
      <br>
      <br>
      Node node3 (3): OFFLINE (standby)<br>
      Online: [ node1 ]<br>
      OFFLINE: [ node2 ]<br>
      <br>
      Full list of resources:<br>
      <br>
       Resource Group: primary<br>
           virtual_ip_primary    (ocf::heartbeat:IPaddr2):    Started
      node1<br>
           GreenArrowFS    (ocf::heartbeat:Filesystem):    Started node1<br>
           GreenArrow    (ocf::drh:greenarrow):    Started node1<br>
           virtual_ip_1    (ocf::heartbeat:IPaddr2):    Started node1<br>
           virtual_ip_2    (ocf::heartbeat:IPaddr2):    Started node1<br>
       Resource Group: secondary<br>
           virtual_ip_secondary    (ocf::heartbeat:IPaddr2):    Stopped<br>
           GreenArrow-Secondary    (ocf::drh:greenarrow-secondary):   
      Stopped<br>
       Clone Set: ping-clone [ping]<br>
           Started: [ node1 ]<br>
           Stopped: [ node2 node3 ]<br>
       Master/Slave Set: GreenArrowDataClone [GreenArrowData]<br>
           Masters: [ node1 ]<br>
           Stopped: [ node2 node3 ]<br>
      <br>
      PCSD Status:<br>
        node1: Online<br>
        node2: Offline<br>
        node3: Offline<br>
      <br>
      Daemon Status:<br>
        corosync: active/enabled<br>
        pacemaker: active/enabled<br>
        pcsd: active/enabled<br>
      <br>
      <b>"pcs config show"</b><b> shows that the "no-quorum-policy:
        stop" setting is in place:</b><br>
      <br>
      Cluster Name:<br>
      Corosync Nodes:<br>
       node1 node2 node3<br>
      Pacemaker Nodes:<br>
       node1 node2 node3<br>
      <br>
      Resources:<br>
       Group: primary<br>
        Resource: virtual_ip_primary (class=ocf provider=heartbeat
      type=IPaddr2)<br>
         Attributes: ip=10.10.10.1 cidr_netmask=32<br>
         Operations: start interval=0s timeout=20s
      (virtual_ip_primary-start-timeout-20s)<br>
                     stop interval=0s timeout=20s
      (virtual_ip_primary-stop-timeout-20s)<br>
                     monitor interval=30s
      (virtual_ip_primary-monitor-interval-30s)<br>
        Resource: GreenArrowFS (class=ocf provider=heartbeat
      type=Filesystem)<br>
         Attributes: device=/dev/drbd1 directory=/media/drbd1 fstype=xfs
      options=noatime,discard<br>
         Operations: start interval=0s timeout=60
      (GreenArrowFS-start-timeout-60)<br>
                     stop interval=0s timeout=60
      (GreenArrowFS-stop-timeout-60)<br>
                     monitor interval=20 timeout=40
      (GreenArrowFS-monitor-interval-20)<br>
        Resource: GreenArrow (class=ocf provider=drh type=greenarrow)<br>
         Operations: start interval=0s timeout=30
      (GreenArrow-start-timeout-30)<br>
                     stop interval=0s timeout=240
      (GreenArrow-stop-timeout-240)<br>
                     monitor interval=10 timeout=20
      (GreenArrow-monitor-interval-10)<br>
        Resource: virtual_ip_1 (class=ocf provider=heartbeat
      type=IPaddr2)<br>
         Attributes: ip=64.21.76.51 cidr_netmask=32<br>
         Operations: start interval=0s timeout=20s
      (virtual_ip_1-start-timeout-20s)<br>
                     stop interval=0s timeout=20s
      (virtual_ip_1-stop-timeout-20s)<br>
                     monitor interval=30s
      (virtual_ip_1-monitor-interval-30s)<br>
        Resource: virtual_ip_2 (class=ocf provider=heartbeat
      type=IPaddr2)<br>
         Attributes: ip=64.21.76.63 cidr_netmask=32<br>
         Operations: start interval=0s timeout=20s
      (virtual_ip_2-start-timeout-20s)<br>
                     stop interval=0s timeout=20s
      (virtual_ip_2-stop-timeout-20s)<br>
                     monitor interval=30s
      (virtual_ip_2-monitor-interval-30s)<br>
       Group: secondary<br>
        Resource: virtual_ip_secondary (class=ocf provider=heartbeat
      type=IPaddr2)<br>
         Attributes: ip=10.10.10.4 cidr_netmask=32<br>
         Operations: start interval=0s timeout=20s
      (virtual_ip_secondary-start-timeout-20s)<br>
                     stop interval=0s timeout=20s
      (virtual_ip_secondary-stop-timeout-20s)<br>
                     monitor interval=30s
      (virtual_ip_secondary-monitor-interval-30s)<br>
        Resource: GreenArrow-Secondary (class=ocf provider=drh
      type=greenarrow-secondary)<br>
         Operations: start interval=0s timeout=30
      (GreenArrow-Secondary-start-timeout-30)<br>
                     stop interval=0s timeout=240
      (GreenArrow-Secondary-stop-timeout-240)<br>
                     monitor interval=10 timeout=20
      (GreenArrow-Secondary-monitor-interval-10)<br>
       Clone: ping-clone<br>
        Resource: ping (class=ocf provider=pacemaker type=ping)<br>
         Attributes: dampen=30s multiplier=1000 host_list=64.21.76.1<br>
         Operations: start interval=0s timeout=60
      (ping-start-timeout-60)<br>
                     stop interval=0s timeout=20 (ping-stop-timeout-20)<br>
                     monitor interval=10 timeout=60
      (ping-monitor-interval-10)<br>
       Master: GreenArrowDataClone<br>
        Meta Attrs: master-max=1 master-node-max=1 clone-max=2
      clone-node-max=1 notify=true<br>
        Resource: GreenArrowData (class=ocf provider=linbit type=drbd)<br>
         Attributes: drbd_resource=r0<br>
         Operations: start interval=0s timeout=240
      (GreenArrowData-start-timeout-240)<br>
                     promote interval=0s timeout=90
      (GreenArrowData-promote-timeout-90)<br>
                     demote interval=0s timeout=90
      (GreenArrowData-demote-timeout-90)<br>
                     stop interval=0s timeout=100
      (GreenArrowData-stop-timeout-100)<br>
                     monitor interval=60s
      (GreenArrowData-monitor-interval-60s)<br>
      <br>
      Stonith Devices:<br>
      Fencing Levels:<br>
      <br>
      Location Constraints:<br>
        Resource: primary<br>
          Enabled on: node1 (score:INFINITY)
      (id:location-primary-node1-INFINITY)<br>
          Constraint: location-primary<br>
            Rule: score=-INFINITY boolean-op=or 
      (id:location-primary-rule)<br>
              Expression: pingd lt 1  (id:location-primary-rule-expr)<br>
              Expression: not_defined pingd 
      (id:location-primary-rule-expr-1)<br>
      Ordering Constraints:<br>
        promote GreenArrowDataClone then start GreenArrowFS
      (kind:Mandatory)
      (id:order-GreenArrowDataClone-GreenArrowFS-mandatory)<br>
        stop GreenArrowFS then demote GreenArrowDataClone
      (kind:Mandatory)
      (id:order-GreenArrowFS-GreenArrowDataClone-mandatory)<br>
      Colocation Constraints:<br>
        GreenArrowFS with GreenArrowDataClone (score:INFINITY)
      (with-rsc-role:Master)
      (id:colocation-GreenArrowFS-GreenArrowDataClone-INFINITY)<br>
        virtual_ip_secondary with GreenArrowDataClone (score:INFINITY)
      (with-rsc-role:Slave)
      (id:colocation-virtual_ip_secondary-GreenArrowDataClone-INFINITY)<br>
        virtual_ip_primary with GreenArrowDataClone (score:INFINITY)
      (with-rsc-role:Master)
      (id:colocation-virtual_ip_primary-GreenArrowDataClone-INFINITY)<br>
      <br>
      Cluster Properties:<br>
       cluster-infrastructure: corosync<br>
       cluster-name: cluster_greenarrow<br>
       dc-version: 1.1.12-a14efad<br>
       have-watchdog: false<br>
       no-quorum-policy: stop<br>
       stonith-enabled: false<br>
      Node Attributes:<br>
       node3: standby=on<br>
      <br>
      <b>Here's what was logged:</b><br>
      <br>
      May 28 10:19:51 node1 pengine[1296]: notice: stage6: Scheduling
      Node node3 for shutdown<br>
      May 28 10:19:51 node1 pengine[1296]: notice: process_pe_message:
      Calculated Transition 7:
      /var/lib/pacemaker/pengine/pe-input-992.bz2<br>
      May 28 10:19:51 node1 crmd[1297]: notice: run_graph: Transition 7
      (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
      Source=/var/lib/pacemaker/pengine/pe-input-992.bz2): Complete<br>
      May 28 10:19:51 node1 crmd[1297]: notice: do_state_transition:
      State transition S_TRANSITION_ENGINE -> S_IDLE [
      input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]<br>
      May 28 10:19:51 node1 crmd[1297]: notice: peer_update_callback:
      do_shutdown of node3 (op 64) is complete<br>
      May 28 10:19:51 node1 attrd[1295]: notice: crm_update_peer_state:
      attrd_peer_change_cb: Node node3[3] - state is now lost (was
      member)<br>
      May 28 10:19:51 node1 attrd[1295]: notice: attrd_peer_remove:
      Removing all node3 attributes for attrd_peer_change_cb<br>
      May 28 10:19:51 node1 attrd[1295]: notice: attrd_peer_change_cb:
      Lost attribute writer node3<br>
      May 28 10:19:51 node1 corosync[1040]: [TOTEM ] Membership left
      list contains incorrect address. This is sign of misconfiguration
      between nodes!<br>
      May 28 10:19:51 node1 corosync[1040]: [TOTEM ] A new membership
      (64.21.76.61:25740) was formed. Members left: 3<br>
      May 28 10:19:51 node1 corosync[1040]: [QUORUM] This node is within
      the non-primary component and will NOT provide any services.<br>
      May 28 10:19:51 node1 corosync[1040]: [QUORUM] Members[1]: 1<br>
      May 28 10:19:51 node1 corosync[1040]: [MAIN  ] Completed service
      synchronization, ready to provide service.<br>
      May 28 10:19:51 node1 crmd[1297]: notice:
      pcmk_quorum_notification: Membership 25740: quorum lost (1)<br>
      May 28 10:19:51 node1 crmd[1297]: notice: crm_update_peer_state:
      pcmk_quorum_notification: Node node3[3] - state is now lost (was
      member)<br>
      May 28 10:19:51 node1 crmd[1297]: notice: peer_update_callback:
      do_shutdown of node3 (op 64) is complete<br>
      May 28 10:19:51 node1 pacemakerd[1254]: notice:
      pcmk_quorum_notification: Membership 25740: quorum lost (1)<br>
      May 28 10:19:51 node1 pacemakerd[1254]: notice:
      crm_update_peer_state: pcmk_quorum_notification: Node node3[3] -
      state is now lost (was member)<br>
      May 28 10:19:52 node1 corosync[1040]: [TOTEM ] Automatically
      recovered ring 1<br>
      <br>
      <b>Here's corosync.conf:</b><br>
      <br>
      totem {<br>
        version: 2<br>
        secauth: off<br>
        cluster_name: cluster_greenarrow<br>
        rrp_mode: passive<br>
        transport: udpu<br>
      }<br>
      <br>
      nodelist {<br>
        node {<br>
          ring0_addr: node1<br>
          ring1_addr: 10.10.10.2<br>
          nodeid: 1<br>
        }<br>
        node {<br>
          ring0_addr: node2<br>
          ring1_addr: 10.10.10.3<br>
          nodeid: 2<br>
        }<br>
        node {<br>
          ring0_addr: node3<br>
          nodeid: 3<br>
        }<br>
      }<br>
      <br>
      quorum {<br>
        provider: corosync_votequorum<br>
        two_node: 0<br>
      }<br>
      <br>
      logging {<br>
        to_syslog: yes<br>
      }<br>
      <br>
      Thanks,<br>
      <br>
      Matt<br>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
Users mailing list: <a class="moz-txt-link-abbreviated" href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a>
<a class="moz-txt-link-freetext" href="http://clusterlabs.org/mailman/listinfo/users">http://clusterlabs.org/mailman/listinfo/users</a>

Project Home: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org">http://www.clusterlabs.org</a>
Getting started: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a>
Bugs: <a class="moz-txt-link-freetext" href="http://bugs.clusterlabs.org">http://bugs.clusterlabs.org</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>