<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
I'm attempting to upgrade a two-node cluster with no quorum
requirement to a three-node cluster with a two-member quorum
requirement. Each node is running CentOS 7, Pacemaker 1.1.12-22 and
Corosync 2.3.4-4.<br>
<br>
If a node that's running resources loses quorum, then I want it to
stop all of its resources. The goal was partially accomplished by
setting the following in corosync.conf:<br>
<br>
quorum {<br>
provider: corosync_votequorum<br>
two_node: 1<br>
}<br>
<br>
...and updating Pacemaker's configuration with:<br>
<br>
pcs property set no-quorum-policy=stop<br>
<br>
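For reference, the property can be double-checked directly with
either of the following (the exact pcs subcommand may differ between
pcs versions; crm_attribute is the lower-level equivalent):<br>
<br>
pcs property show no-quorum-policy<br>
crm_attribute --type crm_config --name no-quorum-policy --query<br>
<br>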
With the above configuration, two failure scenarios work as I would
expect:<br>
<br>
1. If I power up a single node, it sees that there is no quorum, and
refuses to start any resources until it sees a second node come up.<br>
<br>
2. If there are two nodes running, and I power down a node that's
running resources, the other node sees that it lost quorum, and
refuses to start any resources.<br>
<br>
However, a third failure scenario does not work as I would expect:<br>
<br>
3. If there are two nodes running, and I power down a node that's
not running resources, the node that is running resources notes in
its log that it lost quorum, but does not actually shut down any of
its running services. (A quorum check on the surviving node is
sketched below.)<br>
<br>
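The quorum check referred to above is just corosync-quorumtool run
on the surviving node; the exact output varies a bit between corosync
versions, but the "Quorate" line and the votequorum flags are the
relevant parts:<br>
<br>
corosync-quorumtool -s<br>
<br>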
Any ideas on what the problem may be would be greatly appreciated.
In case it helps, I've included the output of "pcs status" and "pcs
config show", the contents of corosync.conf, and the Pacemaker and
Corosync logs from the period during which resources were not
stopped.<br>
<br>
<b>"pcs status" shows the resources still running after quorum is
lost:</b><br>
<br>
Cluster name:<br>
Last updated: Thu May 28 10:27:47 2015<br>
Last change: Thu May 28 10:03:05 2015<br>
Stack: corosync<br>
Current DC: node1 (1) - partition WITHOUT quorum<br>
Version: 1.1.12-a14efad<br>
3 Nodes configured<br>
12 Resources configured<br>
<br>
<br>
Node node3 (3): OFFLINE (standby)<br>
Online: [ node1 ]<br>
OFFLINE: [ node2 ]<br>
<br>
Full list of resources:<br>
<br>
Resource Group: primary<br>
virtual_ip_primary (ocf::heartbeat:IPaddr2): Started
node1<br>
GreenArrowFS (ocf::heartbeat:Filesystem): Started node1<br>
GreenArrow (ocf::drh:greenarrow): Started node1<br>
virtual_ip_1 (ocf::heartbeat:IPaddr2): Started node1<br>
virtual_ip_2 (ocf::heartbeat:IPaddr2): Started node1<br>
Resource Group: secondary<br>
virtual_ip_secondary (ocf::heartbeat:IPaddr2): Stopped<br>
GreenArrow-Secondary (ocf::drh:greenarrow-secondary):
Stopped<br>
Clone Set: ping-clone [ping]<br>
Started: [ node1 ]<br>
Stopped: [ node2 node3 ]<br>
Master/Slave Set: GreenArrowDataClone [GreenArrowData]<br>
Masters: [ node1 ]<br>
Stopped: [ node2 node3 ]<br>
<br>
PCSD Status:<br>
node1: Online<br>
node2: Offline<br>
node3: Offline<br>
<br>
Daemon Status:<br>
corosync: active/enabled<br>
pacemaker: active/enabled<br>
pcsd: active/enabled<br>
<br>
<b>"pcs config show"</b><b> shows that the "no-quorum-policy: stop"
setting is in place:</b><br>
<br>
Cluster Name:<br>
Corosync Nodes:<br>
node1 node2 node3<br>
Pacemaker Nodes:<br>
node1 node2 node3<br>
<br>
Resources:<br>
Group: primary<br>
Resource: virtual_ip_primary (class=ocf provider=heartbeat
type=IPaddr2)<br>
Attributes: ip=10.10.10.1 cidr_netmask=32<br>
Operations: start interval=0s timeout=20s
(virtual_ip_primary-start-timeout-20s)<br>
stop interval=0s timeout=20s
(virtual_ip_primary-stop-timeout-20s)<br>
monitor interval=30s
(virtual_ip_primary-monitor-interval-30s)<br>
Resource: GreenArrowFS (class=ocf provider=heartbeat
type=Filesystem)<br>
Attributes: device=/dev/drbd1 directory=/media/drbd1 fstype=xfs
options=noatime,discard<br>
Operations: start interval=0s timeout=60
(GreenArrowFS-start-timeout-60)<br>
stop interval=0s timeout=60
(GreenArrowFS-stop-timeout-60)<br>
monitor interval=20 timeout=40
(GreenArrowFS-monitor-interval-20)<br>
Resource: GreenArrow (class=ocf provider=drh type=greenarrow)<br>
Operations: start interval=0s timeout=30
(GreenArrow-start-timeout-30)<br>
stop interval=0s timeout=240
(GreenArrow-stop-timeout-240)<br>
monitor interval=10 timeout=20
(GreenArrow-monitor-interval-10)<br>
Resource: virtual_ip_1 (class=ocf provider=heartbeat type=IPaddr2)<br>
Attributes: ip=64.21.76.51 cidr_netmask=32<br>
Operations: start interval=0s timeout=20s
(virtual_ip_1-start-timeout-20s)<br>
stop interval=0s timeout=20s
(virtual_ip_1-stop-timeout-20s)<br>
monitor interval=30s
(virtual_ip_1-monitor-interval-30s)<br>
Resource: virtual_ip_2 (class=ocf provider=heartbeat type=IPaddr2)<br>
Attributes: ip=64.21.76.63 cidr_netmask=32<br>
Operations: start interval=0s timeout=20s
(virtual_ip_2-start-timeout-20s)<br>
stop interval=0s timeout=20s
(virtual_ip_2-stop-timeout-20s)<br>
monitor interval=30s
(virtual_ip_2-monitor-interval-30s)<br>
Group: secondary<br>
Resource: virtual_ip_secondary (class=ocf provider=heartbeat
type=IPaddr2)<br>
Attributes: ip=10.10.10.4 cidr_netmask=32<br>
Operations: start interval=0s timeout=20s
(virtual_ip_secondary-start-timeout-20s)<br>
stop interval=0s timeout=20s
(virtual_ip_secondary-stop-timeout-20s)<br>
monitor interval=30s
(virtual_ip_secondary-monitor-interval-30s)<br>
Resource: GreenArrow-Secondary (class=ocf provider=drh
type=greenarrow-secondary)<br>
Operations: start interval=0s timeout=30
(GreenArrow-Secondary-start-timeout-30)<br>
stop interval=0s timeout=240
(GreenArrow-Secondary-stop-timeout-240)<br>
monitor interval=10 timeout=20
(GreenArrow-Secondary-monitor-interval-10)<br>
Clone: ping-clone<br>
Resource: ping (class=ocf provider=pacemaker type=ping)<br>
Attributes: dampen=30s multiplier=1000 host_list=64.21.76.1<br>
Operations: start interval=0s timeout=60 (ping-start-timeout-60)<br>
stop interval=0s timeout=20 (ping-stop-timeout-20)<br>
monitor interval=10 timeout=60
(ping-monitor-interval-10)<br>
Master: GreenArrowDataClone<br>
Meta Attrs: master-max=1 master-node-max=1 clone-max=2
clone-node-max=1 notify=true<br>
Resource: GreenArrowData (class=ocf provider=linbit type=drbd)<br>
Attributes: drbd_resource=r0<br>
Operations: start interval=0s timeout=240
(GreenArrowData-start-timeout-240)<br>
promote interval=0s timeout=90
(GreenArrowData-promote-timeout-90)<br>
demote interval=0s timeout=90
(GreenArrowData-demote-timeout-90)<br>
stop interval=0s timeout=100
(GreenArrowData-stop-timeout-100)<br>
monitor interval=60s
(GreenArrowData-monitor-interval-60s)<br>
<br>
Stonith Devices:<br>
Fencing Levels:<br>
<br>
Location Constraints:<br>
Resource: primary<br>
Enabled on: node1 (score:INFINITY)
(id:location-primary-node1-INFINITY)<br>
Constraint: location-primary<br>
Rule: score=-INFINITY boolean-op=or
(id:location-primary-rule)<br>
Expression: pingd lt 1 (id:location-primary-rule-expr)<br>
Expression: not_defined pingd
(id:location-primary-rule-expr-1)<br>
Ordering Constraints:<br>
promote GreenArrowDataClone then start GreenArrowFS
(kind:Mandatory)
(id:order-GreenArrowDataClone-GreenArrowFS-mandatory)<br>
stop GreenArrowFS then demote GreenArrowDataClone (kind:Mandatory)
(id:order-GreenArrowFS-GreenArrowDataClone-mandatory)<br>
Colocation Constraints:<br>
GreenArrowFS with GreenArrowDataClone (score:INFINITY)
(with-rsc-role:Master)
(id:colocation-GreenArrowFS-GreenArrowDataClone-INFINITY)<br>
virtual_ip_secondary with GreenArrowDataClone (score:INFINITY)
(with-rsc-role:Slave)
(id:colocation-virtual_ip_secondary-GreenArrowDataClone-INFINITY)<br>
virtual_ip_primary with GreenArrowDataClone (score:INFINITY)
(with-rsc-role:Master)
(id:colocation-virtual_ip_primary-GreenArrowDataClone-INFINITY)<br>
<br>
Cluster Properties:<br>
cluster-infrastructure: corosync<br>
cluster-name: cluster_greenarrow<br>
dc-version: 1.1.12-a14efad<br>
have-watchdog: false<br>
no-quorum-policy: stop<br>
stonith-enabled: false<br>
Node Attributes:<br>
node3: standby=on<br>
<br>
<b>Here's what was logged</b>:<br>
<br>
May 28 10:19:51 node1 pengine[1296]: notice: stage6: Scheduling Node
node3 for shutdown<br>
May 28 10:19:51 node1 pengine[1296]: notice: process_pe_message:
Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-992.bz2<br>
May 28 10:19:51 node1 crmd[1297]: notice: run_graph: Transition 7
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-992.bz2): Complete<br>
May 28 10:19:51 node1 crmd[1297]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]<br>
May 28 10:19:51 node1 crmd[1297]: notice: peer_update_callback:
do_shutdown of node3 (op 64) is complete<br>
May 28 10:19:51 node1 attrd[1295]: notice: crm_update_peer_state:
attrd_peer_change_cb: Node node3[3] - state is now lost (was member)<br>
May 28 10:19:51 node1 attrd[1295]: notice: attrd_peer_remove:
Removing all node3 attributes for attrd_peer_change_cb<br>
May 28 10:19:51 node1 attrd[1295]: notice: attrd_peer_change_cb:
Lost attribute writer node3<br>
May 28 10:19:51 node1 corosync[1040]: [TOTEM ] Membership left list
contains incorrect address. This is sign of misconfiguration between
nodes!<br>
May 28 10:19:51 node1 corosync[1040]: [TOTEM ] A new membership
(64.21.76.61:25740) was formed. Members left: 3<br>
May 28 10:19:51 node1 corosync[1040]: [QUORUM] This node is within
the non-primary component and will NOT provide any services.<br>
May 28 10:19:51 node1 corosync[1040]: [QUORUM] Members[1]: 1<br>
May 28 10:19:51 node1 corosync[1040]: [MAIN ] Completed service
synchronization, ready to provide service.<br>
May 28 10:19:51 node1 crmd[1297]: notice: pcmk_quorum_notification:
Membership 25740: quorum lost (1)<br>
May 28 10:19:51 node1 crmd[1297]: notice: crm_update_peer_state:
pcmk_quorum_notification: Node node3[3] - state is now lost (was
member)<br>
May 28 10:19:51 node1 crmd[1297]: notice: peer_update_callback:
do_shutdown of node3 (op 64) is complete<br>
May 28 10:19:51 node1 pacemakerd[1254]: notice:
pcmk_quorum_notification: Membership 25740: quorum lost (1)<br>
May 28 10:19:51 node1 pacemakerd[1254]: notice:
crm_update_peer_state: pcmk_quorum_notification: Node node3[3] -
state is now lost (was member)<br>
May 28 10:19:52 node1 corosync[1040]: [TOTEM ] Automatically
recovered ring 1<br>
<br>
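If it's useful, the transition files referenced in the log (e.g.
pe-input-992.bz2) can be replayed offline with crm_simulate to see
what the policy engine decided at each step; I believe the usual
invocation is something like:<br>
<br>
crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-992.bz2<br>
<br>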
<b>Here's corosync.conf:</b><br>
<br>
totem {<br>
version: 2<br>
secauth: off<br>
cluster_name: cluster_greenarrow<br>
rrp_mode: passive<br>
transport: udpu<br>
}<br>
<br>
nodelist {<br>
node {<br>
ring0_addr: node1<br>
ring1_addr: 10.10.10.2<br>
nodeid: 1<br>
}<br>
node {<br>
ring0_addr: node2<br>
ring1_addr: 10.10.10.3<br>
nodeid: 2<br>
}<br>
node {<br>
ring0_addr: node3<br>
nodeid: 3<br>
}<br>
}<br>
<br>
quorum {<br>
provider: corosync_votequorum<br>
two_node: 0<br>
}<br>
<br>
logging {<br>
to_syslog: yes<br>
}<br>
<br>
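One more thing I can check if it helps: the quorum settings corosync
actually loaded at runtime (as opposed to the file above) should be
visible via corosync-cmapctl, along the lines of the following (key
names may differ slightly between versions):<br>
<br>
corosync-cmapctl | grep -E 'quorum|two_node|expected_votes'<br>
<br>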
Thanks,<br>
<br>
Matt<br>
</body>
</html>