<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-text-html" lang="x-unicode">
      <div dir="ltr">Problem: When performing "pcs node standby" on the
        current master, this node demotes fine but the slave doesn't
        promote to master. It keeps  looping the same error including
        "Refusing to be Primary while peer is  not outdated" and "Could
        not connect to the CIB." At this point the old  master has
        already unloaded drbd. The only way to fix it is to start  drbd
        on the standby node (e.g. drbdadm r0 up). Logs contained herein
        are  from the node trying to be master.<br>
        <br>
        I have done this on DRBD9/Centos7/Pacemaker1 w/o any problems.
        So I don't know were the issue is (<a
          href="http://crm-fence-peer.9.sh">crm-fence-peer.9.sh</a>?
        DRBD? newer pacemaker?). DRBD seems  to work fine; unclear if
        there are some additional configs I need to do.  There are some
        slight pcs config changes between Centos 7 & 8 (Pacemaker
        1->2)<br>
      </div>
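      <div dir="ltr">For reference, the manual recovery on the stuck cluster is
        roughly this (a sketch; r0 is the resource from the config below, and
        the status check is just how I verify it reconnected):<br>
        <br>
        # bring DRBD back up on the node that was put into standby (the old master)<br>
        drbdadm up r0<br>
        # confirm both nodes reconnect and show the expected roles<br>
        drbdadm status r0<br>
      </div>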
      <div dir="ltr"><br>
      </div>
      <div dir="ltr">Another odd data point: On the slave if I do a "pcs
        node standby" & then unstandby, DRBD is loaded again;
        HOWEVER, when I do this on the master (which should then be
        slave), DRBD doesn't get loaded.<br>
        <br>
        Stonith/Fencing doesn't seem to make a difference. Not sure if
        auto-promote is required.<br>
      </div>
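      <div dir="ltr">The standby/unstandby sequence referred to above, as a
        sketch (nfs5/nfs6 here stand for whichever node is currently master or
        slave; drbdadm status is just how I check whether DRBD got loaded
        again):<br>
        <br>
        # on the current slave (say nfs6): DRBD comes back after unstandby<br>
        pcs node standby nfs6<br>
        pcs node unstandby nfs6<br>
        drbdadm status r0<br>
        <br>
        # on the current master (say nfs5): DRBD does not come back<br>
        pcs node standby nfs5<br>
        pcs node unstandby nfs5<br>
        drbdadm status r0<br>
      </div>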
      <div dir="ltr"><br>
        Appreciate any help!<br>
        <br>
        Brent<br>
        <br>
        <br>
        Basic Config (CentOS 8 packages):<br>
        --------------------------------<br>
        2 Node Master/Slave<br>
        OS: CentOS 8<br>
        Pacemaker: pacemaker-2.0.4-6.el8_3.1<br>
        Corosync: corosync-3.0.3-4.el8<br>
        <br>
        <br>
        DRBD config:<br>
        ------------<br>
        resource r0 {<br>
                protocol C;<br>
        <br>
                disk {<br>
                        on-io-error             detach;<br>
                        no-disk-flushes;<br>
                        no-disk-barrier;<br>
                        c-plan-ahead 10;<br>
                        c-fill-target 24M;<br>
                        c-min-rate 10M;<br>
                        c-max-rate 1000M;<br>
                }<br>
                net {<br>
                        fencing resource-only;<br>
        <br>
                        # max-epoch-size        20000;<br>
                        max-buffers             36k;<br>
                        sndbuf-size             1024k;<br>
                        rcvbuf-size             2048k;<br>
                }<br>
                handlers {<br>
                        # these handlers are necessary for drbd 9.0 +
        pacemaker compatibility<br>
                        fence-peer "/usr/lib/drbd/<a
          href="http://crm-fence-peer.9.sh">crm-fence-peer.9.sh</a>
        --timeout 30 --dc-timeout 60";<br>
                        after-resync-target "/usr/lib/drbd/<a
          href="http://crm-unfence-peer.9.sh">crm-unfence-peer.9.sh</a>";<br>
                }<br>
                options {<br>
                        auto-promote    yes;<br>
                }<br>
                on nfs5 {<br>
                        node-id   0;<br>
                        device    /dev/drbd0;<br>
                        disk      /dev/sdb1;<br>
                        address   10.1.3.35:7788;<br>
                        meta-disk internal;<br>
                }<br>
                on nfs6 {<br>
                        node-id   1;<br>
                        device    /dev/drbd0;<br>
                        disk      /dev/sdb1;<br>
                        address   10.1.3.36:7788;<br>
                        meta-disk internal;<br>
                }<br>
        }<br>
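        <br>
        Side note on the fence-peer handler above: with "fencing
        resource-only", crm-fence-peer.9.sh is only supposed to place (and
        crm-unfence-peer.9.sh later remove) a location constraint in the CIB,
        which is exactly the step that fails in the logs below. A sketch of
        how I check for a leftover constraint (the drbd-fence-by-handler-* id
        is my assumption of the usual naming, not something taken from these
        logs):<br>
        <br>
        pcs constraint --full | grep -i drbd<br>
        # if a stale fencing constraint shows up, it can be removed by id, e.g.<br>
        # pcs constraint remove drbd-fence-by-handler-r0-drbd0-clone<br>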
        <br>
        <br>
        <br>
        <br>
        Pacemaker Config<br>
        ----------------<br>
        Cluster Name: nfs<br>
        Corosync Nodes:<br>
         nfs5 nfs6<br>
        Pacemaker Nodes:<br>
         nfs5 nfs6<br>
        <br>
        Resources:<br>
         Group: cluster_group<br>
          Resource: fs_drbd (class=ocf provider=heartbeat
        type=Filesystem)<br>
           Attributes: device=/dev/drbd0 directory=/data/ fstype=xfs<br>
           Meta Attrs: target-role=Started<br>
           Operations: monitor interval=20s timeout=40s
        (fs_drbd-monitor-interval-20s)<br>
                       start interval=0 timeout=60
        (fs_drbd-start-interval-0)<br>
                       stop interval=0 timeout=60
        (fs_drbd-stop-interval-0)<br>
         Clone: drbd0-clone<br>
          Meta Attrs: clone-max=2 clone-node-max=1 notify=true
        promotable=true promoted-max=1 promoted-node-max=1<br>
          Resource: drbd0 (class=ocf provider=linbit type=drbd)<br>
           Attributes: drbd_resource=r0<br>
           Operations: demote interval=0s timeout=90
        (drbd0-demote-interval-0s)<br>
                       monitor interval=20 role=Slave timeout=20
        (drbd0-monitor-interval-20)<br>
                       monitor interval=10 role=Master timeout=20
        (drbd0-monitor-interval-10)<br>
                       notify interval=0s timeout=90
        (drbd0-notify-interval-0s)<br>
                       promote interval=0s timeout=90
        (drbd0-promote-interval-0s)<br>
                       reload interval=0s timeout=30
        (drbd0-reload-interval-0s)<br>
                       start interval=0s timeout=240
        (drbd0-start-interval-0s)<br>
                       stop interval=0s timeout=100
        (drbd0-stop-interval-0s)<br>
        <br>
        Stonith Devices:<br>
        Fencing Levels:<br>
        <br>
        Location Constraints:<br>
        Ordering Constraints:<br>
          promote drbd0-clone then start cluster_group (kind:Mandatory)
        (id:nfs_after_drbd)<br>
        Colocation Constraints:<br>
          cluster_group with drbd0-clone (score:INFINITY)
        (with-rsc-role:Master) (id:nfs_on_drbd)<br>
        Ticket Constraints:<br>
        <br>
        Alerts:<br>
         No alerts defined<br>
        <br>
        Resources Defaults:<br>
          No defaults set<br>
        Operations Defaults:<br>
          No defaults set<br>
        <br>
        Cluster Properties:<br>
         cluster-infrastructure: corosync<br>
         cluster-name: nfs<br>
         dc-version: 2.0.4-6.el8_3.1-2deceaa3ae<br>
         have-watchdog: false<br>
         last-lrm-refresh: 1610570527<br>
         no-quorum-policy: ignore<br>
         stonith-enabled: false<br>
        <br>
        Tags:<br>
         No tags defined<br>
        <br>
        Quorum:<br>
          Options:<br>
            wait_for_all: 0<br>
        <br>
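        For anyone reproducing this, the resource part of the config above
        corresponds roughly to these pcs commands (my reconstruction from the
        config dump, not the exact command history):<br>
        <br>
        pcs resource create drbd0 ocf:linbit:drbd drbd_resource=r0 \<br>
            op monitor interval=10 role=Master timeout=20 \<br>
            op monitor interval=20 role=Slave timeout=20 \<br>
            promotable promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 notify=true<br>
        pcs resource create fs_drbd ocf:heartbeat:Filesystem device=/dev/drbd0 directory=/data/ fstype=xfs \<br>
            op monitor interval=20s timeout=40s --group cluster_group<br>
        pcs constraint order promote drbd0-clone then start cluster_group<br>
        pcs constraint colocation add cluster_group with master drbd0-clone INFINITY<br>
        <br>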
        <br>
        Error Logs<br>
        ----------<br>
        <br>
        pacemaker-controld[7673]: notice: Result of notify operation for
        drbd0 on nfs5: ok<br>
        kernel: drbd r0 nfs6: peer( Primary -> Secondary )<br>
        pacemaker-controld[7673]: notice: Result of notify operation for
        drbd0 on nfs5: ok<br>
        pacemaker-controld[7673]: notice: Result of notify operation for
        drbd0 on nfs5: ok<br>
        kernel: drbd r0 nfs6: Preparing remote state change 3411954157<br>
        kernel: drbd r0 nfs6: Committing remote state change 3411954157
        (primary_nodes=0)<br>
        kernel: drbd r0 nfs6: conn( Connected -> TearDown ) peer(
        Secondary -> Unknown )<br>
        kernel: drbd r0/0 drbd0 nfs6: pdsk( UpToDate -> DUnknown )
        repl( Established -> Off )<br>
        kernel: drbd r0 nfs6: ack_receiver terminated<br>
        kernel: drbd r0 nfs6: Terminating ack_recv thread<br>
        kernel: drbd r0 nfs6: Restarting sender thread<br>
        drbdadm[89851]: drbdadm: Unknown command 'disconnected'<br>
        kernel: drbd r0 nfs6: Connection closed<br>
        kernel: drbd r0 nfs6: helper command: /sbin/drbdadm disconnected<br>
        kernel: drbd r0 nfs6: helper command: /sbin/drbdadm disconnected
        exit code 1 (0x100)<br>
        kernel: drbd r0 nfs6: conn( TearDown -> Unconnected )<br>
        kernel: drbd r0 nfs6: Restarting receiver thread<br>
        kernel: drbd r0 nfs6: conn( Unconnected -> Connecting )<br>
        pacemaker-attrd[7671]: notice: Setting master-drbd0[nfs6]: 10000
        -> (unset)<br>
        pacemaker-attrd[7671]: notice: Setting master-drbd0[nfs5]: 10000
        -> 1000<br>
        pacemaker-controld[7673]: notice: Result of notify operation for
        drbd0 on nfs5: ok<br>
        pacemaker-controld[7673]: notice: Result of notify operation for
        drbd0 on nfs5: ok<br>
        kernel: drbd r0 nfs6: helper command: /sbin/drbdadm fence-peer<br>
        DRBD_NODE_ID_1=nfs6 DRBD_PEER_ADDRESS=10.1.1.36
        DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=1 DRBD_RESOURCE=r0
        DRBD_VOLUME=0 UP_TO_DATE_NODES=0x00000001
        /usr/lib/drbd/crm-fence-peer.9.sh<br>
        crm-fence-peer.9.sh[89928]:
        (qb_rb_open_2) #011debug: shm size:131085; real_size:135168;
        rb->word_size:33792<br>
        crm-fence-peer.9.sh[89928]:
        (qb_rb_open_2) #011debug: shm size:131085; real_size:135168;
        rb->word_size:33792<br>
        crm-fence-peer.9.sh[89928]:
        (qb_rb_open_2) #011debug: shm size:131085; real_size:135168;
        rb->word_size:33792<br>
        crm-fence-peer.9.sh[89928]:
        (connect_with_main_loop) #011debug: Connected to controller IPC
        (attached to main loop)<br>
        crm-fence-peer.9.sh[89928]:
        (post_connect) #011debug: Sent IPC hello to controller<br>
        crm-fence-peer.9.sh[89928]:
        (qb_ipcc_disconnect) #011debug: qb_ipcc_disconnect()<br>
        crm-fence-peer.9.sh[89928]:
        (qb_rb_close_helper) #011debug: Closing ringbuffer:
        /dev/shm/qb-7673-89963-13-RTpTPN/qb-request-crmd-header<br>
        crm-fence-peer.9.sh[89928]:
        (qb_rb_close_helper) #011debug: Closing ringbuffer:
        /dev/shm/qb-7673-89963-13-RTpTPN/qb-response-crmd-header<br>
        crm-fence-peer.9.sh[89928]:
        (qb_rb_close_helper) #011debug: Closing ringbuffer:
        /dev/shm/qb-7673-89963-13-RTpTPN/qb-event-crmd-header<br>
        crm-fence-peer.9.sh[89928]:
        (ipc_post_disconnect) #011info: Disconnected from controller IPC
        API<br>
        crm-fence-peer.9.sh[89928]:
        (pcmk_free_ipc_api) #011debug: Releasing controller IPC API<br>
        crm-fence-peer.9.sh[89928]:
        (crm_xml_cleanup) #011info: Cleaning up memory from libxml2<br>
        crm-fence-peer.9.sh[89928]:
        (crm_exit) #011info: Exiting crm_node | with status 0<br>
        crm-fence-peer.9.sh[89928]: /<br>
        crm-fence-peer.9.sh[89928]:
        Could not connect to the CIB: No such device or address<br>
        crm-fence-peer.9.sh[89928]:
        Init failed, could not perform requested operations<br>
        crm-fence-peer.9.sh[89928]:
        WARNING DATA INTEGRITY at RISK: could not place the fencing
        constraint!<br>
        kernel: drbd r0 nfs6: helper command: /sbin/drbdadm fence-peer
        exit code 1 (0x100)<br>
        kernel: drbd r0 nfs6: fence-peer helper broken, returned 1<br>
        kernel: drbd r0: State change failed: Refusing to be Primary
        while peer is not outdated<br>
        kernel: drbd r0: Failed: role( Secondary -> Primary )<br>
        kernel: drbd r0 nfs6: helper command: /sbin/drbdadm fence-peer<br>
        DRBD_BACKING_DEV_0=/dev/sdb1 DRBD_CONF=/etc/drbd.conf
        DRBD_CSTATE=Connecting DRBD_LL_DISK=/dev/sdb1 DRBD_MINOR=0
        DRBD_MINOR_0=0 DRBD_MY_ADDRESS=10.1.1.35 DRBD_MY_AF=ipv4
        DRBD_MY_NODE_ID=0 DRBD_NODE_ID_0=nfs5 DRBD_NODE_ID_1=nfs6
        DRBD_PEER_ADDRESS=10.1.1.36 DRBD_PEER_AF=ipv4
        DRBD_PEER_NODE_ID=1 DRBD_RESOURCE=r0 DRBD_VOLUME=0
        UP_TO_DATE_NODES=0x00000001
        /usr/lib/drbd/crm-fence-peer.9.sh<br>
        crm-fence-peer.9.sh[24197]:
        (qb_rb_open_2) #011debug: shm size:131085; real_size:135168;
        rb->word_size:33792<br>
        crm-fence-peer.9.sh[24197]:
        (qb_rb_open_2) #011debug: shm size:131085; real_size:135168;
        rb->word_size:33792<br>
        ...<br>
      </div>
    </div>
  <div id="DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2"><br />
<table style="border-top: 1px solid #D3D4DE;">
        <tr>
        <td style="width: 55px; padding-top: 13px;"><a href="https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient&utm_term=icon" target="_blank"><img src="https://ipmcdn.avast.com/images/icons/icon-envelope-tick-round-orange-animated-no-repeat-v1.gif" alt="" width="46" height="29" style="width: 46px; height: 29px;" /></a></td>
                <td style="width: 470px; padding-top: 12px; color: #41424e; font-size: 13px; font-family: Arial, Helvetica, sans-serif; line-height: 18px;">Virus-free. <a href="https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient&utm_term=link" target="_blank" style="color: #4453ea;">www.avast.com</a>
                </td>
        </tr>
</table><a href="#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2" width="1" height="1"> </a></div></body>
</html>