[ClusterLabs] DRBD 2-node M/S doesn't want to promote new master, Centos 8

Brent Jensen jeneral9 at gmail.com
Mon Jan 18 17:48:39 EST 2021


I replaced the fence-peer and after-resync-target handlers (the same script) in the 
drbd resource with the latest version from git. This seemed to fix the 
problem (although I still see that drbdadm 'disconnected' error). It 
behaves slightly differently from my previous clusters. When you demote 
the primary node (pcs node standby), it creates a location constraint on the 
standby node, e.g.:
Location Constraints:
   Resource: drbd0-clone
     Constraint: drbd-fence-by-handler-r0-drbd0-clone
       Rule: role=Master score=-INFINITY
         Expression: #uname ne nfs5

That constraint doesn't allow the secondary node to promote (a different 
issue). If I remove the constraint, it works. I'm not sure what's going on 
here. Does anyone have experience with this?
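
For reference (paths assumed from the packaged drbd-utils layout; adjust to
wherever you installed the git version, and crm-unfence-peer.9.sh there is
normally just a link to the same script, which is what I meant by "same
script" above), the handler section of my r0 config now looks roughly like:

    handlers {
        fence-peer          "/usr/lib/drbd/crm-fence-peer.9.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
    }

When the peer is back to UpToDate and the constraint is left behind, removing
it by its id (taken from the listing above) is what gets things moving again,
e.g.:

    pcs constraint remove drbd-fence-by-handler-r0-drbd0-clone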

Thanks,

Brent


On 1/18/2021 2:43 PM, Ken Gaillot wrote:
> The part that sticks out to me is "Cluster is now split" followed by
> "helper command: /sbin/drbdadm fence-peer", which I believe should not
> happen after a clean demote/stop of the other side, and then crm-fence-
> peer.9.sh says "Could not connect to the CIB: No such device or
> address". The unknown command error is also suspicious.
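>
> (A quick sanity check at that point would be to query the CIB by hand as
> root on the node where the handler ran, e.g.:
>
>     cibadmin --query >/dev/null && echo "CIB reachable"
>     crm_mon -1
>
> though that alone won't explain why the handler's own connection failed.)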
>
> I'd make sure the installed versions of everything are happy with each
> other (i.e. the drbd utils version supports the installed kernel module
> and pacemaker versions, and similarly with the resource agent if it
> came separately). I'm not familiar enough with DRBD 9 to know if any
> further configuration changes are needed.
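>
> Something along these lines would show whether they line up (package names
> here assume ELRepo on CentOS 8, so adjust to your repos):
>
>     rpm -q drbd-utils kmod-drbd90 pacemaker
>     cat /proc/drbd                              # loaded kernel-module version
>     drbdadm --version                           # userland (utils) version
>     ls -l /usr/lib/ocf/resource.d/linbit/drbd   # the OCF agent actually in use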
>
> On Sun, 2021-01-17 at 12:00 -0700, Brent Jensen wrote:
>> Here are some more log files (notice the error on 'helper command:
>> /sbin/drbdadm disconnected') in the Primary Node's logs.
>>
>> Master (Primary) Node going Standby
>> -----------------------------------
>> Jan 17 11:48:14 nfs6 pacemaker-controld[1693]: notice: Result of stop
>> operation for nfs5-stonith on nfs6: ok
>> Jan 17 11:48:14 nfs6 pacemaker-controld[1693]: notice: Result of
>> notify operation for drbd0 on nfs6: ok
>> Jan 17 11:48:14 nfs6 Filesystem(fs_drbd)[797290]: INFO: Running stop
>> for /dev/drbd0 on /data
>> Jan 17 11:48:14 nfs6 Filesystem(fs_drbd)[797290]: INFO: Trying to
>> unmount /data
>> Jan 17 11:48:14 nfs6 systemd[1923]: data.mount: Succeeded.
>> Jan 17 11:48:14 nfs6 systemd[1]: data.mount: Succeeded.
>> Jan 17 11:48:14 nfs6 kernel: XFS (drbd0): Unmounting Filesystem
>> Jan 17 11:48:14 nfs6 Filesystem(fs_drbd)[797290]: INFO: unmounted
>> /data successfully
>> Jan 17 11:48:14 nfs6 pacemaker-controld[1693]: notice: Result of stop
>> operation for fs_drbd on nfs6: ok
>> Jan 17 11:48:14 nfs6 kernel: drbd r0: role( Primary -> Secondary )
>> Jan 17 11:48:14 nfs6 pacemaker-controld[1693]: notice: Result of
>> demote operation for drbd0 on nfs6: ok
>> Jan 17 11:48:14 nfs6 pacemaker-controld[1693]: notice: Result of
>> notify operation for drbd0 on nfs6: ok
>> Jan 17 11:48:14 nfs6 pacemaker-controld[1693]: notice: Result of
>> notify operation for drbd0 on nfs6: ok
>> Jan 17 11:48:14 nfs6 kernel: drbd r0: Preparing cluster-wide state
>> change 59605293 (1->0 496/16)
>> Jan 17 11:48:14 nfs6 kernel: drbd r0: State change 59605293:
>> primary_nodes=0, weak_nodes=0
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: Cluster is now split
>> Jan 17 11:48:14 nfs6 kernel: drbd r0: Committing cluster-wide state
>> change 59605293 (0ms)
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: conn( Connected ->
>> Disconnecting ) peer( Secondary -> Unknown )
>> Jan 17 11:48:14 nfs6 kernel: drbd r0/0 drbd0 nfs5: pdsk( UpToDate ->
>> DUnknown ) repl( Established -> Off )
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: ack_receiver terminated
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: Terminating ack_recv
>> thread
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: Restarting sender thread
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: Connection closed
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: helper command:
>> /sbin/drbdadm disconnected
>> Jan 17 11:48:14 nfs6 drbdadm[797503]: drbdadm: Unknown command
>> 'disconnected'
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: helper command:
>> /sbin/drbdadm disconnected exit code 1 (0x100)
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: conn( Disconnecting ->
>> StandAlone )
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: Terminating receiver
>> thread
>> Jan 17 11:48:14 nfs6 kernel: drbd r0 nfs5: Terminating sender thread
>> Jan 17 11:48:14 nfs6 kernel: drbd r0/0 drbd0: disk( UpToDate ->
>> Detaching )
>> Jan 17 11:48:14 nfs6 kernel: drbd r0/0 drbd0: disk( Detaching ->
>> Diskless )
>> Jan 17 11:48:14 nfs6 kernel: drbd r0/0 drbd0: drbd_bm_resize called
>> with capacity == 0
>> Jan 17 11:48:14 nfs6 kernel: drbd r0: Terminating worker thread
>> Jan 17 11:48:14 nfs6 pacemaker-attrd[1691]: notice: Setting master-
>> drbd0[nfs6]: 10000 -> (unset)
>> Jan 17 11:48:14 nfs6 pacemaker-controld[1693]: notice: Result of stop
>> operation for drbd0 on nfs6: ok
>> Jan 17 11:48:14 nfs6 pacemaker-attrd[1691]: notice: Setting master-
>> drbd0[nfs5]: 10000 -> 1000
>> Jan 17 11:48:14 nfs6 pacemaker-controld[1693]: notice: Current ping
>> state: S_NOT_DC
>>
>>
>> Secondary Node going primary (fails)
>> ------------------------------------
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: State
>> transition S_IDLE -> S_POLICY_ENGINE
>> Jan 17 11:48:14 nfs5 pacemaker-schedulerd[207004]: notice: On loss of
>> quorum: Ignore
>> Jan 17 11:48:14 nfs5 pacemaker-schedulerd[207004]: notice:  *
>> Move       fs_drbd          (         nfs6 -> nfs5 )
>> Jan 17 11:48:14 nfs5 pacemaker-schedulerd[207004]: notice:  *
>> Stop       nfs5-stonith     (                 nfs6 )   due to node
>> availability
>> Jan 17 11:48:14 nfs5 pacemaker-schedulerd[207004]: notice:  *
>> Stop       drbd0:0          (          Master nfs6 )   due to node
>> availability
>> Jan 17 11:48:14 nfs5 pacemaker-schedulerd[207004]: notice:  *
>> Promote    drbd0:1          ( Slave -> Master nfs5 )
>> Jan 17 11:48:14 nfs5 pacemaker-schedulerd[207004]: notice: Calculated
>> transition 490, saving inputs in /var/lib/pacemaker/pengine/pe-input-
>> 123.bz2
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> stop operation fs_drbd_stop_0 on nfs6
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> stop operation nfs5-stonith_stop_0 on nfs6
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> cancel operation drbd0_monitor_20000 locally on nfs5
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> notify operation drbd0_pre_notify_demote_0 on nfs6
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> notify operation drbd0_pre_notify_demote_0 locally on nfs5
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Result of
>> notify operation for drbd0 on nfs5: ok
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> demote operation drbd0_demote_0 on nfs6
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: peer( Primary -> Secondary
>> )
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> notify operation drbd0_post_notify_demote_0 on nfs6
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> notify operation drbd0_post_notify_demote_0 locally on nfs5
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Result of
>> notify operation for drbd0 on nfs5: ok
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> notify operation drbd0_pre_notify_stop_0 on nfs6
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> notify operation drbd0_pre_notify_stop_0 locally on nfs5
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Result of
>> notify operation for drbd0 on nfs5: ok
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> stop operation drbd0_stop_0 on nfs6
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: Preparing remote state
>> change 59605293
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: Committing remote state
>> change 59605293 (primary_nodes=0)
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: conn( Connected ->
>> TearDown ) peer( Secondary -> Unknown )
>> Jan 17 11:48:14 nfs5 kernel: drbd r0/0 drbd0 nfs6: pdsk( UpToDate ->
>> DUnknown ) repl( Established -> Off )
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: ack_receiver terminated
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: Terminating ack_recv
>> thread
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: Restarting sender thread
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: Connection closed
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: helper command:
>> /sbin/drbdadm disconnected
>> Jan 17 11:48:14 nfs5 drbdadm[570326]: drbdadm: Unknown command
>> 'disconnected'
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: helper command:
>> /sbin/drbdadm disconnected exit code 1 (0x100)
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: conn( TearDown ->
>> Unconnected )
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: Restarting receiver thread
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: conn( Unconnected ->
>> Connecting )
>> Jan 17 11:48:14 nfs5 pacemaker-attrd[207003]: notice: Setting master-
>> drbd0[nfs6]: 10000 -> (unset)
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Transition
>> 490 aborted by deletion of nvpair[@id='status-2-master-drbd0']:
>> Transient attribute change
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> notify operation drbd0_post_notify_stop_0 locally on nfs5
>> Jan 17 11:48:14 nfs5 pacemaker-attrd[207003]: notice: Setting master-
>> drbd0[nfs5]: 10000 -> 1000
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Result of
>> notify operation for drbd0 on nfs5: ok
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Transition
>> 490 (Complete=27, Pending=0, Fired=0, Skipped=1, Incomplete=13,
>> Source=/var/lib/pacemaker/pengine/pe-input-123.bz2): Stopped
>> Jan 17 11:48:14 nfs5 pacemaker-schedulerd[207004]: notice: On loss of
>> quorum: Ignore
>> Jan 17 11:48:14 nfs5 pacemaker-schedulerd[207004]: notice:  *
>> Start      fs_drbd          (                 nfs5 )
>> Jan 17 11:48:14 nfs5 pacemaker-schedulerd[207004]: notice:  *
>> Promote    drbd0:0          ( Slave -> Master nfs5 )
>> Jan 17 11:48:14 nfs5 pacemaker-schedulerd[207004]: notice: Calculated
>> transition 491, saving inputs in /var/lib/pacemaker/pengine/pe-input-
>> 124.bz2
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> notify operation drbd0_pre_notify_promote_0 locally on nfs5
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Result of
>> notify operation for drbd0 on nfs5: ok
>> Jan 17 11:48:14 nfs5 pacemaker-controld[207005]: notice: Initiating
>> promote operation drbd0_promote_0 locally on nfs5
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: helper command:
>> /sbin/drbdadm fence-peer
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]:
>> DRBD_BACKING_DEV_0=/dev/sdb1 DRBD_CONF=/etc/drbd.conf
>> DRBD_CSTATE=Connecting DRBD_LL_DISK=/dev/sdb1 DRBD_MINOR=0
>> DRBD_MINOR_0=0 DRBD_MY_ADDRESS=10.1.3.35 DRBD_MY_AF=ipv4
>> DRBD_MY_NODE_ID=0 DRBD_NODE_ID_0=nfs5 DRBD_NODE_ID_1=nfs6
>> DRBD_PEER_ADDRESS=10.1.3.36 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=1
>> DRBD_RESOURCE=r0 DRBD_VOLUME=0 UP_TO_DATE_NODES=0x00000001
>> /usr/lib/drbd/crm-fence-peer.9.sh
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: (qb_rb_open_2)
>> #011debug: shm size:131085; real_size:135168; rb->word_size:33792
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: (qb_rb_open_2)
>> #011debug: shm size:131085; real_size:135168; rb->word_size:33792
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: (qb_rb_open_2)
>> #011debug: shm size:131085; real_size:135168; rb->word_size:33792
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]:
>> (connect_with_main_loop) #011debug: Connected to controller
>> IPC         (attached to main loop)
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: (post_connect)
>> #011debug: Sent IPC hello to controller
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]:
>> (qb_ipcc_disconnect) #011debug: qb_ipcc_disconnect()
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]:
>> (qb_rb_close_helper) #011debug: Closing ringbuffer: /dev/shm/qb-
>> 207005-570438-16-rx95PO/qb-request-crmd-header
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]:
>> (qb_rb_close_helper) #011debug: Closing ringbuffer: /dev/shm/qb-
>> 207005-570438-16-rx95PO/qb-response-crmd-header
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]:
>> (qb_rb_close_helper) #011debug: Closing ringbuffer: /dev/shm/qb-
>> 207005-570438-16-rx95PO/qb-event-crmd-header
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]:
>> (ipc_post_disconnect) #011info: Disconnected from controller IPC API
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: (pcmk_free_ipc_api)
>> #011debug: Releasing controller IPC API
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: (crm_xml_cleanup)
>> #011info: Cleaning up memory from libxml2
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: (crm_exit)
>> #011info: Exiting crm_node | with status 0
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: /
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: Could not connect
>> to the CIB: No such device or address
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: Init failed, could
>> not perform requested operations
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570403]: WARNING DATA
>> INTEGRITY at RISK: could not place the fencing constraint!
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: helper command:
>> /sbin/drbdadm fence-peer exit code 1 (0x100)
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: fence-peer helper broken,
>> returned 1
>> Jan 17 11:48:14 nfs5 kernel: drbd r0: State change failed: Refusing
>> to be Primary while peer is not outdated
>> Jan 17 11:48:14 nfs5 kernel: drbd r0: Failed: role( Secondary ->
>> Primary )
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: helper command:
>> /sbin/drbdadm fence-peer
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]:
>> DRBD_BACKING_DEV_0=/dev/sdb1 DRBD_CONF=/etc/drbd.conf
>> DRBD_CSTATE=Connecting DRBD_LL_DISK=/dev/sdb1 DRBD_MINOR=0
>> DRBD_MINOR_0=0 DRBD_MY_ADDRESS=10.1.3.35 DRBD_MY_AF=ipv4
>> DRBD_MY_NODE_ID=0 DRBD_NODE_ID_0=nfs5 DRBD_NODE_ID_1=nfs6
>> DRBD_PEER_ADDRESS=10.1.3.36 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=1
>> DRBD_RESOURCE=r0 DRBD_VOLUME=0 UP_TO_DATE_NODES=0x00000001
>> /usr/lib/drbd/crm-fence-peer.9.sh
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: (qb_rb_open_2)
>> #011debug: shm size:131085; real_size:135168; rb->word_size:33792
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: (qb_rb_open_2)
>> #011debug: shm size:131085; real_size:135168; rb->word_size:33792
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: (qb_rb_open_2)
>> #011debug: shm size:131085; real_size:135168; rb->word_size:33792
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]:
>> (connect_with_main_loop) #011debug: Connected to controller
>> IPC         (attached to main loop)
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: (post_connect)
>> #011debug: Sent IPC hello to controller
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]:
>> (qb_ipcc_disconnect) #011debug: qb_ipcc_disconnect()
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]:
>> (qb_rb_close_helper) #011debug: Closing ringbuffer: /dev/shm/qb-
>> 207005-570490-16-D2a84t/qb-request-crmd-header
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]:
>> (qb_rb_close_helper) #011debug: Closing ringbuffer: /dev/shm/qb-
>> 207005-570490-16-D2a84t/qb-response-crmd-header
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]:
>> (qb_rb_close_helper) #011debug: Closing ringbuffer: /dev/shm/qb-
>> 207005-570490-16-D2a84t/qb-event-crmd-header
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]:
>> (ipc_post_disconnect) #011info: Disconnected from controller IPC API
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: (pcmk_free_ipc_api)
>> #011debug: Releasing controller IPC API
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: (crm_xml_cleanup)
>> #011info: Cleaning up memory from libxml2
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: (crm_exit)
>> #011info: Exiting crm_node | with status 0
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: /
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: Could not connect
>> to the CIB: No such device or address
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: Init failed, could
>> not perform requested operations
>> Jan 17 11:48:14 nfs5 crm-fence-peer.9.sh[570455]: WARNING DATA
>> INTEGRITY at RISK: could not place the fencing constraint!
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: helper command:
>> /sbin/drbdadm fence-peer exit code 1 (0x100)
>> Jan 17 11:48:14 nfs5 kernel: drbd r0 nfs6: fence-peer helper broken,
>> returned 1
>> Jan 17 11:48:14 nfs5 drbd(drbd0)[570375]: ERROR: r0: Called drbdadm
>> -c /etc/drbd.conf primary r0
>> Jan 17 11:48:14 nfs5 drbd(drbd0)[570375]: ERROR: r0: Exit code 11
>> Jan 17 11:48:14 nfs5 drbd(drbd0)[570375]: ERROR: r0: Command output:
>> Jan 17 11:48:15 nfs5 drbd(drbd0)[570375]: ERROR: r0: Command stderr:
>> r0: State change failed: (-7) Refusing to be Primary while peer is
>> not outdated#012Command 'drbdsetup primary r0' terminated with exit
>> code 11
>> Jan 17 11:48:15 nfs5 kernel: drbd r0 nfs6: helper command:
>> /sbin/drbdadm fence-peer
>> ...
>>
>>
>>
>> On 1/16/2021 11:07 AM, Strahil Nikolov wrote:
>>> At 14:10 -0700 on 15.01.2021 (Fri), Brent Jensen wrote:
>>>> Problem: When performing "pcs node standby" on the current
>>>> master, this node demotes fine but the slave doesn't promote to
>>>> master. It keeps looping the same errors, including "Refusing to
>>>> be Primary while peer is not outdated" and "Could not connect to
>>>> the CIB." At this point the old master has already unloaded
>>>> drbd. The only way to fix it is to start drbd on the standby
>>>> node (e.g. drbdadm up r0). Logs contained herein are from the
>>>> node trying to become master.
>>>>
>>> In order to debug, stop the cluster and verify that drbd is running
>>> properly. Promote one of the nodes, then demote and promote another
>>> one...
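>>>
>>> For example, something like this on both nodes, with the cluster stopped
>>> (r0 being your resource name):
>>>
>>>     pcs cluster stop --all
>>>     drbdadm up r0          # on both nodes
>>>     drbdadm status r0      # wait for Connected / UpToDate on both sides
>>>     drbdadm primary r0     # on one node; then 'drbdadm secondary r0' there
>>>                            # and 'drbdadm primary r0' on the other node
>>>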
>>>> I have done this on DRBD 9 / CentOS 7 / Pacemaker 1 without any problems,
>>>> so I don't know where the issue is (crm-fence-peer.9.sh?).
>>>>
>>>> Another odd data point: on the slave, if I do a "pcs node standby"
>>>> and then unstandby, DRBD is loaded again; however, when I do this
>>>> on the master (which should then become the slave), DRBD doesn't
>>>> get loaded.
>>>>
>>>> Stonith/Fencing doesn't seem to make a difference. Not sure if
>>>> auto-promote is required.
>>>>
>>> Quote from official documentation (
>>> https://www.linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-pacemaker-crm-drbd-backed-service
>>> ):
>>> If you are employing the DRBD OCF resource agent, it is recommended
>>> that you defer DRBD startup, shutdown, promotion, and
>>> demotion exclusively to the OCF resource agent. That means that you
>>> should disable the DRBD init script:
>>> So remove auto-promote and disable the drbd service entirely.
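>>>
>>> Roughly (option and service names per drbd-utils; adjust to your config
>>> layout):
>>>
>>>     systemctl disable --now drbd
>>>     # and in the resource (or common) section make sure auto-promote is off:
>>>     #   options { auto-promote no; }
>>>     drbdadm adjust r0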
>>>
>>> Best Regards, Strahil Nikolov
>>>
>>>
