[ClusterLabs] DRBD 2-node M/S doesn't want to promote new master, Centos 8

Sun Jan 17 13:37:49 EST 2021

I have already verified that DRBD works. As always, I use the OCF
resource agent to manage DRBD; the DRBD service itself is never started
at boot. I added auto-promote to see if that made a difference (it
didn't). So I'm still at a loss as to why it won't promote. It behaves
as if DRBD gets unloaded on the master node before it demotes, and the
demotion is what would allow the secondary to promote.

For example (I'm doing this manually w/o the cluster running).

On master node (as primary), I stop DRBD (normally you'd demote it first)

On secondary node (as secondary):

drbdadm primary all

crm-fence-peer.9.sh[564707]: (ipc_post_disconnect) #011info: Disconnected from controller IPC API
crm-fence-peer.9.sh[564707]: (pcmk_free_ipc_api) #011debug: Releasing controller IPC API
crm-fence-peer.9.sh[564707]: (crm_xml_cleanup) #011info: Cleaning up memory from libxml2
crm-fence-peer.9.sh[564707]: (crm_exit) #011info: Exiting crm_node | with status 0
crm-fence-peer.9.sh[564707]: /
crm-fence-peer.9.sh[564707]: Could not connect to the CIB: No such device or address
crm-fence-peer.9.sh[564707]: Init failed, could not perform requested operations
crm-fence-peer.9.sh[564707]: WARNING DATA INTEGRITY at RISK: could not place the fencing constraint!
kernel: drbd r0 nfs6: helper command: /sbin/drbdadm fence-peer exit code 1 (0x100)
kernel: drbd r0 nfs6: fence-peer helper broken, returned 1

All of this is EXPECTED BEHAVIOR. With DRBD unloaded PRIOR to demoting 
on the primary node, the other node CANNOT promote to primary. This is 
the same thing I'm experiencing when running in the cluster. It looks 
like DRBD is being unloaded PRIOR to the master being demoted. WHY??? 
I'm pretty sure I'm using the basic configurations. It seems as though 
there's a bug somewhere.
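For comparison, here is a minimal sketch of the graceful order, where the primary demotes before DRBD is taken down. It assumes the resource is named r0 (as in the kernel log above) and that the cluster is stopped on both nodes. The run helper and the DRY_RUN and RES variables are hypothetical names for this sketch; by default the commands are only printed, and DRY_RUN=0 would actually execute them.

```shell
#!/bin/sh
# Sketch only: graceful manual failover for one DRBD resource.
# Assumption: Pacemaker is stopped, so nothing else is managing DRBD.
# DRY_RUN, RES and run() are hypothetical helpers, not part of DRBD.
DRY_RUN="${DRY_RUN:-1}"
RES="${RES:-r0}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the current primary: demote first, then take the resource down.
release_primary() {
    run drbdadm secondary "$RES"
    run drbdadm down "$RES"
}

# On the peer, afterwards: promote and show the resulting state.
take_over() {
    run drbdadm primary "$RES"
    run drbdadm status "$RES"
}
```

Calling release_primary on the old primary and then take_over on the peer gives the order I would expect the cluster to follow; what I see instead looks like the "down" step happening without the "secondary" step before it.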

Note my versions:

  * pacemaker-2.0.4-6.el8_3.1.x86_64
  * drbd90-utils-9.13.1-1.el8.elrepo.x86_64

Thanks,

Brent

On 1/16/2021 11:07 AM, Strahil Nikolov wrote:
> At 14:10 -0700 on 15.01.2021 (Fri), Brent Jensen wrote:
>>
>> Problem: When performing "pcs node standby" on the current master,
>> this node demotes fine but the slave doesn't promote to master. It
>> keeps looping the same error including "Refusing to be Primary while
>> peer is not outdated" and "Could not connect to the CIB." At this
>> point the old master has already unloaded drbd. The only way to fix
>> it is to start drbd on the standby node (e.g. drbdadm up r0). Logs
>> contained herein are from the node trying to be master.
>>
> In order to debug, stop the cluster and verify that drbd is running 
> properly. Promote one of the nodes, then demote and promote another one...
>> I have done this on DRBD9/Centos7/Pacemaker1 w/o any problems. So I
>> don't know where the issue is (crm-fence-peer.9.sh)
>>
>> Another odd data point: On the slave if I do a "pcs node standby" & 
>> then unstandby, DRBD is loaded again; HOWEVER, when I do this on the 
>> master (which should then be slave), DRBD doesn't get loaded.
>>
>> Stonith/Fencing doesn't seem to make a difference. Not sure if 
>> auto-promote is required.
>>
> Quote from the official documentation
> (https://www.linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-pacemaker-crm-drbd-backed-service):
> If you are employing the DRBD OCF resource agent, it is recommended 
> that you defer DRBD startup, shutdown, promotion, and demotion 
> /exclusively/ to the OCF resource agent. That means that you should 
> disable the DRBD init script:
> So remove auto-promote and disable the drbd service entirely.
>
> Best Regards, Strahil Nikolov
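The advice quoted above, leaving DRBD startup and shutdown to the OCF resource agent, could be sketched roughly as follows. This assumes a systemd host where the unit is called drbd.service; that name is an assumption and is not taken from the quoted guide. The run helper and DRY_RUN variable are hypothetical and only print the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: keep systemd from starting DRBD so that only the
# OCF resource agent brings it up or down.
# Assumption: the unit is named drbd.service on this host.
# DRY_RUN and run() are hypothetical helpers for this sketch.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

disable_drbd_unit() {
    # Disable at boot only; this does not stop a running instance.
    run systemctl disable drbd.service
    # Report the unit's enablement state afterwards.
    run systemctl is-enabled drbd.service
}
```

The sketch deliberately avoids stopping the unit, since stopping it while the cluster has DRBD running would take the resource down underneath Pacemaker.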