[ClusterLabs] drbd clone not becoming master
Dennis Jacobfeuerborn
dennisml at conversis.de
Fri Nov 3 23:28:59 EDT 2017
On 03.11.2017 15:49, Ken Gaillot wrote:
> On Thu, 2017-11-02 at 23:18 +0100, Dennis Jacobfeuerborn wrote:
>> On 02.11.2017 23:08, Dennis Jacobfeuerborn wrote:
>>> Hi,
>>> I'm setting up a redundant NFS server for some experiments but almost
>>> immediately ran into a strange issue. The drbd clone resource never
>>> promotes either of the two clones to the Master state.
>>>
>>> The state says this:
>>>
>>> Master/Slave Set: drbd-clone [drbd]
>>> Slaves: [ nfsserver1 nfsserver2 ]
>>> metadata-fs (ocf::heartbeat:Filesystem): Stopped
>>>
>>> The resource configuration looks like this:
>>>
>>> Resources:
>>> Master: drbd-clone
>>> Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
>>> Resource: drbd (class=ocf provider=linbit type=drbd)
>>> Attributes: drbd_resource=r0
>>> Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
>>> monitor interval=60s (drbd-monitor-interval-60s)
>>> promote interval=0s timeout=90 (drbd-promote-interval-0s)
>>> start interval=0s timeout=240 (drbd-start-interval-0s)
>>> stop interval=0s timeout=100 (drbd-stop-interval-0s)
>>> Resource: metadata-fs (class=ocf provider=heartbeat type=Filesystem)
>>> Attributes: device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared fstype=ext4 options=noatime
>>> Operations: monitor interval=20 timeout=40 (metadata-fs-monitor-interval-20)
>>> start interval=0s timeout=60 (metadata-fs-start-interval-0s)
>>> stop interval=0s timeout=60 (metadata-fs-stop-interval-0s)
>>>
>>> Location Constraints:
>>> Ordering Constraints:
>>> promote drbd-clone then start metadata-fs (kind:Mandatory)
>>> Colocation Constraints:
>>> metadata-fs with drbd-clone (score:INFINITY) (with-rsc-role:Master)
>>>
>>> Shouldn't one of the clones be promoted to the Master state
>>> automatically?
>>
>> I think the source of the issue is this:
>>
>> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Called /usr/sbin/crm_master -Q -l reboot -v 10000
>> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Exit code 107
>> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Command output:
>> Nov 2 23:12:03 nfsserver1 lrmd[2163]: notice: drbd_monitor_60000:4673:stderr [ Error signing on to the CIB service: Transport endpoint is not connected ]
>>
>> It seems the drbd resource agent tries to use crm_master to promote the
>> clone but fails because it cannot "sign on to the CIB service". Does
>> anybody know what that means?
>>
>> Regards,
>> Dennis
>>
>
> That's odd, it should only happen if the cluster is not running, but
> then the agent wouldn't have been called.
>
> The CIB is one of the core daemons of pacemaker; it manages the cluster
> configuration and status. If it's not running, the cluster can't do
> anything.
>
> Perhaps the CIB is crashing, or something is blocking the communication
> between the agent and the CIB.

SELinux was the culprit. After disabling it, the problem went away.
Regards,
Dennis
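
Disabling SELinux outright is a blunt fix. Below is a minimal sketch, assuming a node with the standard SELinux and audit tools (getenforce, ausearch) installed, of how one might check whether SELinux denials coincide with the crm_master failure before turning it off. The function name selinux_report is hypothetical and does not come from this thread, and the thread does not say which policy rule was involved.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): report the SELinux mode and
# list recent AVC denials so they can be compared with the crm_master errors.
selinux_report() {
    if command -v getenforce >/dev/null 2>&1; then
        echo "SELinux mode: $(getenforce)"
    else
        echo "SELinux mode: unknown (getenforce not found)"
    fi

    if command -v ausearch >/dev/null 2>&1; then
        # "-ts recent" limits the search to roughly the last 10 minutes.
        # Reading the audit log normally requires root.
        ausearch -m avc -ts recent 2>/dev/null \
            || echo "no recent AVC denials found (or audit log not readable)"
    else
        echo "ausearch not found; inspect /var/log/audit/audit.log manually"
    fi
}

selinux_report
```

If denials do show up around the time of the monitor failure, switching to permissive mode temporarily with `setenforce 0` and watching whether a clone gets promoted would be a less drastic test than disabling SELinux permanently.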