[ClusterLabs] [Linux-cluster] DLM won't (stay) running

Andrew Price anprice at redhat.com
Wed May 9 06:26:13 EDT 2018


[linux-cluster@ isn't really used nowadays; CCing users at clusterlabs]

On 08/05/18 12:18, Jason Gauthier wrote:
> Greetings,
> 
>     I'm working on a setup of a two-node cluster with shared storage.
> I've been able to see the storage on both nodes, and appropriate
> configuration for fencing the block device.
> 
> The next step was getting DLM and GFS2 in a clone group to mount the
> FS on both nodes.  This is where I am running into trouble.
> 
> As far as the OS goes, it's Debian.  I'm using pacemaker, corosync,
> and crm for cluster management.

Is it safe to assume that you're using Debian Wheezy? (The need for
gfs_controld disappeared in the 3.3 kernel.) As Wheezy goes end-of-life
at the end of the month, I would suggest upgrading; you will likely find
the cluster tools more user-friendly and the components more stable.
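
To illustrate the point about gfs_controld: a minimal sketch of how the
clone might look on a release with a 3.3 or later kernel, where only
dlm_controld is needed. p_dlm_controld is reused from the config quoted
below; the clone name cl_dlm is made up for illustration, and this is
untested against your cluster.

```
# Sketch only: assumes a kernel >= 3.3, so no gfs_controld primitive.
# cl_dlm is a hypothetical name, not taken from the original config.
primitive p_dlm_controld ocf:pacemaker:controld \
         op monitor interval=60 timeout=60
clone cl_dlm p_dlm_controld \
         meta interleave=true
```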

Andy

> At the moment, I've removed the gfs2 parts just to try and get dlm working.
> 
> My current config looks like this:
> 
> node 1084772368: alpha
> node 1084772369: beta
> primitive p_dlm_controld ocf:pacemaker:controld \
>          op monitor interval=60 timeout=60 \
>          meta target-role=Started args=-K
> primitive p_gfs_controld ocf:pacemaker:controld \
>          params daemon=gfs_controld \
>          meta target-role=Started
> primitive stonith_sbd stonith:external/sbd \
>          params pcmk_delay_max=30 sbd_device="/dev/sdb1"
> group g_gfs2 p_dlm_controld p_gfs_controld
> clone cl_gfs2 g_gfs2 \
>          meta interleave=true target-role=Started
> property cib-bootstrap-options: \
>          have-watchdog=false \
>          dc-version=1.1.16-94ff4df \
>          cluster-infrastructure=corosync \
>          cluster-name=zeta \
>          last-lrm-refresh=1525523370 \
>          stonith-enabled=true \
>          stonith-timeout=20s
> 
> When I bring the resources up, I get a quick blip in my logs.
> May  8 07:13:58 beta dlm_controld[9425]: 253556 dlm_controld 4.0.7 started
> May  8 07:14:00 beta kernel: [253558.641658] dlm: closing connection to node 1084772369
> May  8 07:14:00 beta kernel: [253558.641764] dlm: closing connection to node 1084772368
> 
> 
> This is the same messaging I see when I run dlm manually and then stop
> it.  My challenge here is that I cannot find out what dlm is doing.
> I've tried adding -K to /etc/default/dlm, but I don't think that file
> is being respected. I would like to figure out how to increase the
> verbose output of dlm_controld so I can see why it won't stay running
> when it's launched through the cluster.   I haven't been able to
> figure out how to pass arguments directly to the daemon in the
> primitive config, if it's even possible.  Otherwise, I would try to
> pass -K there.
> 
> Thanks!
> 
> Jason
> 
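
On passing arguments to the daemon: in the config quoted above, args=-K
sits under "meta", so it is stored as a meta attribute rather than
handed to the resource agent. As far as I know, ocf:pacemaker:controld
accepts an "args" parameter (alongside the "daemon" parameter already
used for gfs_controld), so something like the sketch below may be what
is wanted. It is untested here. Running "crm ra info
ocf:pacemaker:controld" should list the parameters and defaults your
installed agent actually supports. Setting args replaces whatever
default arguments the agent would otherwise pass.

```
# Sketch only: assumes the installed controld agent supports "args".
# -K is the flag mentioned in the original message.
primitive p_dlm_controld ocf:pacemaker:controld \
         params args="-K" \
         op monitor interval=60 timeout=60 \
         meta target-role=Started
```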

