[ClusterLabs] DLM fencing

Jason Gauthier jagauthier at gmail.com
Thu May 24 06:47:28 EDT 2018


On Thu, May 24, 2018 at 12:19 AM, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> 24.05.2018 02:57, Jason Gauthier пишет:
>> I'm fairly new to clustering under Linux.  I basically have one shared
>> storage resource right now, using dlm and gfs2.
>> I'm using fibre channel and when both of my nodes are up (2 node cluster)
>> dlm and gfs2 seem to be operating perfectly.
>> If I reboot node B, node A works fine and vice-versa.
>>
>> When node B goes offline unexpectedly and becomes unclean, dlm seems to
>> block all IO to the shared storage.
>>
>> dlm knows node B is down:
>>
>> # dlm_tool status
>> cluster nodeid 1084772368 quorate 1 ring seq 32644 32644
>> daemon now 865695 fence_pid 18186
>> fence 1084772369 nodedown pid 18186 actor 1084772368 fail 1527119246 fence
>> 0 now 1527119524
>> node 1084772368 M add 861439 rem 0 fail 0 fence 0 at 0 0
>> node 1084772369 X add 865239 rem 865416 fail 865416 fence 0 at 0 0
>>
>> on the same server, I see these messages in my daemon.log
>> May 23 19:52:47 alpha stonith-api[18186]: stonith_api_kick: Could not kick
>> (reboot) node 1084772369/(null) : No route to host (-113)
>> May 23 19:52:47 alpha dlm_stonith[18186]: kick_helper error -113 nodeid
>> 1084772369
>>
>> I can recover from the situation by acknowledging the fence manually
>> (or by bringing the other node back online):
>> dlm_tool fence_ack 1084772369
>>
>> cluster config is pretty straightforward.
>> node 1084772368: alpha
>> node 1084772369: beta
>> primitive p_dlm_controld ocf:pacemaker:controld \
>>         op monitor interval=60 timeout=60 \
>>         meta target-role=Started \
>>         params args="-K -L -s 1"
>> primitive p_fs_gfs2 Filesystem \
>>         params device="/dev/sdb2" directory="/vms" fstype=gfs2
>> primitive stonith_sbd stonith:external/sbd \
>>         params pcmk_delay_max=30 sbd_device="/dev/sdb1" \
>>         meta target-role=Started
>
> What is the status of stonith resource? Did you configure SBD fencing
> properly?

I believe so.  It's shown above in my cluster config.

> Is sbd daemon up and running with proper parameters?

Well, no, apparently sbd isn't running.  With dlm and gfs2, the
cluster handles launching the daemons, so I assumed the same here,
since the resource shows as started.

Online: [ alpha beta ]

Full list of resources:

 stonith_sbd    (stonith:external/sbd): Started alpha
 Clone Set: cl_gfs2 [g_gfs2]
     Started: [ alpha beta ]
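A rough way to check, assuming a systemd-based distro where sbd ships
as its own unit (unit and config file names vary per distribution):

```shell
# Is the sbd daemon actually running?
ps -C sbd -o pid,args

# On most distros sbd has its own systemd unit that corosync/pacemaker
# pull in at start-up; the stonith resource alone does not launch it:
systemctl status sbd.service

# Device/watchdog settings usually live in /etc/sysconfig/sbd
# (or /etc/default/sbd on Debian-based systems):
grep -v '^#' /etc/sysconfig/sbd
```

If the unit is disabled, the stonith:external/sbd resource can still
"start" in pacemaker, because the agent only writes messages to the
shared device; it does not supervise the daemon.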


> What is output of
> sbd -d /dev/sdb1 dump
> sbd -d /dev/sdb1 list

Both nodes seem fine:

0       alpha   test    beta
1       beta    test    alpha


> on both nodes? Does
>
> sbd -d /dev/sdb1 message <other-node> test
>
> work in both directions?

It doesn't return an error, but without a daemon running on the other
node I don't think the message is actually received; the "test" entries
lingering in the sbd list output above suggest nothing is consuming them.
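For reference, "sbd message" only writes into the target node's slot on
the shared device; a running sbd daemon on the target has to poll and
clear that slot. A minimal round-trip check, assuming the device and
node names from this thread:

```shell
# On alpha: write a test message into beta's slot on the shared device
sbd -d /dev/sdb1 message beta test

# On beta: watch for a running sbd daemon acknowledging the message
journalctl -u sbd -f

# Without a daemon on beta, the message just stays visible in the slot:
sbd -d /dev/sdb1 list
```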


> Does manual fencing using stonith_admin work?

I'm not sure at the moment.  I think I need to look into why the
daemon isn't running.
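Once the daemon is sorted out, manual fencing through pacemaker can be
exercised with stonith_admin (note: this really resets the target node):

```shell
# Which fence devices are registered, and which can fence beta?
stonith_admin --list-registered
stonith_admin --list beta

# Manually fence beta (reboot); WARNING: this actually resets the node
stonith_admin --reboot beta

# Check the fencing history afterwards
stonith_admin --history beta
```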

>> group g_gfs2 p_dlm_controld p_fs_gfs2
>> clone cl_gfs2 g_gfs2 \
>>         meta interleave=true target-role=Started
>> location cli-prefer-cl_gfs2 cl_gfs2 role=Started inf: alpha
>> property cib-bootstrap-options: \
>>         have-watchdog=false \
>>         dc-version=1.1.16-94ff4df \
>>         cluster-infrastructure=corosync \
>>         cluster-name=zeta \
>>         last-lrm-refresh=1525523370 \
>>         stonith-enabled=true \
>>         stonith-timeout=20s
>>
>> Any pointers would be appreciated. I feel like this should be working but
>> I'm not sure if I've missed something.
>>
>> Thanks,
>>
>> Jason
>>
>>
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>


