[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

Ferenc Wágner wferi at niif.hu
Tue Aug 29 10:45:40 EDT 2017


Digimer <lists at alteeve.ca> writes:

> On 2017-08-28 12:07 PM, Ferenc Wágner wrote:
>
>> [...]
>> While dlm_tool status reports (similar on all nodes):
>> 
>> cluster nodeid 167773705 quorate 1 ring seq 3088 3088
>> daemon now 2941405 fence_pid 0 
>> node 167773705 M add 196 rem 0 fail 0 fence 0 at 0 0
>> node 167773706 M add 5960 rem 5730 fail 0 fence 0 at 0 0
>> node 167773707 M add 2089 rem 1802 fail 0 fence 0 at 0 0
>> node 167773708 M add 3646 rem 3413 fail 0 fence 0 at 0 0
>> node 167773709 M add 2588921 rem 2588920 fail 0 fence 0 at 0 0
>> node 167773710 M add 196 rem 0 fail 0 fence 0 at 0 0
>> 
>> dlm_tool ls shows "kern_stop":
>> 
>> dlm lockspaces
>> name          clvmd
>> id            0x4104eefa
>> flags         0x00000004 kern_stop
>> change        member 5 joined 0 remove 1 failed 1 seq 8,8
>> members       167773705 167773706 167773707 167773708 167773710 
>> new change    member 6 joined 1 remove 0 failed 0 seq 9,9
>> new status    wait messages 1
>> new members   167773705 167773706 167773707 167773708 167773709 167773710 
>> 
>> on all nodes except for vhbl07 (167773709), where it gives
>> 
>> dlm lockspaces
>> name          clvmd
>> id            0x4104eefa
>> flags         0x00000000 
>> change        member 6 joined 1 remove 0 failed 0 seq 11,11
>> members       167773705 167773706 167773707 167773708 167773709 167773710 
>> 
>> instead.
>> 
>> [...] Is there a way to unblock DLM without rebooting all nodes?
>
> Looks like the lost node wasn't fenced.

Why doesn't dlm_tool status report any lost node, then?  Or am I
misinterpreting its output?
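One quick way to see which node the blocked change concerns is to diff the
"members" and "new members" lists from the dlm_tool ls output quoted above.
This is only a hypothetical helper sketch in Python, not part of dlm_tool.
It assumes the two lines are pasted in by hand exactly as printed on one of
the stuck nodes:

```python
# Hypothetical helper, not part of dlm_tool: the two lines below are pasted
# by hand from the clvmd lockspace output on one of the stuck nodes.
# "members" is the last completed change; "new members" is the pending one.
MEMBERS_LINE = "members       167773705 167773706 167773707 167773708 167773710"
NEW_MEMBERS_LINE = "new members   167773705 167773706 167773707 167773708 167773709 167773710"


def parse_ids(line: str, label: str) -> set[int]:
    """Strip the field label and return the node IDs on the line as ints."""
    if not line.startswith(label):
        raise ValueError(f"expected line starting with {label!r}: {line!r}")
    return {int(tok) for tok in line[len(label):].split()}


current = parse_ids(MEMBERS_LINE, "members")
pending = parse_ids(NEW_MEMBERS_LINE, "new members")

joining = pending - current  # nodes the pending change would add
leaving = current - pending  # nodes the pending change would drop

print("joining:", sorted(joining))
print("leaving:", sorted(leaving))
```

Running it prints "joining: [167773709]" and "leaving: []": the pending
change only re-adds vhbl07, the node whose dlm_tool status entry shows
"add 2588921 rem 2588920", while the last completed change recorded
"remove 1 failed 1".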

> Do you have fencing configured and tested? If not, DLM will block
> forever because it won't recover until it has been told that the lost
> peer has been fenced, by design.

What command would you recommend for unblocking DLM in this case?
-- 
Thanks,
Feri