[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Ferenc Wágner
wferi at niif.hu
Tue Aug 29 10:45:40 EDT 2017
Digimer <lists at alteeve.ca> writes:
> On 2017-08-28 12:07 PM, Ferenc Wágner wrote:
>
>> [...]
>> While dlm_tool status reports (similar on all nodes):
>>
>> cluster nodeid 167773705 quorate 1 ring seq 3088 3088
>> daemon now 2941405 fence_pid 0
>> node 167773705 M add 196 rem 0 fail 0 fence 0 at 0 0
>> node 167773706 M add 5960 rem 5730 fail 0 fence 0 at 0 0
>> node 167773707 M add 2089 rem 1802 fail 0 fence 0 at 0 0
>> node 167773708 M add 3646 rem 3413 fail 0 fence 0 at 0 0
>> node 167773709 M add 2588921 rem 2588920 fail 0 fence 0 at 0 0
>> node 167773710 M add 196 rem 0 fail 0 fence 0 at 0 0
>>
>> dlm_tool ls shows "kern_stop":
>>
>> dlm lockspaces
>> name clvmd
>> id 0x4104eefa
>> flags 0x00000004 kern_stop
>> change member 5 joined 0 remove 1 failed 1 seq 8,8
>> members 167773705 167773706 167773707 167773708 167773710
>> new change member 6 joined 1 remove 0 failed 0 seq 9,9
>> new status wait messages 1
>> new members 167773705 167773706 167773707 167773708 167773709 167773710
>>
>> on all nodes except for vhbl07 (167773709), where it gives
>>
>> dlm lockspaces
>> name clvmd
>> id 0x4104eefa
>> flags 0x00000000
>> change member 6 joined 1 remove 0 failed 0 seq 11,11
>> members 167773705 167773706 167773707 167773708 167773709 167773710
>>
>> instead.
>>
>> [...] Is there a way to unblock DLM without rebooting all nodes?
>
> Looks like the lost node wasn't fenced.

Why does dlm_tool status not report any lost node then? Or am I
misinterpreting its output?
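
As a rough cross-check of that reading, the node lines of the status
output quoted above can be parsed mechanically. The sketch below is only
that: it assumes the field layout is exactly as quoted, the regular
expression and the helper names are illustrative and not part of
dlm_tool, and it makes no claim about what the add/rem values mean
beyond being zero or non-zero.

```python
import re

# Text copied from the quoted `dlm_tool status` output.
STATUS = """\
cluster nodeid 167773705 quorate 1 ring seq 3088 3088
daemon now 2941405 fence_pid 0
node 167773705 M add 196 rem 0 fail 0 fence 0 at 0 0
node 167773706 M add 5960 rem 5730 fail 0 fence 0 at 0 0
node 167773707 M add 2089 rem 1802 fail 0 fence 0 at 0 0
node 167773708 M add 3646 rem 3413 fail 0 fence 0 at 0 0
node 167773709 M add 2588921 rem 2588920 fail 0 fence 0 at 0 0
node 167773710 M add 196 rem 0 fail 0 fence 0 at 0 0
"""

# Assumed layout of a node line, taken only from the output above.
NODE_RE = re.compile(
    r"^node\s+(?P<nodeid>\d+)\s+(?P<state>\S+)"
    r"\s+add\s+(?P<add>\d+)\s+rem\s+(?P<rem>\d+)"
    r"\s+fail\s+(?P<fail>\d+)\s+fence\s+(?P<fence>\d+)"
    r"\s+at\s+(?P<at1>\d+)\s+(?P<at2>\d+)$"
)


def parse_nodes(text):
    """Return one dict per node line; numeric fields become ints."""
    nodes = []
    for line in text.splitlines():
        match = NODE_RE.match(line.strip())
        if match:
            fields = match.groupdict()
            nodes.append(
                {k: (v if k == "state" else int(v)) for k, v in fields.items()}
            )
    return nodes


def nodes_with_failure(nodes):
    """Node IDs whose fail or fence field is non-zero."""
    return [n["nodeid"] for n in nodes if n["fail"] or n["fence"]]


def nodes_removed(nodes):
    """Node IDs whose rem field is non-zero."""
    return [n["nodeid"] for n in nodes if n["rem"]]


nodes = parse_nodes(STATUS)
print("fail/fence non-zero:", nodes_with_failure(nodes))
print("rem non-zero:", nodes_removed(nodes))
```

On the output quoted above this finds no node with a non-zero fail or
fence field, while four nodes, including 167773709, have a non-zero rem
field.
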
> Do you have fencing configured and tested? If not, DLM will block
> forever because it won't recover until it has been told that the lost
> peer has been fenced, by design.

What command would you recommend for unblocking DLM in this case?
--
Thanks,
Feri