[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

Digimer lists at alteeve.ca
Tue Aug 29 20:11:58 UTC 2017


On 2017-08-29 10:45 AM, Ferenc Wágner wrote:
> Digimer <lists at alteeve.ca> writes:
> 
>> On 2017-08-28 12:07 PM, Ferenc Wágner wrote:
>>
>>> [...]
>>> While dlm_tool status reports (similar on all nodes):
>>>
>>> cluster nodeid 167773705 quorate 1 ring seq 3088 3088
>>> daemon now 2941405 fence_pid 0 
>>> node 167773705 M add 196 rem 0 fail 0 fence 0 at 0 0
>>> node 167773706 M add 5960 rem 5730 fail 0 fence 0 at 0 0
>>> node 167773707 M add 2089 rem 1802 fail 0 fence 0 at 0 0
>>> node 167773708 M add 3646 rem 3413 fail 0 fence 0 at 0 0
>>> node 167773709 M add 2588921 rem 2588920 fail 0 fence 0 at 0 0
>>> node 167773710 M add 196 rem 0 fail 0 fence 0 at 0 0
>>>
>>> dlm_tool ls shows "kern_stop":
>>>
>>> dlm lockspaces
>>> name          clvmd
>>> id            0x4104eefa
>>> flags         0x00000004 kern_stop
>>> change        member 5 joined 0 remove 1 failed 1 seq 8,8
>>> members       167773705 167773706 167773707 167773708 167773710 
>>> new change    member 6 joined 1 remove 0 failed 0 seq 9,9
>>> new status    wait messages 1
>>> new members   167773705 167773706 167773707 167773708 167773709 167773710 
>>>
>>> on all nodes except for vhbl07 (167773709), where it gives
>>>
>>> dlm lockspaces
>>> name          clvmd
>>> id            0x4104eefa
>>> flags         0x00000000 
>>> change        member 6 joined 1 remove 0 failed 0 seq 11,11
>>> members       167773705 167773706 167773707 167773708 167773709 167773710 
>>>
>>> instead.
>>>
>>> [...] Is there a way to unblock DLM without rebooting all nodes?
>>
>> Looks like the lost node wasn't fenced.
> 
> Why does dlm status not report any lost node, then?  Or am I
> misinterpreting its output?
> 
>> Do you have fencing configured and tested? If not, DLM will block
>> forever: by design, it won't recover until it has been told that the
>> lost peer has been fenced.
> 
> What command would you recommend for unblocking DLM in this case?

First, fix fencing. Do you have that set up and working?
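For anyone hitting the same symptom, here is a minimal sketch that reads `dlm_tool ls` output like the one quoted above. It lists lockspaces flagged kern_stop and the node IDs that differ between "members" and "new members". The helper names (parse_dlm_ls, stuck_lockspaces) are hypothetical and not part of dlm_tool or any ClusterLabs package. The sketch assumes the "key, two or more spaces, value" layout shown in this thread, and other dlm versions may format the output differently. It only reads text and does not unblock anything. Recovery still depends on the lost node being fenced.

```python
import re
from typing import Dict, List

# Hypothetical helper: assumes the `dlm_tool ls` layout quoted in this thread,
# i.e. "key<2+ spaces>value" lines, with each lockspace starting at "name".
KEY_VALUE = re.compile(r"\s{2,}")


def parse_dlm_ls(text: str) -> List[Dict[str, str]]:
    """Split `dlm_tool ls` text into one key/value dict per lockspace."""
    lockspaces: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "dlm lockspaces":
            continue  # skip blank lines and the banner line
        # Keys like "new members" contain one space, so split on 2+ spaces.
        parts = KEY_VALUE.split(line, maxsplit=1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key == "name":
            lockspaces.append({})  # a "name" line opens a new lockspace
        if lockspaces:
            lockspaces[-1][key] = value
    return lockspaces


def stuck_lockspaces(text: str) -> List[Dict[str, object]]:
    """Report kern_stop lockspaces and how their pending member list differs."""
    report: List[Dict[str, object]] = []
    for ls in parse_dlm_ls(text):
        if "kern_stop" not in ls.get("flags", "").split():
            continue  # lockspace is not stopped; nothing to report
        old = set(ls.get("members", "").split())
        new = set(ls.get("new members", "").split())
        report.append({
            "name": ls.get("name", ""),
            "joining": sorted(new - old),  # listed only in "new members"
            "leaving": sorted(old - new),  # listed only in "members"
            "status": ls.get("new status", ""),
        })
    return report


# Output copied from the stuck nodes and from vhbl07 in this thread.
SAMPLE_STUCK = """dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000004 kern_stop
change        member 5 joined 0 remove 1 failed 1 seq 8,8
members       167773705 167773706 167773707 167773708 167773710
new change    member 6 joined 1 remove 0 failed 0 seq 9,9
new status    wait messages 1
new members   167773705 167773706 167773707 167773708 167773709 167773710
"""

SAMPLE_OK = """dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 6 joined 1 remove 0 failed 0 seq 11,11
members       167773705 167773706 167773707 167773708 167773709 167773710
"""

if __name__ == "__main__":
    print(stuck_lockspaces(SAMPLE_STUCK))
    print(stuck_lockspaces(SAMPLE_OK))
```

When given the output from the stuck nodes, it reports clvmd as waiting ("wait messages 1") on a change that adds 167773709 (vhbl07). When given the vhbl07 output, it reports nothing, because that node's flags are 0x00000000.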

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
