[ClusterLabs] gfs2 crashes when I, e.g., dd to an LVM volume

J. Echter j.echter at echter-kuechen-elektro.de
Thu Oct 8 18:42:50 UTC 2015


On 08.10.2015 at 16:34, Bob Peterson wrote:
> ----- Original Message -----
>>
>> On 08.10.2015 at 16:15, Digimer wrote:
>>> On 08/10/15 07:50 AM, J. Echter wrote:
>>>> Hi,
>>>>
>>>> I have a strange issue on CentOS 6.5.
>>>>
>>>> If I install a new VM on node1, it works well.
>>>>
>>>> If I install a new VM on node2, it gets stuck.
>>>>
>>>> The same happens if I do a dd if=/dev/zero of=/dev/DATEN/vm-test (on node2).
>>>>
>>>> On node1 it works:
>>>>
>>>> dd if=/dev/zero of=vm-test
>>>> writing to 'vm-test': No space left on device
>>>> 83886081+0 records in
>>>> 83886080+0 records out
>>>> 42949672960 bytes (43 GB) copied, 2338.15 s, 18.4 MB/s
>>>>
>>>>
>>>> dmesg shows the following (while dd'ing on node2):
>>>>
>>>> INFO: task flush-253:18:9820 blocked for more than 120 seconds.
>>>>        Not tainted 2.6.32-573.7.1.el6.x86_64 #1
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> <snip>
>>>> any hint on fixing that?
>>> Every time I've seen this, it was because dlm was blocked. The most
>>> common cause of DLM blocking is a failed fence call. Do you have fencing
>>> configured *and* tested?
>>>
>>> If I were to guess, given the rather limited information you shared
>>> about your setup, the live migration consumed the network bandwidth,
>>> choking out corosync traffic, which caused the peer to be declared lost,
>>> called a fence which failed and left locking hung (which is by design;
>>> better to hang than risk corruption).
>>>
>> Hi,
>>
>> fencing is configured and works.
>>
>> I re-checked it by typing
>>
>> echo c > /proc/sysrq-trigger
>>
>> into the node2 console.
>>
>> The machine is fenced and comes back up. But the problem persists.
> Hi,
>
> Can you send any more information about the crash? What makes you think
> it's gfs2 and not some other kernel component? Do you get any messages
> on the console? If not, perhaps you can temporarily disable or delay fencing
> long enough to get console messages.
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
>
Hi,

I just realized that gfs2 is probably the wrong candidate.

I use clustered LVM (DRBD), and I experience this on an LVM volume that is
not formatted with any file system.

What logs would you need to identify the cause?



