[ClusterLabs] gfs2 crashes when i, e.g., dd to a lvm volume

Thu Oct 8 14:27:12 UTC 2015

On 08.10.2015 at 16:15, Digimer wrote:
> On 08/10/15 07:50 AM, J. Echter wrote:
>> Hi,
>>
>> I have a strange issue on CentOS 6.5.
>>
>> If I install a new VM on node1, it works well.
>>
>> If I install a new VM on node2, it gets stuck.
>>
>> The same happens if I do a dd if=/dev/zero of=/dev/DATEN/vm-test (on node2).
>>
>> On node1 it works:
>>
>> dd if=/dev/zero of=vm-test
>> Writing to “vm-test”: No space left on device
>> 83886081+0 records in
>> 83886080+0 records out
>> 42949672960 bytes (43 GB) copied, 2338.15 s, 18.4 MB/s
>>
>>
>> dmesg shows the following (while dd'ing on node2):
>>
>> INFO: task flush-253:18:9820 blocked for more than 120 seconds.
>>        Not tainted 2.6.32-573.7.1.el6.x86_64 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> <snip>
>> Any hints on fixing that?
> Every time I've seen this, it was because dlm was blocked. The most
> common cause of DLM blocking is a failed fence call. Do you have fencing
> configured *and* tested?
>
> If I were to guess, given the rather limited information you shared
> about your setup, the live migration consumed the network bandwidth,
> choking out corosync traffic, which caused the peer to be declared lost
> and triggered a fence call that failed and left locking hung (which is
> by design; better to hang than risk corruption).
>
Hi,

Fencing is configured and works.

I re-checked it by typing

echo c > /proc/sysrq-trigger

on the node2 console.

The machine is fenced and comes back up. But the problem persists.
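
To see which tasks are hanging while the dd runs, the hung-task lines can be pulled out of the kernel log. This is only a minimal sketch: hung_tasks is a hypothetical helper name, not part of any cluster tool, and it assumes the message format shown in the dmesg excerpt above.

```shell
# Hypothetical helper (name is an assumption): summarize hung-task reports.
# Reads kernel log text on stdin and prints "<task> pid=<pid> secs=<timeout>"
# for every line of the form
#   INFO: task <task>:<pid> blocked for more than <N> seconds.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 pid=\2 secs=\3/p'
}

# Usage on the affected node (not run here):
#   dmesg | hung_tasks
```

If the listed tasks are flush or gfs2 threads, that would be consistent with blocked locking. On a cman-based CentOS 6 cluster, dlm_tool ls and fence_tool ls should then show whether a lockspace or the fence domain is still waiting on something.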