[ClusterLabs] gfs2 crashes when i, e.g., dd to a lvm volume

Thu Oct 8 14:20:29 UTC 2015

On 08/10/15 10:15 AM, Digimer wrote:
> On 08/10/15 07:50 AM, J. Echter wrote:
>> Hi,
>>
>> I have a strange issue on CentOS 6.5.
>>
>> If I install a new VM on node1, it works well.
>>
>> If I install a new VM on node2, it gets stuck.
>>
>> The same happens if I run dd if=/dev/zero of=/dev/DATEN/vm-test (on node2).
>>
>> On node1 it works:
>>
>> dd if=/dev/zero of=vm-test
>> writing to "vm-test": No space left on device
>> 83886081+0 records in
>> 83886080+0 records out
>> 42949672960 bytes (43 GB) copied, 2338.15 s, 18.4 MB/s
>>
>>
>> dmesg shows the following (while dd'ing on node2):
>>
>> INFO: task flush-253:18:9820 blocked for more than 120 seconds.
>>       Not tainted 2.6.32-573.7.1.el6.x86_64 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> <snip>
>>
>> Any hints on fixing that?
> 
> Every time I've seen this, it was because dlm was blocked. The most
> common cause of DLM blocking is a failed fence call. Do you have fencing
> configured *and* tested?
> 
> If I were to guess, given the rather limited information you shared
> about your setup, the live migration consumed the network bandwidth,

s/live migration/disk load/  (if storage is on the same network as
corosync).

> choking out corosync traffic, which caused the peer to be declared lost
> and called a fence, which failed and left locking hung (which is by
> design; better to hang than risk corruption).
> 
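
A rough sketch of how one might check that chain (membership, then fence
domain, then DLM) on each node while the dd is hung. It assumes a
cman-based stack, which is common on CentOS 6 but is not stated anywhere in
this thread, and run_if_present is only a hypothetical helper so that the
script skips tools that are not installed:

```shell
#!/bin/sh
# Hypothetical helper: run a diagnostic only if the tool is installed,
# otherwise print a note and move on.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "== $1 not installed, skipping"
    fi
}

# Assumption: cman-based cluster (not confirmed in this thread).
run_if_present cman_tool nodes   # did corosync/cman drop the peer?
run_if_present fence_tool ls     # is the fence domain waiting on a fence?
run_if_present dlm_tool ls       # are the DLM lockspaces blocked?
```

If the fence domain is stuck waiting on a victim, the fence agent's
configuration is the first thing to look at. Manually fencing a node with
fence_node is the real test of whether fencing works, but it will power
cycle that node, so only do it when that is acceptable.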

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?