Thu Oct 8 10:15:33 EDT 2015

On 08/10/15 07:50 AM, J. Echter wrote:
> Hi,
> i have a strange issue on CentOS 6.5
> If i install a new vm on node1 it works well.
> If i install a new vm on node2 it gets stuck.
> Same if i do a dd if=/dev/zero of=/dev/DATEN/vm-test (on node2)
> On node1 it works:
> dd if=/dev/zero of=vm-test
> writing to "vm-test": No space left on device
> 83886081+0 records in
> 83886080+0 records out
> 42949672960 bytes (43 GB) copied, 2338.15 s, 18.4 MB/s
> dmesg shows the following (while dd'ing on node2):
> INFO: task flush-253:18:9820 blocked for more than 120 seconds.
>       Not tainted 2.6.32-573.7.1.el6.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> any hint on fixing that?

Every time I've seen this, it was because DLM was blocked. The most
common cause of DLM blocking is a failed fence call. Do you have fencing
configured *and* tested?

If I were to guess, given the rather limited information you shared
about your setup, the live migration consumed the network bandwidth,
choking out corosync traffic. That caused the peer to be declared lost,
which triggered a fence call that failed and left locking hung (which is
by design; better to hang than risk corruption).
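
To check whether that chain of events is what happened, something like
the following rough sketch may help. It assumes a cman-based CentOS 6
cluster where cman_tool, fence_tool and dlm_tool are installed, and that
syslog goes to /var/log/messages; both are guesses about your setup. The
helper names (hung_tasks, run_if_present, check_cluster) are made up for
this example.

```shell
#!/bin/sh
# Sketch only: assumes a cman-based CentOS 6 cluster; adjust for your stack.

# Print the hung-task warnings found in kernel log text read from stdin.
hung_tasks() {
    grep -E 'blocked for more than [0-9]+ seconds' || true
}

# Run a cluster tool if it is installed, otherwise say that it was skipped.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "== $1 not found, skipping"
    fi
}

# Collect the state that usually shows whether DLM is waiting on a fence.
check_cluster() {
    dmesg | hung_tasks
    run_if_present cman_tool nodes   # membership: was the peer declared lost?
    run_if_present fence_tool ls     # fence domain: victim count, wait state
    run_if_present dlm_tool ls       # lockspaces and their current state
    # Recent fence-related log lines (log path is an assumption).
    grep -i 'fenc' /var/log/messages 2>/dev/null | tail -n 20
}
```

Run check_cluster as root on both nodes while the dd is stuck and compare
the output. This only inspects state; it does not test fencing. On cman
clusters a real test is usually fence_node against the peer, which will
actually power-cycle it, so do that only when the peer can afford to go
down.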

