[ClusterLabs] gfs2 crashes when i, e.g., dd to a lvm volume
rpeterso at redhat.com
Thu Oct 8 10:34:33 EDT 2015
----- Original Message -----
> On 08.10.2015 at 16:15, Digimer wrote:
> > On 08/10/15 07:50 AM, J. Echter wrote:
> >> Hi,
> >> I have a strange issue on CentOS 6.5.
> >> If I install a new vm on node1 it works well.
> >> If I install a new vm on node2 it gets stuck.
> >> The same happens if I do a dd if=/dev/zero of=/dev/DATEN/vm-test (on node2).
> >> On node1 it works:
> >> dd if=/dev/zero of=vm-test
> >> writing to 'vm-test': No space left on device
> >> 83886081+0 records in
> >> 83886080+0 records out
> >> 42949672960 bytes (43 GB) copied, 2338.15 s, 18.4 MB/s
> >> dmesg shows the following (while dd'ing on node2):
> >> INFO: task flush-253:18:9820 blocked for more than 120 seconds.
> >> Not tainted 2.6.32-573.7.1.el6.x86_64 #1
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > <snip>
> >> Any hint on fixing that?
> > Every time I've seen this, it was because dlm was blocked. The most
> > common cause of DLM blocking is a failed fence call. Do you have fencing
> > configured *and* tested?
> > If I were to guess, given the rather limited information you shared
> > about your setup, the live migration consumed the network bandwidth,
> > choking out corosync traffic which caused the peer to be declared lost,
> > called a fence which failed and left locking hung (which is by design;
> > better to hang than risk corruption).
> Fencing is configured and works.
> I re-checked it by typing
> echo c > /proc/sysrq-trigger
> into node2 console.
> The machine is fenced and comes back up. But the problem persists.
Can you send any more information about the crash? What makes you think
it's gfs2 and not some other kernel component? Do you get any messages
on the console? If not, perhaps you can temporarily disable or delay fencing
long enough to get console messages.
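
As a rough sketch of one way to pull the hung-task reports out of a saved
kernel log before posting it: the helper name hung_task_reports is made up
for illustration, and the pattern assumes the message wording shown in the
quoted dmesg output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): read a kernel log on
# stdin and print only the hung-task report lines. The pattern is an
# assumption based on the "blocked for more than N seconds" message
# quoted earlier in this thread.
hung_task_reports() {
    grep -E 'INFO: task .+ blocked for more than [0-9]+ seconds'
}
```

It could be used as "dmesg | hung_task_reports" on the affected node. If
sysrq is enabled, writing "w" to /proc/sysrq-trigger as root should also make
the kernel log the tasks that are currently blocked, which may help show
where they are stuck.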
Red Hat File Systems