[Pkg-xen-devel] Bug#944247: xen domU crashes under high i/o load if you use qcow2 images

mario info at munex.de
Wed Nov 6 15:46:05 GMT 2019


Source: xen
Severity: important

Dear Maintainer,

we have updated our server from debian oldstable (which unfortunately was no longer running stably after the last update; bug reported) to debian buster.

unfortunately xen doesn't work reliably there either:

the virtual server crashes every 1-2 weeks with i/o problems and sometimes also takes other domU instances down with it.
we use qcow2 images.

the hard disk of the domU simply becomes inaccessible to the linux kernel, so no logfiles are available. the last lines visible in the xl console are the following; login is no longer possible:

[ 1450.976415] INFO: task nginx:376 blocked for more than 120 seconds.
[ 1450.976423] Not tainted 4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u5
[ 1450.976428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.976469] INFO: task nginx:377 blocked for more than 120 seconds.
[ 1450.976474] Not tainted 4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u5
[ 1450.976479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.976624] INFO: task nginx:378 blocked for more than 120 seconds.

the blocked process varies, for example:
[1523692.508073] INFO: task jbd2/xvda2-8:159 blocked for more than 120 seconds
[1523692.508084] Not tainted [...]
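
the hung-task warnings above all share one format, so a saved copy of the xl console output can be summarised mechanically. the following is only a minimal sketch: the names find_hung_tasks, count_by_task and SAMPLE are made up for illustration, and the regular expression assumes the message layout shown in the excerpts above.

```python
import re
from collections import Counter

# Matches kernel hung-task warnings such as:
#   [ 1450.976415] INFO: task nginx:376 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# Lines copied from the console excerpts in this report.
SAMPLE = """\
[ 1450.976415] INFO: task nginx:376 blocked for more than 120 seconds.
[ 1450.976469] INFO: task nginx:377 blocked for more than 120 seconds.
[ 1450.976624] INFO: task nginx:378 blocked for more than 120 seconds.
[1523692.508073] INFO: task jbd2/xvda2-8:159 blocked for more than 120 seconds
"""


def find_hung_tasks(console_text):
    """Return (uptime_seconds, task_name, pid) for every hung-task warning."""
    hits = []
    for line in console_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append(
                (float(m.group("uptime")), m.group("comm"), int(m.group("pid")))
            )
    return hits


def count_by_task(hits):
    """Count hung-task warnings per task name."""
    return Counter(comm for _, comm, _ in hits)


if __name__ == "__main__":
    for uptime, comm, pid in find_hung_tasks(SAMPLE):
        print(f"{uptime:>16.6f}  {comm}  pid={pid}")
```

a blocked jbd2/xvda2-8 entry is the ext4 journal thread for xvda2, which fits the picture that the whole virtual disk has stopped responding rather than a single application misbehaving.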

all hard disk accesses fail, as if the i/o system were completely dead.
only "xl destroy <domid>" followed by recreating the domU helps.

you can easily reproduce this with the stress tool: "stress -c 8 -i 8 -d 8".
it takes at most 10 minutes until the vm crashes.
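
for repeated test runs the reproducer can be wrapped in a small script. this is a sketch only: build_stress_argv and run_stress are hypothetical helper names, and the --timeout option is an addition that is not part of the command line used in this report (it only bounds the run if the vm survives).

```python
import shutil
import subprocess


def build_stress_argv(cpu=8, io=8, hdd=8, timeout_s=None):
    """Build the argv for the reproducer "stress -c 8 -i 8 -d 8".

    -c N: N workers spinning on sqrt()          (cpu load)
    -i N: N workers spinning on sync()          (i/o load)
    -d N: N workers spinning on write()/unlink() (disk load)
    """
    argv = ["stress", "-c", str(cpu), "-i", str(io), "-d", str(hdd)]
    if timeout_s is not None:
        # Assumption: not in the original report; stops stress after N seconds.
        argv += ["--timeout", str(timeout_s)]
    return argv


def run_stress(argv):
    """Run stress and return its exit code; fail early if it is not installed."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("stress is not installed")
    return subprocess.run(argv).returncode


if __name__ == "__main__":
    # 600 seconds matches the "at most 10 minutes" observed in this report.
    print(" ".join(build_stress_argv(timeout_s=600)))
```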

as a workaround you can convert all images to raw; in our tests the error no longer occurred afterwards.
but since we need the snapshot functions of qcow2 images, this is not a permanent solution.
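
the conversion itself can be done with "qemu-img convert". a minimal sketch, assuming qemu-img is installed in dom0 and the domU is shut down while its image is copied; build_convert_argv, convert_to_raw and the example path are hypothetical and not taken from our setup.

```python
import subprocess
from pathlib import Path


def build_convert_argv(src, dst=None):
    """Build the qemu-img command line that converts a qcow2 image to raw.

    If dst is not given, the raw image is written next to the source
    with a .raw suffix.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix(".raw")
    return ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", str(src), str(dst)]


def convert_to_raw(src, dst=None):
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.run(build_convert_argv(src, dst), check=True)


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(" ".join(build_convert_argv("/srv/xen/example/disk.qcow2")))
```

after the conversion the disk entry in the domU config has to point at the raw file instead of the qcow2 image.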

does anyone else have problems with qcow2 images and xen under buster?
maybe this also concerns qemu?

xen vm.cfg:

we are using pygrub, 4 vcpus and 1024 MB of memory

#
# Disk device(s).
#
root = '/dev/xvda2 ro'
disk = [
                  'tap:qcow2:/[...]/disk.qcow2,xvda2,w',
                  'file:/[...]/swap.img,xvda1,w',
              ]

-- System Information:
Debian Release: 10.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-6-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled


