[Bug 110848] New: [BXT/APL] Everything using GPU gets stuck after running parallel Media loads after 3D benchmarks
bugzilla-daemon at freedesktop.org
Thu Jun 6 12:22:26 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=110848
Bug ID: 110848
Summary: [BXT/APL] Everything using GPU gets stuck after running parallel Media loads after 3D benchmarks
Product: DRI
Version: DRI git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/Intel
Assignee: intel-gfx-bugs at lists.freedesktop.org
Reporter: eero.t.tamminen at intel.com
QA Contact: intel-gfx-bugs at lists.freedesktop.org
CC: intel-gfx-bugs at lists.freedesktop.org
Setup 1:
* HW: BXT J4205
* OS: ClearLinux
* kernel: drm-tip compiled from git
* media: MediaSDK and its deps compiled from git (GitHub)
* FFmpeg: month-old git version: 2019-05-08 c636dc9819 "libavfilter/dnn: add more data type support for dnn model input"
* GUI: Weston / Wayland / Mesa compiled from git
Setup 2 (differences from setup 1):
* OS: Ubuntu 18.04
* FFmpeg: latest git version
* GUI: Unity from Ubuntu, with X server and Mesa compiled from git
Test-case:
1. Run 3D benchmarks
2. Do several runs of 50 parallel instances of the following H264 transcode operation:
   ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v h264_qsv -i 1280x720p_29.97_10mb_h264_cabac.264 -c:v h264_qsv -b:v 800K -vf scale_qsv=w=352:h=240,fps=15 -compression_level 4 -frames 2400 -y output.h264
3. Do a single, non-parallel run of the above command
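Steps 2 and 3 can be scripted roughly as follows. This is a hypothetical Python sketch, not the harness used for this report: the number of rounds and both timeouts are assumptions (the report only says "several runs"), step 1 (the 3D benchmarks) is not included, and a run that has not exited by its timeout is simply treated as frozen.

```python
import subprocess
import sys
import time

# The transcode command from the test case, unchanged.
FFMPEG_CMD = [
    "ffmpeg", "-hwaccel", "qsv", "-qsv_device", "/dev/dri/renderD128",
    "-c:v", "h264_qsv", "-i", "1280x720p_29.97_10mb_h264_cabac.264",
    "-c:v", "h264_qsv", "-b:v", "800K",
    "-vf", "scale_qsv=w=352:h=240,fps=15",
    "-compression_level", "4", "-frames", "2400", "-y", "output.h264",
]


def wait_or_give_up(proc, timeout_s):
    """Return True if proc exits within timeout_s, else False.

    A task stuck in uninterruptible sleep (D state) does not act on SIGKILL
    until it wakes up, so on timeout we only queue SIGKILL and never wait again.
    """
    try:
        proc.wait(timeout=max(0.0, timeout_s))
        return True
    except subprocess.TimeoutExpired:
        proc.kill()
        return False


def run_parallel(n=50, timeout_s=3600, cmd=FFMPEG_CMD):
    """Step 2: start n instances at once; return how many did not exit in time."""
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL) for _ in range(n)]
    deadline = time.monotonic() + timeout_s  # one shared deadline for all of them
    return sum(not wait_or_give_up(p, deadline - time.monotonic()) for p in procs)


def run_single(timeout_s=600, cmd=FFMPEG_CMD):
    """Step 3: one non-parallel run; False means it looks frozen."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return wait_or_give_up(proc, timeout_s)


def reproduce(rounds=3):
    """Steps 2 and 3; rounds=3 is a guess for "several runs"."""
    for _ in range(rounds):
        stuck = run_parallel()
        if stuck:
            print(f"{stuck} parallel instances did not exit", file=sys.stderr)
    return run_single()
```

After running the 3D benchmarks, `reproduce()` returns False when the final single run does not finish within its (assumed) timeout.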
Expected outcome:
* Everything works fine, as it does with the month-older 5.1 drm-tip kernel or on GEN9 Core devices
Actual outcome:
* FFmpeg freezes at step 3
* No errors or warnings in dmesg
* Anything else that tries to use the GPU (even just glxinfo or clinfo) also freezes
* Some processes that do not use the GPU also freeze when started
I first saw Media tests freezing around the 28th of May, so the regression may have come in between the following drm-tip commits:
* 2019-05-27 14:42:27 e8f06c34fa: drm-tip: 2019y-05m-27d-14h-41m-23s UTC integration manifest
* 2019-05-28 15:48:05 8991a80f85: drm-tip: 2019y-05m-28d-15h-47m-22s UTC integration manifest
Which Media case freezes varies somewhat, e.g. with the FFmpeg version, and from week to week. It is not always the same case, but it is always a single-instance test that freezes after the parallel tests have seemingly [1] finished. Both the FFmpeg QSV and the MediaSDK sample application H264 transcode cases have frozen; which one freezes differs between setup 1 and setup 2.
If I run only the Media cases after boot, I don't see the freeze, so there is some interaction with what the 3D benchmarks do.
[1] With last night's drm-tip kernel, the "ps" output for processes in D state, and for a few other relevant processes, is the following:
---------------------------------------------
...
38 ? DN 0:00 [khugepaged]
...
396 tty7 Ssl+ 5:08 /opt/install/bin/weston --tty=7 --idle-time=0 --xwayland
...
545 tty7 Dl+ 0:00 Xwayland :0 -rootless -listen 60 -listen 61 -wm 62 -terminate
...
8444 ? D 0:06 [ffmpeg]
8447 ? Zl 0:06 [ffmpeg] <defunct>
8448 ? Zl 0:06 [ffmpeg] <defunct>
8449 ? Zl 0:06 [ffmpeg] <defunct>
8451 ? Zl 0:06 [ffmpeg] <defunct>
8453 ? D 0:06 [ffmpeg]
8483 ? Zl 0:06 [ffmpeg] <defunct>
8497 ? Zl 0:06 [ffmpeg] <defunct>
8512 ? Zl 0:06 [ffmpeg] <defunct>
8525 ? Zl 0:06 [ffmpeg] <defunct>
8531 ? Zl 0:06 [ffmpeg] <defunct>
8546 ? Zl 0:06 [ffmpeg] <defunct>
8559 ? Zl 0:06 [ffmpeg] <defunct>
8574 ? Zl 0:06 [ffmpeg] <defunct>
8585 ? Zl 0:06 [ffmpeg] <defunct>
8603 ? Zl 0:06 [ffmpeg] <defunct>
8623 ? D 0:06 [ffmpeg]
8642 ? Zl 0:06 [ffmpeg] <defunct>
8650 ? D 0:06 [ffmpeg]
8678 ? Zl 0:06 [ffmpeg] <defunct>
8697 ? Zl 0:06 [ffmpeg] <defunct>
8704 ? D 0:06 [ffmpeg]
8711 ? D 0:06 [ffmpeg]
8723 ? D 0:06 [ffmpeg]
8733 ? D 0:06 [ffmpeg]
8756 ? Zl 143:22 [ffmpeg] <defunct>
8793 ? Zl 0:06 [ffmpeg] <defunct>
8822 ? Zl 0:06 [ffmpeg] <defunct>
8837 ? Zl 0:06 [ffmpeg] <defunct>
8845 ? D 0:06 [ffmpeg]
8851 ? Zl 0:06 [ffmpeg] <defunct>
8858 ? Zl 0:06 [ffmpeg] <defunct>
8871 ? Zl 0:06 [ffmpeg] <defunct>
8893 ? Zl 0:06 [ffmpeg] <defunct>
8942 ? Zl 0:06 [ffmpeg] <defunct>
8958 ? D 0:06 [ffmpeg]
8983 ? Zl 0:06 [ffmpeg] <defunct>
8991 ? D 0:06 [ffmpeg]
8999 ? Zl 0:06 [ffmpeg] <defunct>
9013 ? D 0:06 [ffmpeg]
9017 ? D 0:06 [ffmpeg]
9035 ? Zl 0:06 [ffmpeg] <defunct>
9058 ? Zl 0:06 [ffmpeg] <defunct>
9071 ? Zl 0:06 [ffmpeg] <defunct>
9117 ? Zl 0:06 [ffmpeg] <defunct>
9122 ? D 0:06 [ffmpeg]
9152 ? Zl 0:06 [ffmpeg] <defunct>
9165 ? Zl 0:06 [ffmpeg] <defunct>
9180 ? D 0:06 [ffmpeg]
9250 ? D 0:06 [ffmpeg]
9758 ? Ds 0:00 ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v h264_qsv -i 1280x720p_29.97_10mb_h264_cabac.264 -c:v h264_qsv -b:v 800K -vf scale_qsv=w=352:h=240,fps=15 -compression_level 4 -frames 2400 -y output.h264
...
10184 ? D 0:00 top
...
---------------------------------------------
50 ffmpeg instances can use a fair number of GEM objects (and FFmpeg QSV uses more of them than MediaSDK alone or FFmpeg VAAPI), which can put memory pressure on the system, so the kernel "khugepaged" thread being in D state is suspicious.
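A listing like the "ps" one above can also be collected directly from /proc, together with the kernel function each D-state task is sleeping in. The following is a minimal Python sketch assuming a Linux /proc; note that /proc/<pid>/wchan may read as empty or "0" depending on kernel version and permissions, and that only thread-group leaders are scanned.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')'.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    return int(pid_str), comm, tail.split()[0]


def read_wchan(pid):
    """Kernel function the task is waiting in, or '' if it cannot be read."""
    try:
        with open(f"/proc/{pid}/wchan") as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_tasks(states=("D",)):
    """List (pid, comm, state, wchan) for tasks whose state is in `states`."""
    result = []
    for entry in os.listdir("/proc"):  # numeric entries are thread-group leaders
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited while scanning, or unexpected format
        if state in states:
            result.append((pid, comm, state, read_wchan(pid)))
    return result
```

For example, `for task in blocked_tasks(("D", "Z")): print(*task)` prints the D-state and zombie tasks. For full kernel stacks, writing "w" to /proc/sysrq-trigger as root dumps all blocked (D state) tasks to the kernel log.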
Other notes:
* I also had a process running in the background that used ftrace uprobes to track frame-update functions in the 3D and Media processes, and that tracked "i915:intel_gpu_freq_change" events
* Killing that process after the freeze caused ssh to stop working, so there may be some connection to ftrace
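For anyone trying to reproduce that background load, the "i915:intel_gpu_freq_change" part of it can be approximated as below. This is a hypothetical minimal Python sketch, not the tool used here: it assumes tracefs is mounted at /sys/kernel/tracing (older setups use /sys/kernel/debug/tracing), root access, and a "new_freq=<value>" event format. The uprobes part is left out because it needs per-binary symbol offsets.

```python
import re

TRACEFS = "/sys/kernel/tracing"  # assumption: may be /sys/kernel/debug/tracing
EVENT_ENABLE = TRACEFS + "/events/i915/intel_gpu_freq_change/enable"

# Assumed trace_pipe line shape: "... <timestamp>: intel_gpu_freq_change: new_freq=<n>"
_LINE_RE = re.compile(r"\s(\d+\.\d+): intel_gpu_freq_change: new_freq=(\d+)")


def parse_freq_line(line):
    """Return (timestamp_s, new_freq) for a matching trace line, else None."""
    m = _LINE_RE.search(line)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2))


def set_event(enabled):
    """Enable or disable the i915 frequency-change tracepoint (needs root)."""
    with open(EVENT_ENABLE, "w") as f:
        f.write("1" if enabled else "0")


def follow_freq_changes():
    """Yield (timestamp_s, new_freq) tuples as GPU frequency-change events arrive."""
    set_event(True)
    try:
        with open(TRACEFS + "/trace_pipe") as pipe:  # blocks until events arrive
            for line in pipe:
                parsed = parse_freq_line(line)
                if parsed is not None:
                    yield parsed
    finally:
        set_event(False)  # runs when the generator is closed or an error occurs
```

Usage: `for ts, freq in follow_freq_changes(): print(ts, freq)`.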