[Bug 108569] [CI][BAT icl] igt@drv_selftest@live_contexts - dmesg-fail - igt_ctx_readonly failed with error -5 HSDES#:1807136187

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Apr 9 12:09:17 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=108569

--- Comment #19 from Lakshmi <lakshminarayana.vudum at intel.com> ---
Impact to users
---------------

Since BDW, the HW has supported read-only pages in the PPGTT. This is used
internally by the driver to share scratch pages between objects, saving memory,
and is exposed to UMDs via the UserPtr API.

Before ICL, a write to a read-only page was silently dropped. On ICL, it hangs
the GPU. There are a few ways this can affect users:

1) if a bug in either the driver or userspace mistakenly tries to write to a
read-only page, we'll have a hang.

2) if userspace relies on the pre-ICL behavior and decides to write to a
read-only page assuming nothing will happen (like the test in question does),
it gets a hang instead.

Considering that this bug has been present for months and hasn't been reported
by UMDs, it is reasonable to assume that the impact of cases 1 and 2 is low.
The OCL team confirmed they don't use this feature, and we are not aware of
media or Mesa using it, although that still needs to be confirmed. This is,
however, something we should fix or prevent to avoid surprises.


Way forward
-----------

As a workaround, it is possible to disable read-only page support for ICL in
the driver. We would lose the ability to share scratch pages between objects,
requiring a 64k page allocation every time, which causes memory waste and
fragmentation. We would also be sporadically unable to use hugepages in the GPU
and would need to handle the userspace, which is possible but likely to
introduce new bugs (for more details, ask Wilson, Chris P, whose explanation
I'm paraphrasing).

Implementing the above workaround is a matter of a few days, plus enough time
for some extensive CI runs. However, we are investigating other options first.
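
For illustration only, the behavior and the workaround described in this
comment can be modelled as a small C sketch. This is not the i915 code: every
name below (hw_has_read_only, vm_use_read_only, write_to_ro_page, the enum) is
hypothetical, and the only facts taken from this comment are that read-only
PPGTT pages exist since BDW (gen8), that writes to them were dropped before ICL
(gen11), and that they hang the GPU on ICL. Generations after ICL are not
covered here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a GPU write that targets a read-only PPGTT page. */
enum ro_write_result {
    RO_NO_RO_MAPPING,  /* driver creates no read-only mappings at all */
    RO_WRITE_DROPPED,  /* pre-ICL: the write is silently discarded */
    RO_WRITE_HANGS     /* ICL: the write hangs the GPU */
};

/* Hypothetical helper: read-only PPGTT pages exist from BDW (gen8) onwards. */
static bool hw_has_read_only(int gen)
{
    return gen >= 8;
}

/*
 * Hypothetical policy: should the driver use read-only pages on this gen?
 * With the workaround enabled, ICL (gen11) is treated as if the feature
 * were absent, so scratch pages can no longer be shared read-only.
 */
static bool vm_use_read_only(int gen, bool icl_workaround)
{
    assert(gen >= 1 && gen <= 11); /* this sketch only covers up to ICL */

    if (!hw_has_read_only(gen))
        return false;
    if (gen == 11 && icl_workaround)
        return false;
    return true;
}

/* Model of what a stray write to a read-only page would do. */
static enum ro_write_result write_to_ro_page(int gen, bool icl_workaround)
{
    if (!vm_use_read_only(gen, icl_workaround))
        return RO_NO_RO_MAPPING;
    return gen == 11 ? RO_WRITE_HANGS : RO_WRITE_DROPPED;
}
```

In this model, write_to_ro_page(11, false) is the hang seen in this bug, and
write_to_ro_page(11, true) shows the workaround avoiding it by never creating
read-only mappings on ICL, at the memory and hugepage cost described above.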
