[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Set PROBE_PREFER_ASYNCHRONOUS
Brian Norris
briannorris at chromium.org
Thu Nov 3 00:14:25 UTC 2022
On Wed, Nov 02, 2022 at 12:18:37PM +0000, Matthew Auld wrote:
> On Tue, 1 Nov 2022 at 21:58, Brian Norris <briannorris at chromium.org> wrote:
> >
> > On Fri, Oct 28, 2022 at 5:24 PM Patchwork
> > <patchwork at emeril.freedesktop.org> wrote:
> > >
> > > Patch Details
> > > Series:drm/i915: Set PROBE_PREFER_ASYNCHRONOUS
> > > URL:https://patchwork.freedesktop.org/series/110277/
> > > State:failure
> > > Details:https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> > >
> > > CI Bug Log - changes from CI_DRM_12317 -> Patchwork_110277v1
> > >
> > > Summary
> > >
> > > FAILURE
> > >
> > > Serious unknown changes coming with Patchwork_110277v1 absolutely need to be
> > > verified manually.
> > >
> > > If you think the reported changes have nothing to do with the changes
> > > introduced in Patchwork_110277v1, please notify your bug team to allow them
> > > to document this new failure mode, which will reduce false positives in CI.
> > >
> > > External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> >
> > For the record, I have almost zero idea what to do with this. From
> > what I can tell, most (all?) of these failures are flaky(?) already
> > and are probably not related to my change.
>
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
>
> According to that link, this change appears to break every platform
> when running the live selftests (looking at the purple squares).
> Running the selftests normally involves loading and unloading the
> module. Looking at the logs there is scary stuff like:
>
[...]
Ah, thanks. I'm not sure what made me think the tests were failing the
same way on drm-tip, but maybe just chalk that up to my unfamiliarity
with this particular dashboard... (There are a few isolated failure
and/or flakes on drm-tip, but they don't look like this.)
Anyway, I think I managed to run some of these tests on my own platforms
[1], and I don't reproduce those failures. I do see other failures
(crashes) though, like in i915_gem_mman_live_selftests/igt_mmap, where
igt_mmap_offset() (selftest-only code) -> vm_mmap() assumes we have a
valid |current->mm|. But that's borrowing the modprobe process's memory
map, and with async probe, the selftest sequence happens in a kernel
worker instead (and current->mm is NULL). So that clearly won't work.
I suppose I could disable async probe when built as a module (I believe
it doesn't really have any value, since the module load task just waits
for the async task anyway). I'm not familiar enough with MM to know what
the vm_mmap() alternatives are, but this particular bit of code does
feel odd.
Additionally, I think this implies that live_selftests will break if
i915 is built-in (i.e., =y, not =m), as we'll again run in a
kernel-thread context at boot time. But I would hope nobody is trying to
run them that way? I guess this gets even hairier, because even if the
driver is built into the kernel, it's possible to kick them off from a
process context by tweaking the module parameters later, and then
re-binding the device... So all in all, this bug leaves an ugly
situation, with or without my patch.
I'm still curious about the reported failures, but maybe they require
some particular sequence of tests? I also don't have the full
igt-gpu-tools set running, so maybe they do something a little
differently than my steps in [1]?
Brian
[1] I have a GLk system, if it matters. I figured I can run some of
these with any one of the following:
modprobe i915 live_selftests=1
modprobe i915 live_selftests=1 igt__20__live_workarounds=Y
modprobe i915 live_selftests=1 igt__19__live_uncore=Y
modprobe i915 live_selftests=1 igt__18__live_sanitycheck=Y
...
More information about the Intel-gfx
mailing list