[Intel-gfx] [igt-dev] [PATCH i-g-t 0/2] tests/i915/perf: Add stress / race exercises
Janusz Krzysztofik
janusz.krzysztofik at linux.intel.com
Tue Jan 31 17:36:30 UTC 2023
On Tuesday, 31 January 2023 17:19:48 CET Dixit, Ashutosh wrote:
> On Tue, 31 Jan 2023 01:17:29 -0800, Janusz Krzysztofik wrote:
> >
>
> Hi Janusz,
>
> > Users reported oopses on list corruptions when using i915 perf with a
> > number of concurrently running graphics applications. That indicates we
> > are currently missing some important tests for such scenarios. Cover
> > that gap.
>
> Do these oops etc. have anything to do with perf itself or rather with
> persistence or non-persistence not properly supported with GuC?
My root cause analysis has revealed that these list corruptions are actually
caused by a bug in barrier processing, so no, they are neither persistence nor
GuC related. For details, please see my preliminary (still a bit buggy, but
otherwise valid) fix, so far sent only to trybot:
https://patchwork.freedesktop.org/series/113268/
> We should
> have seen such failures with persistence tests (with GuC) itself so I am
> wondering if there's any point of dragging perf into these already muddy
> waters? Such failures should be isolated first with other tests without
> mixing perf into this IMO.
I see your point, but unfortunately things are not that easy. My
investigation has led me to the conclusion that the bug within the barrier
processing code is now addressed, to some extent, and probably not
intentionally, by a kind of workaround that makes it really hard to reproduce
without any interaction from an external user that tries to replace a barrier
with its own request. And I can see a very limited number of such users, one
of them being perf.
The first patch was developed by Chris before I found the root cause
of the issue. Since the bug seemed strictly perf related at that point in
time, that's probably why Chris decided to add the new subtest to perf. As
such, that subtest is more general than just focused on triggering the list
corruption bug, and it pretty much belongs in perf, I believe.
Since Chris' subtest didn't help in triggering the list corruption, I've
developed a new subtest that can do it. Since it is almost identical to the
one Chris added, I decided to reuse his code, then add my new subtest to perf
as well. But maybe you are right that my subtest fits better in another test,
not perf. I'll think this over.
I hope this clarifies things for you.
Thanks,
Janusz
>
> Thanks.
> --
> Ashutosh
>