[Intel-gfx] [igt-dev] [PATCH i-g-t 0/2] tests/i915/perf: Add stress / race exercises

Janusz Krzysztofik janusz.krzysztofik at linux.intel.com
Tue Jan 31 17:36:30 UTC 2023


On Tuesday, 31 January 2023 17:19:48 CET Dixit, Ashutosh wrote:
> On Tue, 31 Jan 2023 01:17:29 -0800, Janusz Krzysztofik wrote:
> >
> 
> Hi Janusz,
> 
> > Users reported oopses on list corruptions when using i915 perf with a
> > number of concurrently running graphics applications.  That indicates we
> > are currently missing some important tests for such scenarios.  Cover
> > that gap.
> 
> Do these oops etc. have anything to do with perf itself or rather with
> persistence or non-persistence not properly supported with GuC? 

My root cause analysis has revealed that these list corruptions are actually 
caused by a bug in barrier processing, so no, they are neither persistence nor 
GuC related.  For details, please see my preliminary (still a bit buggy, but 
otherwise valid) fix, so far sent only to trybot:
https://patchwork.freedesktop.org/series/113268/

> We should
> have seen such failures with persistence tests (with GuC) itself so I am
> wondering if there's any point of dragging perf into these already muddy
> waters? Such failures should be isolated first with other tests without
> mixing perf into this IMO.

I see your point, but unfortunately things are not that easy.  My 
investigation has led me to the conclusion that the bug within the barrier 
processing code is now addressed, to some extent, and probably not 
intentionally, by a kind of workaround that makes it really hard to reproduce 
without any interaction from an external user that tries to replace a barrier 
with its own request.  And I can see only a very limited number of such users, 
one of them being perf.

The first patch was developed by Chris before I found the root cause of the 
issue.  The bug seemed strictly perf related at that point in time, which is 
probably why Chris decided to add the new subtest to perf.  As such, that 
subtest is more general than one focused only on triggering the list 
corruption bug, and it pretty much belongs in perf, I believe.

Since Chris' subtest didn't help in triggering the list corruption, I've 
developed a new subtest that can do it.  Since it is almost identical to the 
one Chris added, I decided to reuse his code and add my new subtest to perf 
as well.  But maybe you are right that my subtest fits better in another test, 
not perf.  I'll think this over.

I hope this clarifies things for you.

Thanks,
Janusz

> 
> Thanks.
> --
> Ashutosh
> 
