[PATCH i-g-t v2] tests/intel/gem_exec_capture: Fix many-* subtests

Peter Senna Tschudin me at petersenna.com
Tue Apr 23 04:15:32 UTC 2024


Hi Kamil,

On Fri, Apr 19, 2024 at 5:05 PM Kamil Konieczny <
kamil.konieczny at linux.intel.com> wrote:

> Hi Peter,
> On 2024-04-11 at 21:53:34 +0200, Peter Senna Tschudin wrote:
> > Currently trying to run `gem_exec_capture --run-subtest
> > many-4K-incremental` or `gem_exec_capture --run-subtest many-4K-zero`
> > will fail with:
> >
> >  (gem_exec_capture:81999) i915/gem_engine_topology-CRITICAL: Test
> >  assertion failure function gem_engine_properties_configure, file
> >  ../lib/i915/gem_engine_topology.c:577:
> >  (gem_exec_capture:81999) i915/gem_engine_topology-CRITICAL: Failed
> assertion: ret == 1
> >  (gem_exec_capture:81999) i915/gem_engine_topology-CRITICAL: Last errno:
> 9, Bad file descriptor
> >  (gem_exec_capture:81999) i915/gem_engine_topology-CRITICAL: error: -1
> != 1
> >
> > This problem happens inside the macro find_first_available_engine()
> > when:
> >  1. for_each_ctx_engine() allocates an struct intel_engine_data 'ed'
> >     inside a for loop. The core of the issue is that ed only exists
> >     inside the for loop. As soon as the for loop ends, ed is out of scope
> >     and after it's lifetime.
> >  2. intel_get_current_engine() sets '*e' to an address of ed. This is ok
> >     while inside the for loop, and is undefined behavior after the for
> >     loop ends.
> >  3. configure_hangs() uses '*e' after the lifetime of 'ed' has ended
> >     leading to undefined behavior
> >  4. After the call to find_first_available_engine() __captureN() will
> >     fail as it expects '*e' to be valid. This is also undefined
> >     behavior.
> >
> > Additionally, this patch sets 'e' to NULL early on at the macro, and at
> > the end check if it is still NULL. If so it will skip the test to
> > prevent downstream code from crashing.
>
> Strictly speaking we are not preventing crash but catch a case
> when test cannot proceed (as it needs capable engine),
> so then we skip it. Btw I plan on re-testing it on Monday and
> merge it next week with that small description correction.
>

I guess that we need to reduce the test size to avoid the CI timeout we see
after my fix. I made a patch for it:

https://patchwork.freedesktop.org/series/131845/

Thank you,

Peter

[...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/igt-dev/attachments/20240423/5f2fd095/attachment.htm>


More information about the igt-dev mailing list