[PATCH i-g-t v2] tests/intel/gem_exec_capture: Fix many-* subtests
Peter Senna Tschudin
me at petersenna.com
Tue Apr 23 04:15:32 UTC 2024
Hi Kamil,
On Fri, Apr 19, 2024 at 5:05 PM Kamil Konieczny <
kamil.konieczny at linux.intel.com> wrote:
> Hi Peter,
> On 2024-04-11 at 21:53:34 +0200, Peter Senna Tschudin wrote:
> > Currently trying to run `gem_exec_capture --run-subtest
> > many-4K-incremental` or `gem_exec_capture --run-subtest many-4K-zero`
> > will fail with:
> >
> > (gem_exec_capture:81999) i915/gem_engine_topology-CRITICAL: Test
> > assertion failure function gem_engine_properties_configure, file
> > ../lib/i915/gem_engine_topology.c:577:
> > (gem_exec_capture:81999) i915/gem_engine_topology-CRITICAL: Failed
> assertion: ret == 1
> > (gem_exec_capture:81999) i915/gem_engine_topology-CRITICAL: Last errno:
> 9, Bad file descriptor
> > (gem_exec_capture:81999) i915/gem_engine_topology-CRITICAL: error: -1
> != 1
> >
> > This problem happens inside the macro find_first_available_engine()
> > when:
> > 1. for_each_ctx_engine() allocates an struct intel_engine_data 'ed'
> > inside a for loop. The core of the issue is that ed only exists
> > inside the for loop. As soon as the for loop ends, ed is out of scope
> > and after it's lifetime.
> > 2. intel_get_current_engine() sets '*e' to an address of ed. This is ok
> > while inside the for loop, and is undefined behavior after the for
> > loop ends.
> > 3. configure_hangs() uses '*e' after the lifetime of 'ed' has ended
> > leading to undefined behavior
> > 4. After the call to find_first_available_engine() __captureN() will
> > fail as it expects '*e' to be valid. This is also undefined
> > behavior.
> >
> > Additionally, this patch sets 'e' to NULL early on at the macro, and at
> > the end check if it is still NULL. If so it will skip the test to
> > prevent downstream code from crashing.
>
> Strictly speaking we are not preventing crash but catch a case
> when test cannot proceed (as it needs capable engine),
> so then we skip it. Btw I plan on re-testing it on Monday and
> merge it next week with that small description correction.
>
I guess that we need to reduce the test size to avoid the CI timeout we see
after my fix. I made a patch for it:
https://patchwork.freedesktop.org/series/131845/
Thank you,
Peter
[...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/igt-dev/attachments/20240423/5f2fd095/attachment.htm>
More information about the igt-dev
mailing list