[PATCH i-g-t v2] Fix memory access issue due to variable block scope

Peter Senna Tschudin me at petersenna.com
Tue Mar 26 13:18:43 UTC 2024


Hi Andi,

Thank you for your reply.

On Tue, Mar 26, 2024 at 1:05 PM Andi Shyti <andi.shyti at linux.intel.com> wrote:
>
> Hey Peter,
>
> On Mon, Mar 25, 2024 at 10:35:48PM +0100, Peter Senna Tschudin wrote:
> > This patch fixes the tests gem_exec_capture at many-4k-incremental and
> > gem_exec_capture at many-4k-zero that are currently failing with an invalid file
> > descriptor error.
>
> where is gem_exec_capture calling for_each_ctx_cfg_engine()?

many(), userptr(), capture_invisible()
    find_first_available_engine()
        for_each_ctx_engine()

When called by many(), 'e' gets corrupted when configure_hangs() tries
to assign 'e' to another variable. Then after 'e' is corrupted, the
call __captureN() will fail because it expects 'e' to be valid.

>
> >  struct intel_execution_engine2 *
> >  intel_get_current_engine(struct intel_engine_data *ed)
> >
> > When intel_get_current_engine is called from the macro
> > for_each_ctx_cfg_engine(), the variable *ed is defined within a for loop. The
> > scope of *ed is limited to that loop, leading to access violations when
> > attempting to access its contents outside the loop.
> >
> > Before to this patch, intel_get_current_engine() would return an element of *ed
> > and attempting to use it after the loop ended resulted in undefined behavior.
> >
> > This patch introduces a memcpy() to copy the contents of ed->current_engine to
> > a memory area not confined by the loop's scope, ensuring safe access to the
> > data.
> >
> > v2: Added 'i-g-t' to the Subject.
> >
> > Signed-off-by: Peter Senna Tschudin <peter.senna at gmail.com>
> > ---
> >  lib/i915/gem_engine_topology.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/i915/gem_engine_topology.c b/lib/i915/gem_engine_topology.c
> > index afb576afb..b3b809482 100644
> > --- a/lib/i915/gem_engine_topology.c
> > +++ b/lib/i915/gem_engine_topology.c
> > @@ -189,12 +189,24 @@ static int __query_engine_list(int fd, struct intel_engine_data *ed)
> >  struct intel_execution_engine2 *
> >  intel_get_current_engine(struct intel_engine_data *ed)
> >  {
> > +     struct intel_execution_engine2 *ret = NULL;
> > +
> >       if (ed->n >= ed->nengines)
> >               ed->current_engine = NULL;
> >       else if (!ed->n)
> >               ed->current_engine = &ed->engines[0];
> >
> > -     return ed->current_engine;
> > +     // When called from the macro for_each_ctx_cfg_engine(), *ed is defined
> > +     // inside a for loop. In that case, not memcping ed->current_engine
> > +     // will lead to a memory access violation when trying to access the
> > +     // contents of ed->current_engine after the end of the for loop
>
> can you please use /* ... */ style of comment?

sure, thank you for pointing this out.

>
> > +     if (ed->current_engine) {
> > +             ret = malloc(sizeof(*ret));
>
> should this be freed at some point?

I am not convinced that his patch is the best approach. If the problem
is indeed the block scope of '*ed', I will propose to give
for_each_ctx_engine() some serious love. I tried to come up with a
proper solution but I failed to find one. The problems I faced are:
 - for_each_ctx_engine() requires an struct intel_execution_engine2 as
an iterator
 - I did not find any way to keep for_each_ctx_engine() as a macro and
change the scope(other than declaring ed outside the macro, argh...).
 - Because for_each_ctx_engine() is a macro, I could not find a
compiler friendly way to define the struct intel_execution_engine2
within the macro
 - To use free() 'properly', we need to save the information of when
'e' was allocated to prevent the code from trying to free something
that should not be freed.

This is why I asked for help here* on how you want me to fix it: I am
under the impression that there is not a good solution for this
problem. And then it becomes a question of long term maintenance.
Should we drop the macro? Forbid the use of 'e' after the macro ends?
Properly trace memory allocation of 'e' to prevent the potential
free() issue? Something else? Please let me know.

* - https://lists.freedesktop.org/archives/igt-dev/2024-March/070468.html



>
> Andi


More information about the igt-dev mailing list