[PATCH i-g-t v2] Fix memory access issue due to variable block scope

Peter Senna Tschudin me at petersenna.com
Tue Mar 26 21:41:27 UTC 2024


Hi Kamil,

Thank you for your email!

On Tue, Mar 26, 2024 at 7:20 PM Kamil Konieczny
<kamil.konieczny at linux.intel.com> wrote:
>
> Hi peter.senna,
> On 2024-03-26 at 14:18:43 +0100, Peter Senna Tschudin wrote:
> > Hi Andi,
> >
> > Thank you for your reply.
> >
> > On Tue, Mar 26, 2024 at 1:05 PM Andi Shyti <andi.shyti at linux.intel.com> wrote:
> > >
> > > Hey Peter,
> > >
> > > On Mon, Mar 25, 2024 at 10:35:48PM +0100, Peter Senna Tschudin wrote:
> > > > This patch fixes the tests gem_exec_capture at many-4k-incremental and
> > > > gem_exec_capture at many-4k-zero that are currently failing with an invalid file
> > > > descriptor error.
> > >
> > > where is gem_exec_capture calling for_each_ctx_cfg_engine()?
> >
> > many(), userptr(), capture_invisible()
> >     find_first_available_engine()
> >         for_each_ctx_engine()
> >
> > When called by many(), 'e' gets corrupted when configure_hangs() tries
> > to assign 'e' to another variable. Then after 'e' is corrupted, the
> > call __captureN() will fail because it expects 'e' to be valid.
> >
>
> imho first step whould be to replace macro with a function:
>
> saved = find_first_available_engine(fd, ctx, e);

When I was done with the replacement function, I found a simple
solution that seems to be adequate. I made a small change to
find_first_available_engine() to reassign e to an element of saved. My
patch v3 has the details. What do you think?

May I ask you to increase the time the CI will waits for the tests to complete:
 - many-4k-zero: 2 minutes
 - many-4k-incremental: 5 minutes (It takes 4 minutes on my N3150)

Thank you,

Peter

>
> Regards,
> Kamil
>
> > >
> > > >  struct intel_execution_engine2 *
> > > >  intel_get_current_engine(struct intel_engine_data *ed)
> > > >
> > > > When intel_get_current_engine is called from the macro
> > > > for_each_ctx_cfg_engine(), the variable *ed is defined within a for loop. The
> > > > scope of *ed is limited to that loop, leading to access violations when
> > > > attempting to access its contents outside the loop.
> > > >
> > > > Before to this patch, intel_get_current_engine() would return an element of *ed
> > > > and attempting to use it after the loop ended resulted in undefined behavior.
> > > >
> > > > This patch introduces a memcpy() to copy the contents of ed->current_engine to
> > > > a memory area not confined by the loop's scope, ensuring safe access to the
> > > > data.
> > > >
> > > > v2: Added 'i-g-t' to the Subject.
> > > >
> > > > Signed-off-by: Peter Senna Tschudin <peter.senna at gmail.com>
> > > > ---
> > > >  lib/i915/gem_engine_topology.c | 14 +++++++++++++-
> > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/lib/i915/gem_engine_topology.c b/lib/i915/gem_engine_topology.c
> > > > index afb576afb..b3b809482 100644
> > > > --- a/lib/i915/gem_engine_topology.c
> > > > +++ b/lib/i915/gem_engine_topology.c
> > > > @@ -189,12 +189,24 @@ static int __query_engine_list(int fd, struct intel_engine_data *ed)
> > > >  struct intel_execution_engine2 *
> > > >  intel_get_current_engine(struct intel_engine_data *ed)
> > > >  {
> > > > +     struct intel_execution_engine2 *ret = NULL;
> > > > +
> > > >       if (ed->n >= ed->nengines)
> > > >               ed->current_engine = NULL;
> > > >       else if (!ed->n)
> > > >               ed->current_engine = &ed->engines[0];
> > > >
> > > > -     return ed->current_engine;
> > > > +     // When called from the macro for_each_ctx_cfg_engine(), *ed is defined
> > > > +     // inside a for loop. In that case, not memcping ed->current_engine
> > > > +     // will lead to a memory access violation when trying to access the
> > > > +     // contents of ed->current_engine after the end of the for loop
> > >
> > > can you please use /* ... */ style of comment?
> >
> > sure, thank you for pointing this out.
> >
> > >
> > > > +     if (ed->current_engine) {
> > > > +             ret = malloc(sizeof(*ret));
> > >
> > > should this be freed at some point?
> >
> > I am not convinced that his patch is the best approach. If the problem
> > is indeed the block scope of '*ed', I will propose to give
> > for_each_ctx_engine() some serious love. I tried to come up with a
> > proper solution but I failed to find one. The problems I faced are:
> >  - for_each_ctx_engine() requires an struct intel_execution_engine2 as
> > an iterator
> >  - I did not find any way to keep for_each_ctx_engine() as a macro and
> > change the scope(other than declaring ed outside the macro, argh...).
> >  - Because for_each_ctx_engine() is a macro, I could not find a
> > compiler friendly way to define the struct intel_execution_engine2
> > within the macro
> >  - To use free() 'properly', we need to save the information of when
> > 'e' was allocated to prevent the code from trying to free something
> > that should not be freed.
> >
> > This is why I asked for help here* on how you want me to fix it: I am
> > under the impression that there is not a good solution for this
> > problem. And then it becomes a question of long term maintenance.
> > Should we drop the macro? Forbid the use of 'e' after the macro ends?
> > Properly trace memory allocation of 'e' to prevent the potential
> > free() issue? Something else? Please let me know.
> >
> > * - https://lists.freedesktop.org/archives/igt-dev/2024-March/070468.html
> >
> >
> >
> > >
> > > Andi


More information about the igt-dev mailing list