[PATCH] drm/sched: Fix a race in DRM_GPU_SCHED_STAT_NO_HANG test

Philipp Stanner phasta at mailbox.org
Wed Jul 16 14:14:50 UTC 2025


On Wed, 2025-07-16 at 11:05 -0300, Maíra Canal wrote:
> Hi Tvrtko,
> 
> On 16/07/25 10:41, Tvrtko Ursulin wrote:
> > 
> > On 16/07/2025 13:47, Maíra Canal wrote:
> > > Hi Tvrtko,
> > > 
> > > On 16/07/25 05:48, Tvrtko Ursulin wrote:
> > > > The "skip reset" test waits for the timeout handler to run for
> > > > the
> > > > duration of 2 * MOCK_TIMEOUT, and because the mock scheduler
> > > > opted to
> > > 
> > > Would it make any sense to wait for 1.5 * MOCK_TIMEOUT? This way
> > > we
> > > would guarantee that only one timeout happened. I'm fine with the
> > > current solution as well.
> > 
> > 1.5 * MOCK_TIMEOUT would work as well. I considered it, and even
> > though 
> > I thought it would be safe, I concluded that it is better to have
> > fewer 
> > dependencies on timings given these are two threads in this story.
> 
> Why not both? Just to make sure we won't run the timedout function
> twice, but still fixing the timing dependency by using
> DRM_MOCK_SCHED_JOB_RESET_SKIPPED.

I agree with Tvrtko that his approach with the new status is more
robust.

That said, I don't have a strong opinion about reducing the timeout
limit. But my feeling is that it running into the timeout (potentially)
twice is actually a good thing, because it could potentially catch more
issues if there are races etc. in the future.


P.

> 
> > 
> > Making the DRM_MOCK_SCHED_JOB_DONT_RESET persist allows for not
> > having 
> > to think about timings. So slight preference to that. At least
> > until 
> > some more advanced tests are attempted to be added.
> > 
> > > > remove the "skip reset" flag once it fires, this gives
> > > > opportunity 
> > > > for the
> > > > timeout handler to run twice. Second time the job will be
> > > > removed 
> > > > from the
> > > > mock scheduler job list and the drm_mock_sched_advance() call
> > > > in the 
> > > > test
> > > > will fail.
> > > > 
> > > > Fix it by making the "don't reset" flag persist for the
> > > > lifetime of the
> > > > job and add a new flag to verify that the code path had
> > > > executed as
> > > > expected.
> > > > 
> > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> > > > Fixes: 1472e7549f84 ("drm/sched: Add new test for 
> > > > DRM_GPU_SCHED_STAT_NO_HANG")
> > >  > Cc: Maíra Canal <mcanal at igalia.com>> Cc: Philipp Stanner 
> > > <phasta at kernel.org>
> > > > ---
> > > >   drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 2 +-
> > > >   drivers/gpu/drm/scheduler/tests/sched_tests.h    | 7 ++++---
> > > >   drivers/gpu/drm/scheduler/tests/tests_basic.c    | 4 ++--
> > > >   3 files changed, 7 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> > > > b/ 
> > > > drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> > > > index 65acffc3fea8..8e9ae7d980eb 100644
> > > > --- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> > > > +++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> > > > @@ -219,7 +219,7 @@ mock_sched_timedout_job(struct
> > > > drm_sched_job 
> > > > *sched_job)
> > > >       unsigned long flags;
> > > >       if (job->flags & DRM_MOCK_SCHED_JOB_DONT_RESET) {
> > > > -        job->flags &= ~DRM_MOCK_SCHED_JOB_DONT_RESET;
> > > > +        job->flags |= DRM_MOCK_SCHED_JOB_RESET_SKIPPED;
> > > >           return DRM_GPU_SCHED_STAT_NO_HANG;
> > > >       }
> > > > diff --git a/drivers/gpu/drm/scheduler/tests/sched_tests.h
> > > > b/drivers/ 
> > > > gpu/drm/scheduler/tests/sched_tests.h
> > > > index 63d4f2ac7074..5b262126b776 100644
> > > > --- a/drivers/gpu/drm/scheduler/tests/sched_tests.h
> > > > +++ b/drivers/gpu/drm/scheduler/tests/sched_tests.h
> > > > @@ -95,9 +95,10 @@ struct drm_mock_sched_job {
> > > >       struct completion    done;
> > > > -#define DRM_MOCK_SCHED_JOB_DONE        0x1
> > > > -#define DRM_MOCK_SCHED_JOB_TIMEDOUT    0x2
> > > > -#define DRM_MOCK_SCHED_JOB_DONT_RESET    0x4
> > > > +#define DRM_MOCK_SCHED_JOB_DONE            0x1
> > > > +#define DRM_MOCK_SCHED_JOB_TIMEDOUT        0x2
> > > > +#define DRM_MOCK_SCHED_JOB_DONT_RESET        0x4
> > > > +#define DRM_MOCK_SCHED_JOB_RESET_SKIPPED    0x8
> > > >       unsigned long        flags;
> > > >       struct list_head    link;
> > > > diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c
> > > > b/drivers/ 
> > > > gpu/drm/scheduler/tests/tests_basic.c
> > > > index 55eb142bd7c5..82a41a456b0a 100644
> > > > --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
> > > > +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> > > > @@ -317,8 +317,8 @@ static void drm_sched_skip_reset(struct
> > > > kunit *test)
> > > >       KUNIT_ASSERT_FALSE(test, done);
> > > >       KUNIT_ASSERT_EQ(test,
> > > > -            job->flags & DRM_MOCK_SCHED_JOB_DONT_RESET,
> > > > -            0);
> > > > +            job->flags & DRM_MOCK_SCHED_JOB_RESET_SKIPPED,
> > > > +            DRM_MOCK_SCHED_JOB_RESET_SKIPPED);
> > > 
> > > Maybe we could assert that job->flags &
> > > DRM_MOCK_SCHED_JOB_TIMEDOUT == 0
> > 
> > Could but I am not sure it is needed.
> 
> Np.
> 
> > 
> > > Anyway, thanks for the fix!
> > > 
> > > Reviewed-by: Maíra Canal <mcanal at igalia.com>
> > 
> > Thank you!
> > 
> > Btw the failure for the record:
> > 
> > [09:07:20] # drm_sched_skip_reset: ASSERTION FAILED at
> > drivers/gpu/drm/ 
> > scheduler/tests/tests_basic.c:324
> > [09:07:20] Expected i == 1, but
> > [09:07:20]     i == 0 (0x0)
> > [09:07:20] [FAILED] drm_sched_skip_reset
> > [09:07:20]     # module: drm_sched_tests
> > [09:07:20] # drm_sched_basic_timeout_tests: pass:1 fail:1 skip:0
> > total:2
> > [09:07:20] # Totals: pass:1 fail:1 skip:0 total:2
> 
> Unfortunately, I didn't get this error during my test run. On the
> other
> hand, I only ran it once before pushing the series, so that's on me.
> Thanks for catching it!
> 
> Best Regards,
> - Maíra
> 
> > 
> > Regards,
> > 
> > Tvrtko
> > 
> > > >       i = drm_mock_sched_advance(sched, 1);
> > > >       KUNIT_ASSERT_EQ(test, i, 1);
> > > 
> > 
> 



More information about the dri-devel mailing list