[PATCH] drm/sched: Fix a race in DRM_GPU_SCHED_STAT_NO_HANG test

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Wed Jul 16 13:41:49 UTC 2025


On 16/07/2025 13:47, Maíra Canal wrote:
> Hi Tvrtko,
> 
> On 16/07/25 05:48, Tvrtko Ursulin wrote:
>> The "skip reset" test waits for the timeout handler to run for the
>> duration of 2 * MOCK_TIMEOUT, and because the mock scheduler opted to
> 
> Would it make any sense to wait for 1.5 * MOCK_TIMEOUT? This way we
> would guarantee that only one timeout happened. I'm fine with the
> current solution as well.

1.5 * MOCK_TIMEOUT would work as well. I considered it, and even though 
I thought it would be safe, I concluded that it is better to have fewer 
dependencies on timings given these are two threads in this story.

Making the DRM_MOCK_SCHED_JOB_DONT_RESET persist allows for not having 
to think about timings. So slight preference to that. At least until 
some more advanced tests are attempted to be added.

>> remove the "skip reset" flag once it fires, this gives opportunity for 
>> the
>> timeout handler to run twice. Second time the job will be removed from 
>> the
>> mock scheduler job list and the drm_mock_sched_advance() call in the test
>> will fail.
>>
>> Fix it by making the "don't reset" flag persist for the lifetime of the
>> job and add a new flag to verify that the code path had executed as
>> expected.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Fixes: 1472e7549f84 ("drm/sched: Add new test for 
>> DRM_GPU_SCHED_STAT_NO_HANG")
>  > Cc: Maíra Canal <mcanal at igalia.com>> Cc: Philipp Stanner 
> <phasta at kernel.org>
>> ---
>>   drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 2 +-
>>   drivers/gpu/drm/scheduler/tests/sched_tests.h    | 7 ++++---
>>   drivers/gpu/drm/scheduler/tests/tests_basic.c    | 4 ++--
>>   3 files changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/ 
>> drivers/gpu/drm/scheduler/tests/mock_scheduler.c
>> index 65acffc3fea8..8e9ae7d980eb 100644
>> --- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
>> +++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
>> @@ -219,7 +219,7 @@ mock_sched_timedout_job(struct drm_sched_job 
>> *sched_job)
>>       unsigned long flags;
>>       if (job->flags & DRM_MOCK_SCHED_JOB_DONT_RESET) {
>> -        job->flags &= ~DRM_MOCK_SCHED_JOB_DONT_RESET;
>> +        job->flags |= DRM_MOCK_SCHED_JOB_RESET_SKIPPED;
>>           return DRM_GPU_SCHED_STAT_NO_HANG;
>>       }
>> diff --git a/drivers/gpu/drm/scheduler/tests/sched_tests.h b/drivers/ 
>> gpu/drm/scheduler/tests/sched_tests.h
>> index 63d4f2ac7074..5b262126b776 100644
>> --- a/drivers/gpu/drm/scheduler/tests/sched_tests.h
>> +++ b/drivers/gpu/drm/scheduler/tests/sched_tests.h
>> @@ -95,9 +95,10 @@ struct drm_mock_sched_job {
>>       struct completion    done;
>> -#define DRM_MOCK_SCHED_JOB_DONE        0x1
>> -#define DRM_MOCK_SCHED_JOB_TIMEDOUT    0x2
>> -#define DRM_MOCK_SCHED_JOB_DONT_RESET    0x4
>> +#define DRM_MOCK_SCHED_JOB_DONE            0x1
>> +#define DRM_MOCK_SCHED_JOB_TIMEDOUT        0x2
>> +#define DRM_MOCK_SCHED_JOB_DONT_RESET        0x4
>> +#define DRM_MOCK_SCHED_JOB_RESET_SKIPPED    0x8
>>       unsigned long        flags;
>>       struct list_head    link;
>> diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/ 
>> gpu/drm/scheduler/tests/tests_basic.c
>> index 55eb142bd7c5..82a41a456b0a 100644
>> --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
>> +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
>> @@ -317,8 +317,8 @@ static void drm_sched_skip_reset(struct kunit *test)
>>       KUNIT_ASSERT_FALSE(test, done);
>>       KUNIT_ASSERT_EQ(test,
>> -            job->flags & DRM_MOCK_SCHED_JOB_DONT_RESET,
>> -            0);
>> +            job->flags & DRM_MOCK_SCHED_JOB_RESET_SKIPPED,
>> +            DRM_MOCK_SCHED_JOB_RESET_SKIPPED);
> 
> Maybe we could assert that job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT == 0

Could but I am not sure it is needed.

> Anyway, thanks for the fix!
> 
> Reviewed-by: Maíra Canal <mcanal at igalia.com>

Thank you!

Btw the failure for the record:

[09:07:20] # drm_sched_skip_reset: ASSERTION FAILED at 
drivers/gpu/drm/scheduler/tests/tests_basic.c:324
[09:07:20] Expected i == 1, but
[09:07:20]     i == 0 (0x0)
[09:07:20] [FAILED] drm_sched_skip_reset
[09:07:20]     # module: drm_sched_tests
[09:07:20] # drm_sched_basic_timeout_tests: pass:1 fail:1 skip:0 total:2
[09:07:20] # Totals: pass:1 fail:1 skip:0 total:2

Regards,

Tvrtko

>>       i = drm_mock_sched_advance(sched, 1);
>>       KUNIT_ASSERT_EQ(test, i, 1);
> 



More information about the dri-devel mailing list