[PATCH] drm/sched: Fix a race in DRM_GPU_SCHED_STAT_NO_HANG test

Maíra Canal mcanal at igalia.com
Wed Jul 16 12:47:24 UTC 2025


Hi Tvrtko,

On 16/07/25 05:48, Tvrtko Ursulin wrote:
> The "skip reset" test waits for the timeout handler to run for the
> duration of 2 * MOCK_TIMEOUT, and because the mock scheduler opted to

Would it make any sense to wait for 1.5 * MOCK_TIMEOUT? This way we
would guarantee that only one timeout happened. I'm fine with the
current solution as well.

> remove the "skip reset" flag once it fires, this gives opportunity for the
> timeout handler to run twice. Second time the job will be removed from the
> mock scheduler job list and the drm_mock_sched_advance() call in the test
> will fail.
> 
> Fix it by making the "don't reset" flag persist for the lifetime of the
> job and add a new flag to verify that the code path had executed as
> expected.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Fixes: 1472e7549f84 ("drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG")
 > Cc: Maíra Canal <mcanal at igalia.com>> Cc: Philipp Stanner 
<phasta at kernel.org>
> ---
>   drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 2 +-
>   drivers/gpu/drm/scheduler/tests/sched_tests.h    | 7 ++++---
>   drivers/gpu/drm/scheduler/tests/tests_basic.c    | 4 ++--
>   3 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> index 65acffc3fea8..8e9ae7d980eb 100644
> --- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> @@ -219,7 +219,7 @@ mock_sched_timedout_job(struct drm_sched_job *sched_job)
>   	unsigned long flags;
>   
>   	if (job->flags & DRM_MOCK_SCHED_JOB_DONT_RESET) {
> -		job->flags &= ~DRM_MOCK_SCHED_JOB_DONT_RESET;
> +		job->flags |= DRM_MOCK_SCHED_JOB_RESET_SKIPPED;
>   		return DRM_GPU_SCHED_STAT_NO_HANG;
>   	}
>   
> diff --git a/drivers/gpu/drm/scheduler/tests/sched_tests.h b/drivers/gpu/drm/scheduler/tests/sched_tests.h
> index 63d4f2ac7074..5b262126b776 100644
> --- a/drivers/gpu/drm/scheduler/tests/sched_tests.h
> +++ b/drivers/gpu/drm/scheduler/tests/sched_tests.h
> @@ -95,9 +95,10 @@ struct drm_mock_sched_job {
>   
>   	struct completion	done;
>   
> -#define DRM_MOCK_SCHED_JOB_DONE		0x1
> -#define DRM_MOCK_SCHED_JOB_TIMEDOUT	0x2
> -#define DRM_MOCK_SCHED_JOB_DONT_RESET	0x4
> +#define DRM_MOCK_SCHED_JOB_DONE			0x1
> +#define DRM_MOCK_SCHED_JOB_TIMEDOUT		0x2
> +#define DRM_MOCK_SCHED_JOB_DONT_RESET		0x4
> +#define DRM_MOCK_SCHED_JOB_RESET_SKIPPED	0x8
>   	unsigned long		flags;
>   
>   	struct list_head	link;
> diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> index 55eb142bd7c5..82a41a456b0a 100644
> --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
> +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> @@ -317,8 +317,8 @@ static void drm_sched_skip_reset(struct kunit *test)
>   	KUNIT_ASSERT_FALSE(test, done);
>   
>   	KUNIT_ASSERT_EQ(test,
> -			job->flags & DRM_MOCK_SCHED_JOB_DONT_RESET,
> -			0);
> +			job->flags & DRM_MOCK_SCHED_JOB_RESET_SKIPPED,
> +			DRM_MOCK_SCHED_JOB_RESET_SKIPPED);

Maybe we could assert that job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT == 0

Anyway, thanks for the fix!

Reviewed-by: Maíra Canal <mcanal at igalia.com>

Best Regards,
- Maíra

>   
>   	i = drm_mock_sched_advance(sched, 1);
>   	KUNIT_ASSERT_EQ(test, i, 1);



More information about the dri-devel mailing list