[PATCH 2/8] drm/sched: Always free the job after the timeout

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Tue May 6 11:49:20 UTC 2025


On 03/05/2025 21:59, Maíra Canal wrote:
> Currently, if we add the assertions presented in this commit to the mock
> scheduler, we will see the following output:
> 
> [15:47:08] ============== [PASSED] drm_sched_basic_tests ==============
> [15:47:08] ======== drm_sched_basic_timeout_tests (1 subtest) =========
> [15:47:08] # drm_sched_basic_timeout: ASSERTION FAILED at drivers/gpu/drm/scheduler/tests/tests_basic.c:246
> [15:47:08] Expected list_empty(&sched->job_list) to be true, but is false
> [15:47:08] [FAILED] drm_sched_basic_timeout
> [15:47:08] # module: drm_sched_tests
> 
> This occurs because `mock_sched_timedout_job()` doesn't properly handle
> the hang. From the DRM sched documentation, `drm_sched_stop()` and
> `drm_sched_start()` are typically used for reset recovery. If these
> functions are not used, the offending job won't be freed and should be
> freed by the caller.
> 
> Currently, the mock scheduler doesn't use the functions provided by the
> API, nor does it handle the freeing of the job. As a result, the job isn't
> removed from the job list.

For the record the job does gets freed via the kunit managed allocation.

It was a design choice for this test to be a *strict* unit test which 
tests only a _single_ thing. And that is that the timedout_job() hook 
gets called. As such the hook was implemented to satisfy that single 
requirement only.

But I also do not oppose making it test multiple things in one test per se.

> This commit mocks a GPU reset by stopping the scheduler affected by the
> reset, waiting a couple of microseconds to mimic a hardware reset, and
> then restart the affected scheduler.
> 
> Signed-off-by: Maíra Canal <mcanal at igalia.com>
> ---
>   drivers/gpu/drm/scheduler/tests/mock_scheduler.c | 10 ++++++++++
>   drivers/gpu/drm/scheduler/tests/tests_basic.c    |  3 +++
>   2 files changed, 13 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> index f999c8859cf7adb8f06fc8a37969656dd3249fa7..e9af202d84bd55ea5cc048215e39f5407bc84458 100644
> --- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> @@ -1,6 +1,8 @@
>   // SPDX-License-Identifier: GPL-2.0
>   /* Copyright (c) 2025 Valve Corporation */
>   
> +#include <linux/delay.h>
> +
>   #include "sched_tests.h"
>   
>   /*
> @@ -203,10 +205,18 @@ static struct dma_fence *mock_sched_run_job(struct drm_sched_job *sched_job)
>   static enum drm_gpu_sched_stat
>   mock_sched_timedout_job(struct drm_sched_job *sched_job)
>   {
> +	struct drm_mock_scheduler *sched =
> +		drm_sched_to_mock_sched(sched_job->sched);
>   	struct drm_mock_sched_job *job = drm_sched_job_to_mock_job(sched_job);
>   
>   	job->flags |= DRM_MOCK_SCHED_JOB_TIMEDOUT;
>   
> +	drm_sched_stop(&sched->base, &job->base);
> +
> +	usleep_range(200, 500);

msleep(10) or something to make it seem less like the actual numbers are 
relevant?

 > +> +	drm_sched_start(&sched->base, 0);
> +
>   	return DRM_GPU_SCHED_STAT_NOMINAL;
>   }
>   
> diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> index 7230057e0594c6246f02608f07fcb1f8d738ac75..8f960f0fd31d0af7873f410ceba2d636f58a5474 100644
> --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
> +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> @@ -241,6 +241,9 @@ static void drm_sched_basic_timeout(struct kunit *test)
>   			job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT,
>   			DRM_MOCK_SCHED_JOB_TIMEDOUT);
>   
> +	KUNIT_ASSERT_TRUE(test, list_empty(&sched->job_list));

Hmm I think this assert could be racy because it appears to rely on the 
free worker to run and cleanup the "finished" job in the window between 
drm_mock_sched_job_wait_finished() (or drm_sched_start(), depends how 
you look at it) and here. Am I missing something?

Regards,

Tvrtko

> +	KUNIT_ASSERT_TRUE(test, list_empty(&sched->done_list));
 > +>   	drm_mock_sched_entity_free(entity);
>   }
>   
> 



More information about the etnaviv mailing list