[PATCH 2/8] drm/sched: Always free the job after the timeout

Maíra Canal mcanal at igalia.com
Tue May 6 13:38:35 UTC 2025


Hi Tvrtko,

On 06/05/25 10:28, Tvrtko Ursulin wrote:
> 
> On 06/05/2025 13:46, Maíra Canal wrote:
>> Hi Tvrtko,
>>
>> Thanks for your review!
>>
>> On 06/05/25 08:49, Tvrtko Ursulin wrote:
>>>
>>> On 03/05/2025 21:59, Maíra Canal wrote:
>>>> Currently, if we add the assertions presented in this commit to the 
>>>> mock
>>>> scheduler, we will see the following output:
>>>>
>>>> [15:47:08] ============== [PASSED] drm_sched_basic_tests ==============
>>>> [15:47:08] ======== drm_sched_basic_timeout_tests (1 subtest) =========
>>>> [15:47:08] # drm_sched_basic_timeout: ASSERTION FAILED at drivers/ 
>>>> gpu/ drm/scheduler/tests/tests_basic.c:246
>>>> [15:47:08] Expected list_empty(&sched->job_list) to be true, but is 
>>>> false
>>>> [15:47:08] [FAILED] drm_sched_basic_timeout
>>>> [15:47:08] # module: drm_sched_tests
>>>>
>>>> This occurs because `mock_sched_timedout_job()` doesn't properly handle
>>>> the hang. From the DRM sched documentation, `drm_sched_stop()` and
>>>> `drm_sched_start()` are typically used for reset recovery. If these
>>>> functions are not used, the offending job won't be freed and should be
>>>> freed by the caller.
>>>>
>>>> Currently, the mock scheduler doesn't use the functions provided by the
>>>> API, nor does it handle the freeing of the job. As a result, the job 
>>>> isn't
>>>> removed from the job list.
>>>
>>> For the record the job does gets freed via the kunit managed allocation.
>>
>> Sorry, I didn't express myself correctly. Indeed, it is. I meant that
>> the DRM scheduler didn't free the job.
>>
>>>
>>> It was a design choice for this test to be a *strict* unit test which 
>>> tests only a _single_ thing. And that is that the timedout_job() hook 
>>> gets called. As such the hook was implemented to satisfy that single 
>>> requirement only.
>>>
>>
>> What do you think about checking that `sched->job_list` won't be empty?
>>
>> I wanted to add such assertion to make sure that the behavior of the
>> timeout won't change in future (e.g. a patch makes a change that calls
>> `free_job()` for the guilty job at timeout). Does it make sense to you?
> 
> Where would that assert be?
> 

I believe it would be in the same place as this patch assertions, but
instead of `KUNIT_ASSERT_TRUE(test, list_empty(&sched->job_list));`, it
would be `KUNIT_ASSERT_FALSE(test, list_empty(&sched->job_list));`.

But I don't feel strongly about it. I can drop the patch if you believe
it's a better option.

Best Regards,
- Maíra


More information about the Intel-xe mailing list