[PATCH 2/8] drm/sched: Always free the job after the timeout
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Tue May 6 14:18:17 UTC 2025
On 06/05/2025 14:38, Maíra Canal wrote:
> Hi Tvrtko,
>
> On 06/05/25 10:28, Tvrtko Ursulin wrote:
>>
>> On 06/05/2025 13:46, Maíra Canal wrote:
>>> Hi Tvrtko,
>>>
>>> Thanks for your review!
>>>
>>> On 06/05/25 08:49, Tvrtko Ursulin wrote:
>>>>
>>>> On 03/05/2025 21:59, Maíra Canal wrote:
>>>>> Currently, if we add the assertions presented in this commit to the
>>>>> mock
>>>>> scheduler, we will see the following output:
>>>>>
>>>>> [15:47:08] ============== [PASSED] drm_sched_basic_tests
>>>>> ==============
>>>>> [15:47:08] ======== drm_sched_basic_timeout_tests (1 subtest)
>>>>> =========
>>>>> [15:47:08] # drm_sched_basic_timeout: ASSERTION FAILED at drivers/
>>>>> gpu/ drm/scheduler/tests/tests_basic.c:246
>>>>> [15:47:08] Expected list_empty(&sched->job_list) to be true, but is
>>>>> false
>>>>> [15:47:08] [FAILED] drm_sched_basic_timeout
>>>>> [15:47:08] # module: drm_sched_tests
>>>>>
>>>>> This occurs because `mock_sched_timedout_job()` doesn't properly
>>>>> handle
>>>>> the hang. From the DRM sched documentation, `drm_sched_stop()` and
>>>>> `drm_sched_start()` are typically used for reset recovery. If these
>>>>> functions are not used, the offending job won't be freed and should be
>>>>> freed by the caller.
>>>>>
>>>>> Currently, the mock scheduler doesn't use the functions provided by
>>>>> the
>>>>> API, nor does it handle the freeing of the job. As a result, the
>>>>> job isn't
>>>>> removed from the job list.
>>>>
>>>> For the record the job does gets freed via the kunit managed
>>>> allocation.
>>>
>>> Sorry, I didn't express myself correctly. Indeed, it is. I meant that
>>> the DRM scheduler didn't free the job.
>>>
>>>>
>>>> It was a design choice for this test to be a *strict* unit test
>>>> which tests only a _single_ thing. And that is that the
>>>> timedout_job() hook gets called. As such the hook was implemented to
>>>> satisfy that single requirement only.
>>>>
>>>
>>> What do you think about checking that `sched->job_list` won't be empty?
>>>
>>> I wanted to add such assertion to make sure that the behavior of the
>>> timeout won't change in future (e.g. a patch makes a change that calls
>>> `free_job()` for the guilty job at timeout). Does it make sense to you?
>>
>> Where would that assert be?
>>
>
> I believe it would be in the same place as this patch assertions, but
> instead of `KUNIT_ASSERT_TRUE(test, list_empty(&sched->job_list));`, it
> would be `KUNIT_ASSERT_FALSE(test, list_empty(&sched->job_list));`.
>
> But I don't feel strongly about it. I can drop the patch if you believe
> it's a better option.
I don't mind this patch, I was just explaining how the current test was
deliberately testing a single thing. But we can change it to test more,
that's fine.
In this case it would go from testing that the timeout callback fires to
testing the callback fires and something else happens.
Key is to define the "something else" part so it is not timing sensitive.
The drm_sched_stop+delay+drm_sched_start approach would perhaps work if
you would signal the job with an errno set before drm_sched_stop?
Then it would hit the "free_guilty" path in drm_sched_stop and wouldn't
be timing sensitive. As long as there aren't any locking considerations
I am missing. That could then safely have the two list_empty asserts.
Regards,
Tvrtko
More information about the etnaviv
mailing list