[PATCH 1/3] drm/sched: add drm_sched_prealloc_dependency_slots v3
Tvrtko Ursulin
tursulin at ursulin.net
Fri May 16 12:10:07 UTC 2025
On 16/05/2025 12:53, Tvrtko Ursulin wrote:
>
> On 16/05/2025 08:28, Philipp Stanner wrote:
>> On Thu, 2025-05-15 at 17:17 +0100, Tvrtko Ursulin wrote:
>>>
>>> On 15/05/2025 16:00, Christian König wrote:
>>>> Sometimes drivers need to be able to submit multiple jobs which
>>>> depend on each other to different schedulers at the same time, but
>>>> drm_sched_job_add_dependency() is no longer allowed to fail once the
>>>> first job has been initialized.
>>>>
>>>> This function preallocates memory for dependency slots so that no
>>>> ENOMEM can occur later while adding dependencies.
>>>>
>>>> v2: rework implementation and documentation
>>>> v3: rework from scratch, use separate function to add preallocated
>>>> deps
>>
>> I think we agreed to not put change logs into commit messages anymore
>> :)
>>
>> They aren't useful for any reader. Whoever needs the changelog
>> afterwards can retrieve it through the mail thread link that we add.
>>
>>>>
>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> ---
>>>>  drivers/gpu/drm/scheduler/sched_main.c | 45 ++++++++++++++++++++++++++
>>>>  include/drm/gpu_scheduler.h            |  4 +++
>>>>  2 files changed, 49 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index f7118497e47a..b95e7089aa70 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -858,6 +858,51 @@ void drm_sched_job_arm(struct drm_sched_job *job)
>>>> }
>>>> EXPORT_SYMBOL(drm_sched_job_arm);
>>>> +/**
>>>> + * drm_sched_job_prealloc_dependency_slot - avoid ENOMEM on adding dependencies
>>>> + * @job: scheduler job where dependencies will be added
>>>> + * @id: id for the allocated slot
>>>> + *
>>>> + * Sometimes drivers need to be able to submit multiple jobs which depend on
>>>> + * each other to different schedulers at the same time, but
>>>> + * drm_sched_job_add_dependency() is no longer allowed to fail once the
>>>> + * first job has been initialized.
>>>> + *
>>>> + * This function preallocates memory for a dependency slot so that no ENOMEM
>>>> + * can occur later while adding dependencies. The index of the preallocated
>>>> + * slot is returned in @id.
>>>> + *
>>>> + * Return:
>>>> + * 0 on success, or an error on failing to expand the array.
>>>> + */
>>>> +int drm_sched_job_prealloc_dependency_slot(struct drm_sched_job *job,
>>>> +					   u32 *id)
>>>> +{
>>>> +	return xa_alloc(&job->dependencies, id, NULL, xa_limit_32b, GFP_KERNEL);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slot);
>>>> +
>>>> +/**
>>>> + * drm_sched_job_add_prealloc_dep - add dependency to preallocated slot
>>>> + * @job: scheduler job where dependencies will be added
>>>> + * @id: the preallocated slot index
>>>> + * @fence: the dependency to add
>>>> + *
>>>> + * Consumes @fence and adds it to the preallocated dependency slot.
>>>> + */
>>>> +void drm_sched_job_add_prealloc_dep(struct drm_sched_job *job, u32 id,
>>>> +				    struct dma_fence *fence)
>>>> +{
>>>> +	fence = xa_store(&job->dependencies, id, fence, GFP_ATOMIC);
>>>
>>> Add an assert that the passed id exists (was preallocated) and is
>>> still NULL?
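
Something like this is what I had in mind, as an untested sketch. Since
xa_store() returns the previous entry, and a reserved-but-unconsumed slot
should read back as NULL per the discussion below, this would catch
double-consumption of a slot, though not an id that was never allocated:

	void drm_sched_job_add_prealloc_dep(struct drm_sched_job *job, u32 id,
					    struct dma_fence *fence)
	{
		struct dma_fence *prev;

		prev = xa_store(&job->dependencies, id, fence, GFP_ATOMIC);
		/* Slot must have been preallocated and not yet consumed. */
		WARN_ON(prev);
	}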
>>
>> You
>
> Hm?
>
>>>
>>> Also, if someone preallocates and does not consume the slot, will that
>>> confuse the iteration in drm_sched_job_dependency()?
>>
>> drm_sched_job_add_dependency() you mean.
>
> I was actually thinking of drm_sched_job_dependency() because it looked
> like it would skip dependencies upon encountering an unconsumed
> preallocated slot, but yes, drm_sched_job_add_dependency() could explode
> even earlier if a normal dependency is added after preallocating a slot.
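
To illustrate the explosion: the deduplication loop in
drm_sched_job_add_dependency() dereferences every entry it visits, so,
assuming iteration indeed yields NULL for reserved slots as Philipp
describes below, it would need a guard along these lines (sketch against
my reading of the current loop, untested):

	xa_for_each(&job->dependencies, index, entry) {
		/* Unconsumed preallocated slots read back as NULL. */
		if (!entry)
			continue;

		if (entry->context != fence->context)
			continue;

		if (dma_fence_is_later(fence, entry)) {
			dma_fence_put(entry);
			xa_store(&job->dependencies, index, fence,
				 GFP_KERNEL);
		} else {
			dma_fence_put(fence);
		}
		return 0;
	}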
>
>> Yes, it would. All operations simply give you NULL for those slots. So
>> it seems to me you have to check for NULL wherever a preallocated slot
>> might drop out. That would then be a bug.
>>
>> It's kind of tricky, all that. It's a pity that Wilcox didn't answer
>> our questions about the idiomatic way to do it.
>>
>> Maybe reserving slots with already signaled fences wasn't such a bad
>> idea after all?
>>
>> If we go for the NULL approach, it's probably the only sane way to then
>> check for NULL wherever dependencies are accessed :(
>>
>> Opinions?
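
On the signaled-fence idea: dma_fence_get_stub() exists for exactly this
kind of thing, so it would be a small delta on top of this patch, at the
cost of an extra reference per slot. Untested sketch:

	int drm_sched_job_prealloc_dependency_slot(struct drm_sched_job *job,
						   u32 *id)
	{
		struct dma_fence *stub = dma_fence_get_stub();
		int ret;

		/* A signaled stub keeps every slot a valid fence, so the
		 * NULL-terminated walk and the dedup loop keep working
		 * without extra checks.
		 */
		ret = xa_alloc(&job->dependencies, id, stub, xa_limit_32b,
			       GFP_KERNEL);
		if (ret)
			dma_fence_put(stub);
		return ret;
	}

The consume side would then have to drop the stub's reference, which
xa_store() hands back anyway.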
>
> Well, if the xarray API returns NULL consistently, the approach from
> this patch is fine I think.
>
> We just need to add two more checks to the above mentioned functions,
I need to correct myself: drm_sched_job_dependency() wouldn't be able to
just skip NULLs, since it relies on NULL meaning "no more dependencies".
We would need to track something like job->max_dependency and terminate on
job->last_dependency > job->max_dependency or so.
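
Roughly like this, as a sketch (untested; job->max_dependency would be a
new field recording the highest id handed out, and the rest is my reading
of the current helper):

	static struct dma_fence *
	drm_sched_job_dependency(struct drm_sched_job *job,
				 struct drm_sched_entity *entity)
	{
		struct dma_fence *f;

		/* Walk by index up to the highest allocated slot instead of
		 * stopping at the first NULL, so an unconsumed preallocated
		 * slot is skipped rather than terminating the walk early.
		 */
		while (job->last_dependency <= job->max_dependency) {
			f = xa_load(&job->dependencies, job->last_dependency++);
			if (f)
				return dma_fence_get(f);
		}

		if (job->sched->ops->prepare_job)
			return job->sched->ops->prepare_job(job, entity);

		return NULL;
	}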
Regards,
Tvrtko
> some more unit tests probably to make sure, and that should be fine for
> now.
>
> On the bikeshedding front I would perhaps suggest:
>
> - drm_sched_job_preallocate_dependency()
> - drm_sched_job_replace_dependency()
>
> Reads a little bit more aligned with the rest of the API and a bit
> easier on the eyes, to my eyes at least.
>
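FWIW, with those names a driver-side flow would read something like this
(hypothetical usage sketch with two jobs, error handling trimmed):

	u32 slot;
	int ret;

	/* Reserve before arming anything, while failure is still okay. */
	ret = drm_sched_job_preallocate_dependency(job_b, &slot);
	if (ret)
		return ret;

	drm_sched_job_arm(job_a);
	drm_sched_job_arm(job_b);

	/* Past the point of no return: filling the slot cannot ENOMEM. */
	drm_sched_job_replace_dependency(job_b, slot,
			dma_fence_get(&job_a->s_fence->finished));
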
> Regards,
>
> Tvrtko
>