[PATCH 1/3] drm/sched: add drm_sched_prealloc_dependency_slots v3
Tvrtko Ursulin
tursulin at ursulin.net
Fri May 16 11:53:20 UTC 2025
On 16/05/2025 08:28, Philipp Stanner wrote:
> On Thu, 2025-05-15 at 17:17 +0100, Tvrtko Ursulin wrote:
>>
>> On 15/05/2025 16:00, Christian König wrote:
>>> Sometimes drivers need to be able to submit multiple jobs which depend on
>>> each other to different schedulers at the same time, but
>>> drm_sched_job_add_dependency() must not fail any more after the first job
>>> is initialized.
>>>
>>> This function preallocates memory for dependency slots so that no ENOMEM
>>> can come later while adding dependencies.
>>>
>>> v2: rework implementation and documentation
>>> v3: rework from scratch, use separate function to add preallocated deps
>
> I think we agreed to not put change logs into commit messages anymore
> :)
>
> They aren't useful for any reader. Anyone who needs the changelog afterwards
> can retrieve it through the mail thread link that we add.
>
>>>
>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> ---
>>>   drivers/gpu/drm/scheduler/sched_main.c | 45 ++++++++++++++++++++++++++
>>>   include/drm/gpu_scheduler.h            |  4 +++
>>>   2 files changed, 49 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index f7118497e47a..b95e7089aa70 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -858,6 +858,51 @@ void drm_sched_job_arm(struct drm_sched_job *job)
>>>   }
>>> EXPORT_SYMBOL(drm_sched_job_arm);
>>>
>>> +/**
>>> + * drm_sched_job_prealloc_dependency_slot - avoid ENOMEM on adding dependencies
>>> + * @job: scheduler job where dependencies will be added
>>> + * @id: id for the allocated slot
>>> + *
>>> + * Sometimes drivers need to be able to submit multiple jobs which depend on
>>> + * each other to different schedulers at the same time, but
>>> + * drm_sched_job_add_dependency() must not fail any more after the first job
>>> + * is initialized.
>>> + *
>>> + * This function preallocates memory for a dependency slot so that no ENOMEM
>>> + * can come later while adding dependencies. The index of the preallocated
>>> + * slot is returned in @id.
>>> + *
>>> + * Return:
>>> + * 0 on success, or an error on failing to expand the array.
>>> + */
>>> +int drm_sched_job_prealloc_dependency_slot(struct drm_sched_job *job,
>>> +					    u32 *id)
>>> +{
>>> +	return xa_alloc(&job->dependencies, id, NULL, xa_limit_32b, GFP_KERNEL);
>>> +}
>>> +EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slot);
>>> +
>>> +/**
>>> + * drm_sched_job_add_prealloc_dep - add dependency to preallocated slot
>>> + * @job: scheduler job where dependencies will be added
>>> + * @id: the preallocated slot index
>>> + * @fence: the dependency to add
>>> + *
>>> + * Consumes @fence and adds it to the preallocated dependency slot.
>>> + */
>>> +void drm_sched_job_add_prealloc_dep(struct drm_sched_job *job, u32 id,
>>> +				    struct dma_fence *fence)
>>> +{
>>> +	fence = xa_store(&job->dependencies, id, fence, GFP_ATOMIC);
>>
>> Add assert that the passed id exists (was preallocated) and is NULL?
>
> You
Hm?
>>
>> Also, if someone preallocates and does not consume the slot will that
>> confuse the iteration in drm_sched_job_dependency()?
>
> drm_sched_job_add_dependency() you mean.
I was actually thinking of drm_sched_job_dependency() because it looked like
it would skip dependencies upon encountering an unconsumed preallocated slot,
but yes, drm_sched_job_add_dependency() could explode even earlier if a normal
dependency is added after preallocating a slot.
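
From memory, the relevant part of drm_sched_job_dependency() is roughly the
following (just a sketch, exact field names may be off), which is why a slot
that still reads back as NULL would end the walk early and hide any later
dependencies:

	struct dma_fence *f;

	f = xa_load(&job->dependencies, job->last_dependency);
	if (f) {
		job->last_dependency++;
		return dma_fence_get(f);
	}
	/* NULL falls through to "no more dependencies", so entries stored
	 * in slots behind an unconsumed preallocated slot would never be
	 * returned.
	 */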
> Yes, it would. All operations simply give you NULL for those slots. So
> it seems to me you have to check for NULL wherever a preallocated slot
> might drop out. That would then be a bug.
>
> It's kind of tricky, all that. It's a pity that Wilcox didn't answer
> our questions about the idiomatic way to do it.
>
> Maybe reserving slots with already signaled fences wasn't such a bad
> idea after all?
>
> If we go for the NULL approach, it's probably the only sane way to then
> check for NULL wherever dependencies are accessed :(
>
> Opinions?
Well, if the xarray API consistently returns NULL for those slots, the
approach from this patch is fine I think.
We just need to add two more checks to the above-mentioned functions, and
probably some more unit tests to make sure, and that should be fine for now.
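
For the xa_store() side, something along these lines is roughly what I have
in mind (untested sketch only; it still cannot tell an id that was never
preallocated apart from a plain empty index, which is the open question
above):

void drm_sched_job_add_prealloc_dep(struct drm_sched_job *job, u32 id,
				    struct dma_fence *fence)
{
	struct dma_fence *prev;

	prev = xa_store(&job->dependencies, id, fence, GFP_ATOMIC);

	/* The slot is expected to have been preallocated and still be
	 * empty; getting an error or a previously stored fence back
	 * means the caller misused the API.
	 */
	WARN_ON(xa_is_err(prev) || prev);
}

The lookup side would then need the matching NULL handling as well.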
On the bikeshedding front I would perhaps suggest:
- drm_sched_job_preallocate_dependency()
- drm_sched_job_replace_dependency()
Those read a little more aligned with the rest of the API and are a bit
easier on the eyes, to my eyes at least.
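
Just to illustrate how that would read on the driver side (a sketch only,
using the names suggested above, with first and second already set up via
drm_sched_job_init() and error unwinding omitted):

	struct drm_sched_job *first, *second;
	u32 slot;
	int ret;

	/* Reserve the slot while we can still fail and unwind cleanly... */
	ret = drm_sched_job_preallocate_dependency(second, &slot);
	if (ret)
		return ret;

	/* ...then arm the first job; from here on nothing may fail, and
	 * filling the reserved slot with its finished fence cannot.
	 */
	drm_sched_job_arm(first);
	drm_sched_job_replace_dependency(second, slot,
			dma_fence_get(&first->s_fence->finished));
	drm_sched_job_arm(second);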
Regards,
Tvrtko