[PATCH 1/2] drm/sched: add drm_sched_prealloc_dependency_slots v2

Wed Apr 9 13:55:02 UTC 2025

Am 09.04.25 um 12:28 schrieb Philipp Stanner:
> On Fri, 2025-03-21 at 16:58 +0100, Christian König wrote:
>> Sometimes drivers need to be able to submit multiple jobs which
>> depend on
>> each other to different schedulers at the same time, but using
>> drm_sched_job_add_dependency() can't fail any more after the first
>> job is
>> initialized.
>>
>> This function preallocate memory for dependency slots so that no
>> ENOMEM
>> can come later while adding dependencies.
>>
>> v2: rework implementation an documentation
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>> ---
>>  drivers/gpu/drm/scheduler/sched_main.c | 44
>> ++++++++++++++++++++++++--
>>  include/drm/gpu_scheduler.h            |  2 ++
>>  2 files changed, 43 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 4d4219fbe49d..ee3701f346b2 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -852,6 +852,39 @@ void drm_sched_job_arm(struct drm_sched_job
>> *job)
>>  }
>>  EXPORT_SYMBOL(drm_sched_job_arm);
>>  
>> +/**
>> + * drm_sched_job_prealloc_dependency_slots - avoid ENOMEM on adding
>> dependencies
>> + * @job: scheduler job where dependencies will be added
>> + * @num_deps: number of dependencies to preallocate slots for
>> +  *
>> + * Sometimes drivers need to be able to submit multiple jobs which
>> depend on
>> + * each other to different schedulers at the same time, but using
>> + * drm_sched_job_add_dependency() can't fail any more after the
>> first job is
>> + * initialized.
>> + *
>> + * This function preallocate memory for dependency slots so that no
>> ENOMEM can
>> + * come later while adding dependencies.
>> + *
>> + * Return:
>> + * 0 on success, or an error on failing to expand the array.
>> + */
>> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job
>> *job,
>> +					    unsigned int num_deps)
>> +{
>> +	u32 id = 0;
>> +	int ret;
>> +
>> +	while (num_deps--) {
>> +		ret = xa_alloc(&job->dependencies, &id,
>> XA_ZERO_ENTRY,
>> +			       xa_limit_32b, GFP_KERNEL);
> I've had some time to re-read the xarray documentation and I think that
> this is what xa_reserve() was born for. The Book of Documentation/core-
> api/xarray.rst sayeth:
>
> "Sometimes you need to ensure that a subsequent call to xa_store()
> will not need to allocate memory.  The xa_reserve() function
> will store a reserved entry at the indicated index.  Users of the
> normal API will see this entry as containing ``NULL``."
>
> That's far better, this way we don't have to use that more or less
> xarray-internal flag.

Yeah I have seen that as well. The reason why I didn't followed this route was that I wasn't sure if I then need to check for NULL entries while iterating over the XA.

Additional to that I couldn't figure out of hand how to determine a the next free index slot.

Have you found any example how to use that? I mean the documentation could certainly be improved a bit.

Regards,
Christian.

>
>
>> +		if (ret != 0)
>> +			return ret;
>> +	}
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slots);
>> +
>>  /**
>>   * drm_sched_job_add_dependency - adds the fence as a job dependency
>>   * @job: scheduler job to add the dependencies to
>> @@ -878,10 +911,15 @@ int drm_sched_job_add_dependency(struct
>> drm_sched_job *job,
>>  	 * engines involved, rather than the number of BOs.
>>  	 */
>>  	xa_for_each(&job->dependencies, index, entry) {
>> -		if (entry->context != fence->context)
>> +		if (xa_is_zero(entry)) {
>> +			/*
>> +			 * Reserved entries must not alloc memory,
>> but let's
>> +			 * use GFP_ATOMIC just to be on the
>> defensive side.
>> +			*/
>> +			xa_store(&job->dependencies, index, fence,
>> GFP_ATOMIC);
> And regarding this – it can actually never happen, but you provide
> ATOMIC just to be sure?
>
> I think it would be better if we'd just run into an obvious bug here
> instead, so like a deadlock with GFP_KERNEL.
>
> That's how we do it with pointers that cannot be NULL, too. If the
> impossible were to happen and it were NULL, we'd crash.
>
> P.
>
>> +		} else if (entry->context != fence->context) {
>>  			continue;
>> -
>> -		if (dma_fence_is_later(fence, entry)) {
>> +		} else if (dma_fence_is_later(fence, entry)) {
>>  			dma_fence_put(entry);
>>  			xa_store(&job->dependencies, index, fence,
>> GFP_KERNEL);
>>  		} else {
>> diff --git a/include/drm/gpu_scheduler.h
>> b/include/drm/gpu_scheduler.h
>> index 1a7e377d4cbb..916e820b27ff 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -632,6 +632,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>  		       u32 credits, void *owner);
>>  void drm_sched_job_arm(struct drm_sched_job *job);
>>  void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
>> +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job
>> *job,
>> +					    unsigned int num_deps);
>>  int drm_sched_job_add_dependency(struct drm_sched_job *job,
>>  				 struct dma_fence *fence);
>>  int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,