[PATCH 2/3] drm/amdgpu: Pop jobs from the queue more robustly

Christian König christian.koenig at amd.com
Tue Feb 11 10:21:28 UTC 2025


Am 11.02.25 um 11:08 schrieb Philipp Stanner:
> On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:
>> Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:
>>> Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a
>>> newly
>>> added __drm_sched_entity_queue_pop.
>>>
>>> This allows breaking the hidden dependency that queue_node has to
>>> be the
>>> first element in struct drm_sched_job.
>>>
>>> A comment is also added with a reference to the mailing list
>>> discussion
>>> explaining the copied helper will be removed when the whole broken
>>> amdgpu_job_stop_all_jobs_on_sched is removed.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>> Cc: Christian König <christian.koenig at amd.com>
>>> Cc: Danilo Krummrich <dakr at kernel.org>
>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>> Cc: Philipp Stanner <phasta at kernel.org>
>>> Cc: "Zhang, Hawking" <Hawking.Zhang at amd.com>
>> Reviewed-by: Christian König <christian.koenig at amd.com>
> I think this v3 has been supplanted by a v4 by now.

I've seen the larger v4 series as well, but at least that patch here 
looks identical on first glance. So my rb still counts.

Christian.

>
> @Tvrtko: btw, do you create patches with
> git format-patch -v4 ?
>
> That way the v4 label will be included in all patch titles, too, not
> just the cover letter. That makes searching etc. easier in large
> inboxes
>
> P.
>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++++++++++++++++++-
>>> --
>>>    1 file changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index 100f04475943..22cb48bab24d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct
>>> drm_sched_job *sched_job)
>>>    	return fence;
>>>    }
>>>    
>>> -#define to_drm_sched_job(sched_job)		\
>>> -		container_of((sched_job), struct drm_sched_job,
>>> queue_node)
>>> +/*
>>> + * This is a duplicate function from DRM scheduler
>>> sched_internal.h.
>>> + * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is
>>> removed, due
>>> + * latter being incorrect and racy.
>>> + *
>>> + * See
>>> https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-a4f7-94e50514f539@amd.com/
>>> + */
>>> +static struct drm_sched_job *
>>> +__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
>>> +{
>>> +	struct spsc_node *node;
>>> +
>>> +	node = spsc_queue_pop(&entity->job_queue);
>>> +	if (!node)
>>> +		return NULL;
>>> +
>>> +	return container_of(node, struct drm_sched_job,
>>> queue_node);
>>> +}
>>>    
>>>    void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
>>> *sched)
>>>    {
>>> @@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct
>>> drm_gpu_scheduler *sched)
>>>    		struct drm_sched_rq *rq = sched->sched_rq[i];
>>>    		spin_lock(&rq->lock);
>>>    		list_for_each_entry(s_entity, &rq->entities, list)
>>> {
>>> -			while ((s_job =
>>> to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
>>> +			while ((s_job =
>>> __drm_sched_entity_queue_pop(s_entity))) {
>>>    				struct drm_sched_fence *s_fence =
>>> s_job->s_fence;
>>>    
>>>    				dma_fence_signal(&s_fence-
>>>> scheduled);



More information about the amd-gfx mailing list