[PATCH 2/3] drm/amdgpu: Pop jobs from the queue more robustly

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Fri Feb 14 10:21:51 UTC 2025


Hi Christian,

On 11/02/2025 10:21, Christian König wrote:
> Am 11.02.25 um 11:08 schrieb Philipp Stanner:
>> On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:
>>> Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:
>>>> Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a
>>>> newly
>>>> added __drm_sched_entity_queue_pop.
>>>>
>>>> This allows breaking the hidden dependency that queue_node has to
>>>> be the
>>>> first element in struct drm_sched_job.
>>>>
>>>> A comment is also added with a reference to the mailing list
>>>> discussion
>>>> explaining the copied helper will be removed when the whole broken
>>>> amdgpu_job_stop_all_jobs_on_sched is removed.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>>> Cc: Christian König <christian.koenig at amd.com>
>>>> Cc: Danilo Krummrich <dakr at kernel.org>
>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>> Cc: Philipp Stanner <phasta at kernel.org>
>>>> Cc: "Zhang, Hawking" <Hawking.Zhang at amd.com>
>>> Reviewed-by: Christian König <christian.koenig at amd.com>
>> I think this v3 has been supplanted by a v4 by now.
> 
> I've seen the larger v4 series as well, but at least that patch here 
> looks identical on first glance. So my rb still counts.

Is it okay for you to merge the whole series (including this single 
amdgpu patch) via drm-misc?

Regards,

Tvrtko

>> @Tvrtko: btw, do you create patches with
>> git format-patch -v4 ?
>>
>> That way the v4 label will be included in all patch titles, too, not
>> just the cover letter. That makes searching etc. easier in large
>> inboxes
>>
>> P.
>>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++++++++++++++++++-
>>>> -- 
>>>>    1 file changed, 19 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> index 100f04475943..22cb48bab24d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> @@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct
>>>> drm_sched_job *sched_job)
>>>>        return fence;
>>>>    }
>>>> -#define to_drm_sched_job(sched_job)        \
>>>> -        container_of((sched_job), struct drm_sched_job,
>>>> queue_node)
>>>> +/*
>>>> + * This is a duplicate function from DRM scheduler
>>>> sched_internal.h.
>>>> + * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is
>>>> removed, due
>>>> + * latter being incorrect and racy.
>>>> + *
>>>> + * See
>>>> https://lore.kernel.org/amd-gfx/44edde63-7181-44fb- 
>>>> a4f7-94e50514f539 at amd.com/
>>>> + */
>>>> +static struct drm_sched_job *
>>>> +__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
>>>> +{
>>>> +    struct spsc_node *node;
>>>> +
>>>> +    node = spsc_queue_pop(&entity->job_queue);
>>>> +    if (!node)
>>>> +        return NULL;
>>>> +
>>>> +    return container_of(node, struct drm_sched_job,
>>>> queue_node);
>>>> +}
>>>>    void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
>>>> *sched)
>>>>    {
>>>> @@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct
>>>> drm_gpu_scheduler *sched)
>>>>            struct drm_sched_rq *rq = sched->sched_rq[i];
>>>>            spin_lock(&rq->lock);
>>>>            list_for_each_entry(s_entity, &rq->entities, list)
>>>> {
>>>> -            while ((s_job =
>>>> to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
>>>> +            while ((s_job =
>>>> __drm_sched_entity_queue_pop(s_entity))) {
>>>>                    struct drm_sched_fence *s_fence =
>>>> s_job->s_fence;
>>>>                    dma_fence_signal(&s_fence-
>>>>> scheduled);
> 



More information about the amd-gfx mailing list