[PATCH 2/3] drm/amdgpu: Pop jobs from the queue more robustly
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Fri Feb 14 10:21:51 UTC 2025
Hi Christian,
On 11/02/2025 10:21, Christian König wrote:
> Am 11.02.25 um 11:08 schrieb Philipp Stanner:
>> On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:
>>> Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:
>>>> Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a
>>>> newly
>>>> added __drm_sched_entity_queue_pop.
>>>>
>>>> This allows breaking the hidden dependency that queue_node has to
>>>> be the
>>>> first element in struct drm_sched_job.
>>>>
>>>> A comment is also added with a reference to the mailing list
>>>> discussion
>>>> explaining the copied helper will be removed when the whole broken
>>>> amdgpu_job_stop_all_jobs_on_sched is removed.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>>> Cc: Christian König <christian.koenig at amd.com>
>>>> Cc: Danilo Krummrich <dakr at kernel.org>
>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>> Cc: Philipp Stanner <phasta at kernel.org>
>>>> Cc: "Zhang, Hawking" <Hawking.Zhang at amd.com>
>>> Reviewed-by: Christian König <christian.koenig at amd.com>
>> I think this v3 has been supplanted by a v4 by now.
>
> I've seen the larger v4 series as well, but at least that patch here
> looks identical on first glance. So my rb still counts.
Is it okay for you to merge the whole series (including this single
amdgpu patch) via drm-misc?
Regards,
Tvrtko
>> @Tvrtko: btw, do you create patches with
>> git format-patch -v4 ?
>>
>> That way the v4 label will be included in all patch titles, too, not
>> just the cover letter. That makes searching etc. easier in large
>> inboxes
>>
>> P.
>>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++++++++++++++++++-
>>>> --
>>>> 1 file changed, 19 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> index 100f04475943..22cb48bab24d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> @@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct
>>>> drm_sched_job *sched_job)
>>>> return fence;
>>>> }
>>>> -#define to_drm_sched_job(sched_job) \
>>>> - container_of((sched_job), struct drm_sched_job,
>>>> queue_node)
>>>> +/*
>>>> + * This is a duplicate function from DRM scheduler
>>>> sched_internal.h.
>>>> + * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is
>>>> removed, due
>>>> + * latter being incorrect and racy.
>>>> + *
>>>> + * See
>>>> https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-
>>>> a4f7-94e50514f539 at amd.com/
>>>> + */
>>>> +static struct drm_sched_job *
>>>> +__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
>>>> +{
>>>> + struct spsc_node *node;
>>>> +
>>>> + node = spsc_queue_pop(&entity->job_queue);
>>>> + if (!node)
>>>> + return NULL;
>>>> +
>>>> + return container_of(node, struct drm_sched_job,
>>>> queue_node);
>>>> +}
>>>> void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
>>>> *sched)
>>>> {
>>>> @@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct
>>>> drm_gpu_scheduler *sched)
>>>> struct drm_sched_rq *rq = sched->sched_rq[i];
>>>> spin_lock(&rq->lock);
>>>> list_for_each_entry(s_entity, &rq->entities, list)
>>>> {
>>>> - while ((s_job =
>>>> to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
>>>> + while ((s_job =
>>>> __drm_sched_entity_queue_pop(s_entity))) {
>>>> struct drm_sched_fence *s_fence =
>>>> s_job->s_fence;
>>>> dma_fence_signal(&s_fence-
>>>>> scheduled);
>
More information about the amd-gfx
mailing list