[PATCH] drm/amdgpu: Update ring scheduler info as needed

Chen, Guchun Guchun.Chen at amd.com
Wed Jul 19 04:56:12 UTC 2023


[Public]

> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> Sent: Wednesday, July 19, 2023 12:53 PM
> To: Chen, Guchun <Guchun.Chen at amd.com>; Zhu, James
> <James.Zhu at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Kamal, Asad
> <Asad.Kamal at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Update ring scheduler info as needed
>
>
>
> On 7/19/2023 10:12 AM, Chen, Guchun wrote:
> > [Public]
> >
> >> -----Original Message-----
> >> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> >> Lazar, Lijo
> >> Sent: Wednesday, July 19, 2023 12:13 AM
> >> To: Zhu, James <James.Zhu at amd.com>; amd-gfx at lists.freedesktop.org
> >> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Zhu, James
> >> <James.Zhu at amd.com>; Kamal, Asad <Asad.Kamal at amd.com>; Zhang,
> Hawking
> >> <Hawking.Zhang at amd.com>
> >> Subject: Re: [PATCH] drm/amdgpu: Update ring scheduler info as needed
> >>
> >>
> >>
> >> On 7/18/2023 9:39 PM, James Zhu wrote:
> >>>
> >>> On 2023-07-18 11:54, Lazar, Lijo wrote:
> >>>>
> >>>>
> >>>> On 7/18/2023 8:39 PM, James Zhu wrote:
> >>>>>
> >>>>> On 2023-07-18 10:19, Lazar, Lijo wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 7/18/2023 7:30 PM, James Zhu wrote:
> >>>>>>>
> >>>>>>> On 2023-07-18 08:21, Lijo Lazar wrote:
> >>>>>>>> Not all rings have a scheduler associated. Only update scheduler
> >>>>>>>> data for rings that have one. Otherwise this could result in
> >>>>>>>> out-of-bounds access, since the total number of rings is larger
> >>>>>>>> than those associated with particular IPs.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar at amd.com>
> >>>>>>>> ---
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 2 +-
> >>>>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>>>> b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>>>> index 72b629a78c62..d0fc62784e82 100644
> >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>>>> @@ -134,7 +134,7 @@ static int
> >>>>>>>> aqua_vanjaram_xcp_sched_list_update(
> >>>>>>>>        for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
> >>>>>>>>            ring = adev->rings[i];
> >>>>>>>> -        if (!ring || !ring->sched.ready)
> >>>>>>>> +        if (!ring || !ring->sched.ready || ring->no_scheduler)
> >>>>>>> [JZ] Is there any case where ring->no_scheduler = true, but
> >>>>>>> ring->sched.ready = true?
> >>>>>>
> >>>>>> amdgpu_gfx_kiq_init_ring sets the no_scheduler flag to true for
> >>>>>> kiq rings. amdgpu_device_init_schedulers inits schedulers only
> >>>>>> for rings that don't have no_scheduler set. However, in
> >>>>>> gfx_v9_4_3_xcc_kiq_resume the ring is still marked with
> >>>>>> ring->sched.ready = true;
> >>>>>>
> >>>>>> I'm not sure why it's done this way, but this is the case even
> >>>>>> for gfxv9.
> >
> > So far, the driver still abuses the sched.ready concept somewhat. In
> > amdgpu_device_init_schedulers, it's set to true once the sw scheduler
> > is set up for the ring; however, after the driver is up, this flag
> > works like a hint that the corresponding ring is ready for HW
> > submission after the ring test succeeds.
> >
> > For this case, it's probably caused by amdgpu_gfx_enable_kcq calling
> > amdgpu_ring_test_helper, which sets sched.ready unconditionally based
> > on the ring test result. This leads to a value mismatch between
> > ring->no_scheduler and ring->sched.ready. If my understanding is
> > correct, I think adding a check in this helper function so it only
> > sets sched.ready when no_scheduler is false will help. So I will
> > provide a patch soon to prevent this mismatch in the future.
> >
> > +if (!ring->no_scheduler)
> > +       ring->sched.ready = !r;
>
> The ring.ready (ring->sched.ready) flag is used in the gmcv9 code as
> well. This will need more rework.

Exactly, let me double check.

Regards,
Guchun

> Thanks,
> Lijo
> >
> > Regards,
> > Guchun
> >
> >>>>> [JZ] I think sched.ready confuses people. Here it just means the
> >>>>> ring is ready to be used.
> >>>>
> >>>> Thanks for the clarification. One question then is whether we need
> >>>> to check the ring ready status when assigning xcp ids, or just
> >>>> check if the ring is associated with a scheduler?
> >>>>
> >>>> Keep only this - if (!ring || ring->no_scheduler) - to assign xcp
> >>>> ids, and leave the ring ready status to the logic that actually
> >>>> accepts jobs on the ring?
> >>> [JZ] I think keeping the ring->sched.ready check will shorten the
> >>> schedule list, which will save a little scheduling time.
> >>
> >> Fine, will just push this one.
> >>
> >> Thanks,
> >> Lijo
> >>
> >>>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>>>> This patch is Reviewed-by: James Zhu <James.Zhu at amd.com>
> >>>>>
> >>>>>> Thanks,
> >>>>>> Lijo
> >>>>>>
> >>>>>>>>                continue;
> >>>>>>>>            aqua_vanjaram_xcp_gpu_sched_update(adev, ring,
> >>>>>>>> ring->xcp_id);

