[PATCH] drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang

Nirmoy nirmodas at amd.com
Wed May 19 11:27:25 UTC 2021


On 5/19/21 5:14 AM, Huang, Ray wrote:
>
> [Public]
>
> I checked, and the patch (below) to disable compute queues for raven
> has not landed in drm-next. So all queues are actually enabled at this
> moment. Nirmoy, can we get your confirmation?
>

I indeed didn't push the commit that disables all but one compute queue
for raven. I was supposed to check with KFD, as Felix wanted to know if
that bug affects KFD. I think I got distracted with something else.
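
For reference, the selection logic in the unpushed patch quoted below can be sketched as a small standalone C function. This is only an illustration: `pick_num_kcq`, the `sketch_asic` enum and the explicit `num_kcq_param` argument are hypothetical stand-ins for `amdgpu_gfx_get_num_kcq()`, `adev->asic_type` and the `amdgpu_num_kcq` module parameter, and the final `return` is an assumption because the quoted hunk ends before it.

```c
#include <stdio.h>

/* Hypothetical stand-in for the driver's asic_type enum. */
enum sketch_asic { SKETCH_CHIP_OTHER, SKETCH_CHIP_RAVEN };

/*
 * Pick the number of kernel compute queues.
 * num_kcq_param mirrors the module parameter: -1 means "driver default".
 */
static int pick_num_kcq(enum sketch_asic asic, int num_kcq_param)
{
	if (num_kcq_param == -1) {
		/* Workaround from the quoted patch: one queue on raven. */
		if (asic == SKETCH_CHIP_RAVEN)
			return 1;
		return 8;
	} else if (num_kcq_param > 8 || num_kcq_param < 0) {
		fprintf(stderr, "invalid queue count %d, using 8\n",
			num_kcq_param);
		return 8;
	}
	/* Assumed: a valid user-supplied value is used as-is. */
	return num_kcq_param;
}
```

Note that in this sketch an explicit, valid parameter value still overrides the raven default, since the workaround only sits in the -1 branch.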


Regards,

Nirmoy

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 97a8f786cf85..9352fcb77fe9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -812,6 +812,13 @@ void amdgpu_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
>  int amdgpu_gfx_get_num_kcq(struct amdgpu_device *adev)
>  {
>  	if (amdgpu_num_kcq == -1) {
> +		/* raven firmware currently can not load balance jobs
> +		 * among multiple compute queues. Enable only one
> +		 * compute queue till we have a firmware fix.
> +		 */
> +		if (adev->asic_type == CHIP_RAVEN)
> +			return 1;
> +
>  		return 8;
>  	} else if (amdgpu_num_kcq > 8 || amdgpu_num_kcq < 0) {
>  		dev_warn(adev->dev, "set kernel compute queue number to 8 due to invalid parameter provided by user\n");
>
> And I am glad to see that we now have a solution to this issue.
> Nice work, Changfeng!
>
> Best Regards,
>
> Ray
>
> *From:* Deucher, Alexander <Alexander.Deucher at amd.com>
> *Sent:* Wednesday, May 19, 2021 11:04 AM
> *To:* Chen, Guchun <Guchun.Chen at amd.com>; Zhu, Changfeng 
> <Changfeng.Zhu at amd.com>; Alex Deucher <alexdeucher at gmail.com>; Das, 
> Nirmoy <Nirmoy.Das at amd.com>
> *Cc:* Huang, Ray <Ray.Huang at amd.com>; amd-gfx list 
> <amd-gfx at lists.freedesktop.org>
> *Subject:* Re: [PATCH] drm/amdgpu: disable 3DCGCG on picasso/raven1 to 
> avoid compute hang
>
> [Public]
>
> I thought we had disabled all but one of the compute queues on raven 
> due to this issue or at least disabled the schedulers for the 
> additional queues, but maybe I'm misremembering.
>
> Alex
>
> ------------------------------------------------------------------------
>
> *From:* Chen, Guchun <Guchun.Chen at amd.com>
> *Sent:* Tuesday, May 18, 2021 11:00 PM
> *To:* Zhu, Changfeng <Changfeng.Zhu at amd.com>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Alex Deucher <alexdeucher at gmail.com>;
> Das, Nirmoy <Nirmoy.Das at amd.com>
> *Cc:* Huang, Ray <Ray.Huang at amd.com>; amd-gfx list
> <amd-gfx at lists.freedesktop.org>
> *Subject:* RE: [PATCH] drm/amdgpu: disable 3DCGCG on picasso/raven1 to 
> avoid compute hang
>
> [Public]
>
> Nirmoy’s patch landed already if I understand correctly.
>
> d41a39dda140 drm/scheduler: improve job distribution with multiple queues
>
> Regards,
>
> Guchun
>
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> *On Behalf Of*
> Zhu, Changfeng
> *Sent:* Wednesday, May 19, 2021 10:56 AM
> *To:* Deucher, Alexander <Alexander.Deucher at amd.com>; Alex Deucher
> <alexdeucher at gmail.com>; Das, Nirmoy <Nirmoy.Das at amd.com>
> *Cc:* Huang, Ray <Ray.Huang at amd.com>; amd-gfx list
> <amd-gfx at lists.freedesktop.org>
> *Subject:* RE: [PATCH] drm/amdgpu: disable 3DCGCG on picasso/raven1 to 
> avoid compute hang
>
> [Public]
>
> Hi Alex,
>
> This is the issue exposed by Nirmoy's patch that provided better load 
> balancing across queues.
>
> BR,
>
> Changfeng.
>
> *From:* Deucher, Alexander <Alexander.Deucher at amd.com>
> *Sent:* Wednesday, May 19, 2021 10:53 AM
> *To:* Zhu, Changfeng <Changfeng.Zhu at amd.com>; Alex Deucher
> <alexdeucher at gmail.com>; Das, Nirmoy <Nirmoy.Das at amd.com>
> *Cc:* Huang, Ray <Ray.Huang at amd.com>; amd-gfx list
> <amd-gfx at lists.freedesktop.org>
> *Subject:* Re: [PATCH] drm/amdgpu: disable 3DCGCG on picasso/raven1 to 
> avoid compute hang
>
> [Public]
>
> + Nirmoy
>
> I thought we disabled all but one of the compute queues on raven due 
> to this issue. Maybe that patch never landed?  Wasn't this the same 
> issue that was exposed by Nirmoy's patch that provided better load 
> balancing across queues?
>
> Alex
>
> ------------------------------------------------------------------------
>
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of
> Zhu, Changfeng <Changfeng.Zhu at amd.com>
> *Sent:* Tuesday, May 18, 2021 10:28 PM
> *To:* Alex Deucher <alexdeucher at gmail.com>
> *Cc:* Huang, Ray <Ray.Huang at amd.com>; amd-gfx list
> <amd-gfx at lists.freedesktop.org>
> *Subject:* RE: [PATCH] drm/amdgpu: disable 3DCGCG on picasso/raven1 to 
> avoid compute hang
>
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Alex.
>
> I have submitted the patch: drm/amdgpu: disable 3DCGCG on 
> picasso/raven1 to avoid compute hang
>
> Do you mean we have something else to do for re-enabling the extra 
> compute queues?
>
> BR,
> Changfeng.
>
> -----Original Message-----
> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: Wednesday, May 19, 2021 10:20 AM
> To: Zhu, Changfeng <Changfeng.Zhu at amd.com>
> Cc: Huang, Ray <Ray.Huang at amd.com>; amd-gfx list
> <amd-gfx at lists.freedesktop.org>
> Subject: Re: [PATCH] drm/amdgpu: disable 3DCGCG on picasso/raven1 to 
> avoid compute hang
>
> Care to submit a patch to re-enable the extra compute queues?
>
> Alex
>
> On Mon, May 17, 2021 at 4:09 AM Zhu, Changfeng <Changfeng.Zhu at amd.com> wrote:
> >
> > [AMD Official Use Only - Internal Distribution Only]
> >
> > Hi Ray and Alex,
> >
> > I have confirmed that the additional compute queues can be enabled
> > with this patch:
> >
> > [   41.823013] This is ring mec 1, pipe 0, queue 0, value 1
> > [   41.823028] This is ring mec 1, pipe 1, queue 0, value 1
> > [   41.823042] This is ring mec 1, pipe 2, queue 0, value 1
> > [   41.823057] This is ring mec 1, pipe 3, queue 0, value 1
> > [   41.823071] This is ring mec 1, pipe 0, queue 1, value 1
> > [   41.823086] This is ring mec 1, pipe 1, queue 1, value 1
> > [   41.823101] This is ring mec 1, pipe 2, queue 1, value 1
> > [   41.823115] This is ring mec 1, pipe 3, queue 1, value 1
> >
> > BR,
> > Changfeng.
> >
> >
> > -----Original Message-----
> > From: Huang, Ray <Ray.Huang at amd.com>
> > Sent: Monday, May 17, 2021 2:27 PM
> > To: Alex Deucher <alexdeucher at gmail.com>; Zhu, Changfeng
> > <Changfeng.Zhu at amd.com>
> > Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>
> > Subject: Re: [PATCH] drm/amdgpu: disable 3DCGCG on picasso/raven1 to
> > avoid compute hang
> >
> > On Fri, May 14, 2021 at 10:13:55PM +0800, Alex Deucher wrote:
> > > > On Fri, May 14, 2021 at 4:20 AM <changfeng.zhu at amd.com> wrote:
> > > > >
> > > > > From: changzhu <Changfeng.Zhu at amd.com>
> > > > >
> > > > > From: Changfeng <Changfeng.Zhu at amd.com>
> > > >
> > > > > There is a problem with the 3DCGCG firmware that causes compute
> > > > > tests to hang on picasso/raven1. Disable 3DCGCG in the driver
> > > > > to avoid the compute hang.
> > > >
> > > > Change-Id: Ic7d3c7922b2b32f7ac5193d6a4869cbc5b3baa87
> > > > > Signed-off-by: Changfeng <Changfeng.Zhu at amd.com>
> > > >
> > > > Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
> > >
> > > > With this applied, can we re-enable the additional compute queues?
> > >
> >
> > I think so.
> >
> > Changfeng, could you please confirm this on all raven series?
> >
> > > Patch is Reviewed-by: Huang Rui <ray.huang at amd.com>
> >
> > > Alex
> > >
> > > > ---
> > > > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 10 +++++++---
> > > > drivers/gpu/drm/amd/amdgpu/soc15.c    |  2 --
> > > >  2 files changed, 7 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > index 22608c45f07c..feaa5e4a5538 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > @@ -4947,7 +4947,7 @@ static void 
> gfx_v9_0_update_3d_clock_gating(struct amdgpu_device *adev,
> > > > amdgpu_gfx_rlc_enter_safe_mode(adev);
> > > >
> > > >         /* Enable 3D CGCG/CGLS */
> > > > -       if (enable && (adev->cg_flags & 
> AMD_CG_SUPPORT_GFX_3D_CGCG)) {
> > > > +       if (enable) {
> > > >                 /* write cmd to clear cgcg/cgls ov */
> > > >                 def = data = RREG32_SOC15(GC, 0, 
> mmRLC_CGTT_MGCG_OVERRIDE);
> > > >                 /* unset CGCG override */ @@ -4959,8 +4959,12 @@
> > > > static void gfx_v9_0_update_3d_clock_gating(struct amdgpu_device 
> *adev,
> > > >                 /* enable 3Dcgcg FSM(0x0000363f) */
> > > >                 def = RREG32_SOC15(GC, 0,
> > > > mmRLC_CGCG_CGLS_CTRL_3D);
> > > >
> > > > -               data = (0x36 << 
> RLC_CGCG_CGLS_CTRL_3D__CGCG_GFX_IDLE_THRESHOLD__SHIFT) |
> > > > - RLC_CGCG_CGLS_CTRL_3D__CGCG_EN_MASK;
> > > > +               if (adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGCG)
> > > > +                       data = (0x36 << 
> RLC_CGCG_CGLS_CTRL_3D__CGCG_GFX_IDLE_THRESHOLD__SHIFT) |
> > > > + RLC_CGCG_CGLS_CTRL_3D__CGCG_EN_MASK;
> > > > +               else
> > > > +                       data = 0x0 <<
> > > > + RLC_CGCG_CGLS_CTRL_3D__CGCG_GFX_IDLE_THRESHOLD__SHIFT;
> > > > +
> > > >                 if (adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGLS)
> > > >                         data |= (0x000F << 
> RLC_CGCG_CGLS_CTRL_3D__CGLS_REP_COMPANSAT_DELAY__SHIFT) |
> > > >
> > > > RLC_CGCG_CGLS_CTRL_3D__CGLS_EN_MASK;
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > > index 4b660b2d1c22..080e715799d4 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > > @@ -1393,7 +1393,6 @@ static int soc15_common_early_init(void 
> *handle)
> > > > adev->cg_flags = AMD_CG_SUPPORT_GFX_MGCG |
> > > > AMD_CG_SUPPORT_GFX_MGLS |
> > > > AMD_CG_SUPPORT_GFX_CP_LS |
> > > > - AMD_CG_SUPPORT_GFX_3D_CGCG |
> > > > AMD_CG_SUPPORT_GFX_3D_CGLS |
> > > > AMD_CG_SUPPORT_GFX_CGCG |
> > > > AMD_CG_SUPPORT_GFX_CGLS | @@
> > > > -1413,7
> > > > +1412,6 @@ static int soc15_common_early_init(void *handle)
> > > > AMD_CG_SUPPORT_GFX_MGLS |
> > > > AMD_CG_SUPPORT_GFX_RLC_LS |
> > > > AMD_CG_SUPPORT_GFX_CP_LS |
> > > > - AMD_CG_SUPPORT_GFX_3D_CGCG |
> > > > AMD_CG_SUPPORT_GFX_3D_CGLS |
> > > > AMD_CG_SUPPORT_GFX_CGCG |
> > > > AMD_CG_SUPPORT_GFX_CGLS |
> > > > --
> > > > 2.17.1
> > > >
> > > > > _______________________________________________
> > > > > amd-gfx mailing list
> > > > > amd-gfx at lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
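
As a worked check of the quoted gfx_v9_0.c change, the value written to RLC_CGCG_CGLS_CTRL_3D can be sketched in standalone C. The shift and mask constants here are assumptions, chosen so that both features enabled gives the 0x0000363f mentioned in the patch comment; the authoritative definitions are in the driver's register headers. `ctrl_3d_value` and the `SKETCH_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout, inferred from the 0x0000363f comment. */
#define SKETCH_CGCG_EN_MASK                    0x00000001u
#define SKETCH_CGLS_EN_MASK                    0x00000002u
#define SKETCH_CGLS_REP_COMPANSAT_DELAY_SHIFT  2
#define SKETCH_CGCG_GFX_IDLE_THRESHOLD_SHIFT   8

/*
 * Compose the control value the patched code would write when
 * clock gating is being enabled, given the two cg_flags bits.
 */
static uint32_t ctrl_3d_value(bool has_3d_cgcg, bool has_3d_cgls)
{
	uint32_t data;

	if (has_3d_cgcg)
		data = (0x36u << SKETCH_CGCG_GFX_IDLE_THRESHOLD_SHIFT) |
			SKETCH_CGCG_EN_MASK;
	else
		data = 0; /* 3D CGCG left disabled, threshold cleared */

	if (has_3d_cgls)
		data |= (0x000Fu << SKETCH_CGLS_REP_COMPANSAT_DELAY_SHIFT) |
			SKETCH_CGLS_EN_MASK;

	return data;
}
```

Under these assumed constants, picasso/raven1 after the patch (3D CGCG flag cleared, 3D CGLS kept) would get 0x3e instead of 0x363f.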