[PATCH v3] drm/amd/amdgpu: set the default value of noretry to 1 for some dGPUs

Deucher, Alexander Alexander.Deucher at amd.com
Mon Nov 30 17:22:05 UTC 2020


[AMD Public Use]

We need to figure out what the root cause is then.  If we can't figure it out soon, we should revert the change for navi1x and continue to debug it until we can find the root cause and we can safely re-enable it.

Alex
________________________________
From: Chen, Guchun <Guchun.Chen at amd.com>
Sent: Sunday, November 29, 2020 2:22 AM
To: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>; Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: Gui, Jack <Jack.Gui at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx mailing list <amd-gfx at lists.freedesktop.org>; Huang, Ray <Ray.Huang at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: RE: [PATCH v3] drm/amd/amdgpu: set the default value of noretry to 1 for some dGPUs

[AMD Public Use]

Hi Bas Nieuwenhuizen,

I don't think direct revert is one right approach, though it's able to fix your problem.  noretry=0 will cause other test failure on several ASICs.

Regards,
Guchun

-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Bas Nieuwenhuizen
Sent: Sunday, November 29, 2020 8:38 AM
To: Kuehling, Felix <Felix.Kuehling at amd.com>
Cc: Gui, Jack <Jack.Gui at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx mailing list <amd-gfx at lists.freedesktop.org>; Huang, Ray <Ray.Huang at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: Re: [PATCH v3] drm/amd/amdgpu: set the default value of noretry to 1 for some dGPUs

Can we revert this patch to fix
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1374&data=04%7C01%7Cguchun.chen%40amd.com%7C6d626e2a3bae4877024f08d893ff15db%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637422071085800476%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Jxa2V1TuszoBKtF%2FPbIA3YwOrXHgLreBY%2FXej1HTZ4k%3D&reserved=0 ?

On Thu, Oct 15, 2020 at 4:30 PM Felix Kuehling <felix.kuehling at amd.com> wrote:
>
> Am 2020-10-14 um 11:35 p.m. schrieb Chengming Gui:
> > noretry = 0 cause some dGPU's kfd page fault tests fail, so set
> > noretry to 1 for these special ASICs:
> > vega20/navi10/navi14/ARCTURUS
> >
> > v2: merge raven and default case due to the same setting
> > v3: remove ARCTURUS
> >
> > Signed-off-by: Chengming Gui <Jack.Gui at amd.com>
> > Change-Id: I3be70f463a49b0cd5c56456431d6c2cb98b13872
>
> Acked-by: Felix Kuhling <Felix.Kuehling at amd.com>
>
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 23
> > +++++++++++++++--------
> >  1 file changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > index 36604d751d62..f26eb4e54b12 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > @@ -425,20 +425,27 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
> >       struct amdgpu_gmc *gmc = &adev->gmc;
> >
> >       switch (adev->asic_type) {
> > -     case CHIP_RAVEN:
> > -             /* Raven currently has issues with noretry
> > -              * regardless of what we decide for other
> > -              * asics, we should leave raven with
> > -              * noretry = 0 until we root cause the
> > -              * issues.
> > +     case CHIP_VEGA20:
> > +     case CHIP_NAVI10:
> > +     case CHIP_NAVI14:
> > +             /*
> > +              * noretry = 0 will cause kfd page fault tests fail
> > +              * for some ASICs, so set default to 1 for these ASICs.
> >                */
> >               if (amdgpu_noretry == -1)
> > -                     gmc->noretry = 0;
> > +                     gmc->noretry = 1;
> >               else
> >                       gmc->noretry = amdgpu_noretry;
> >               break;
> > +     case CHIP_RAVEN:
> >       default:
> > -             /* default this to 0 for now, but we may want
> > +             /* Raven currently has issues with noretry
> > +              * regardless of what we decide for other
> > +              * asics, we should leave raven with
> > +              * noretry = 0 until we root cause the
> > +              * issues.
> > +              *
> > +              * default this to 0 for now, but we may want
> >                * to change this in the future for certain
> >                * GPUs as it can increase performance in
> >                * certain cases.
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist
> s.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7Cgu
> chun.chen%40amd.com%7C6d626e2a3bae4877024f08d893ff15db%7C3dd8961fe4884
> e608e11a82d994e183d%7C0%7C0%7C637422071085800476%7CUnknown%7CTWFpbGZsb
> 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%
> 7C1000&sdata=VFqegGwPCj10q3Y5BdZsVq2a%2B4Tb358mYVDaNkA9zLU%3D&
> reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7Cguchun.chen%40amd.com%7C6d626e2a3bae4877024f08d893ff15db%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637422071085800476%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=VFqegGwPCj10q3Y5BdZsVq2a%2B4Tb358mYVDaNkA9zLU%3D&reserved=0
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20201130/72b37434/attachment-0001.htm>


More information about the amd-gfx mailing list