[PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi

Koenig, Christian Christian.Koenig at amd.com
Thu Sep 26 18:44:45 UTC 2019


That was the same bug. What we need to do is to prevent VMID invalidation while the SDMA is using it.

The first part was to disallow concurrent VMID flushes, the second part was doing VMID0 flushes through the SDMA block itself.

Both workarounds where added to avoid corruption, that GFXOFF is hanging without this is certainly a completely different issue.

I suspect that we have similar issue as with Vega/Raven where we need to grab a semaphore to avoid the block from being gated while an invalidation is in progress.

Regards,
Christian.


Am 26.09.2019 20:07 schrieb "Deucher, Alexander" <Alexander.Deucher at amd.com>:

Maybe I'm mixing up issues.  The navi10/14 issue that was fixed on navi12 was fixed in amdgpu_ids.c in

commit a2bd77bbde791202267c25478bbcbe71bb4ecdd5
Author: Christian König <christian.koenig at amd.com>
Date:   Thu Feb 7 12:10:29 2019 +0100

    drm/amdgpu: disable concurrent flushes for Navi10 v2

    Navi10 have a bug in the SDMA which can theoretically cause memory
    corruption with concurrent VMID flushes

    v2: explicitely check Navi10

    Signed-off-by: Christian König <christian.koenig at amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher at amd.com>

This is a different issue and may apply to all navi parts.  so maybe the patch is fine as is.

Alex

________________________________
From: Koenig, Christian <Christian.Koenig at amd.com>
Sent: Thursday, September 26, 2019 2:02 PM
To: Yuan, Xiaojie <Xiaojie.Yuan at amd.com>
Cc: Alex Deucher <alexdeucher at gmail.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi

Well then you didn't figured out the root cause correctly.

This is to avoid data corruption with the SDMA on Navi 10/14 and should definitely not applied to Navi 12.

The hardware team went through quite some work to avoid this.

Christian.

Am 26.09.2019 19:38 schrieb "Yuan, Xiaojie" <Xiaojie.Yuan at amd.com>:
 Hi Alex / Christian,

When gfxoff is enabled for Navi12, I observed sdma0 hang while launching desktop. When this workaround is applied, the issue fades away.
That's why I included this workaround for Navi12 as well.

BR,
Xiaojie
________________________________
From: Koenig, Christian <Christian.Koenig at amd.com>
Sent: Thursday, September 26, 2019 10:20 PM
To: Alex Deucher <alexdeucher at gmail.com>
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Yuan, Xiaojie <Xiaojie.Yuan at amd.com>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi



Am 26.09.2019 15:51 schrieb Alex Deucher <alexdeucher at gmail.com>:
On Thu, Sep 26, 2019 at 9:47 AM Koenig, Christian
<Christian.Koenig at amd.com> wrote:
>
> Am 26.09.19 um 15:40 schrieb Alex Deucher:
> > On Thu, Sep 26, 2019 at 8:29 AM Christian König
> > <ckoenig.leichtzumerken at gmail.com> wrote:
> >> Stop, wait a second guys!
> >>
> >> This will disable the workaround for Navi10 and that is certainly not correct:
> >>
> >> !(adev->asic_type >= CHIP_NAVI10 && adev->asic_type <= CHIP_NAVI12)
> >>
> > Actually, I think it's correct. When I merged the baco patch, I
> > accidentally dropped the navi checks.  E.g.,
> > @@ -245,8 +245,9 @@ static void gmc_v10_0_flush_gpu_tlb(struct
> > amdgpu_device *adev,
> >          mutex_lock(&adev->mman.gtt_window_lock);
> >
> >          gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_MMHUB, 0);
> > -       if (!adev->mman.buffer_funcs_enabled || !adev->ib_pool_ready ||
> > -           adev->asic_type != CHIP_NAVI10) {
> > +       if (!adev->mman.buffer_funcs_enabled ||
> > +           !adev->ib_pool_ready ||
> > +           adev->in_gpu_reset) {
> >                  gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB, 0);
> >                  mutex_unlock(&adev->mman.gtt_window_lock);
> >                  return;
> > I think it should have been
> > adev->asic_type != CHIP_NAVI10 && adev->asic_type != CHIP_NAVI14 &&
> > adev->asic_type != CHIP_NAVI12
>
> My last status is that Navi12 is not supposed to need that workaround,
> that's why we checked Navi10 and later Navi14 separately.
>
> It's possible that I miss-read the !(adev->asic_type >= CHIP_NAVI10 &&
> adev->asic_type <= CHIP_NAVI12) check here, but either way that looks to
> complicated to me.
>
> We should rather mention every affected asic type separately here.

Good point.  navi12 should be dropped from the check.  How about the following?

I would rather test explicitly for Navi 10 and 14, cause we can't be sure if there won't be another variant in the future.

Christian.


diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 241a4e57cf4a..280bbd7ca8a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -292,7 +292,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct
amdgpu_device *adev, uint32_t vmid,

        if (!adev->mman.buffer_funcs_enabled ||
            !adev->ib_pool_ready ||
-           adev->in_gpu_reset) {
+           adev->in_gpu_reset ||
+           (adev->asic_type == CHIP_NAVI12)) {
                gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB_0, 0);
                mutex_unlock(&adev->mman.gtt_window_lock);
                return;

Alex

>
> Regards,
> Christian.
>
> >
> > Alex
> >
> >> Christian.
> >>
> >>
> >> Am 26.09.19 um 14:26 schrieb Deucher, Alexander:
> >>
> >> Please add a patch description.  With that fixed:
> >> Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
> >> ________________________________
> >> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Yuan, Xiaojie <Xiaojie.Yuan at amd.com>
> >> Sent: Thursday, September 26, 2019 4:09 AM
> >> To: amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
> >> Cc: alexdeucher at gmail.com <alexdeucher at gmail.com>
> >> Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi
> >>
> >> Hi Alex,
> >>
> >> This patch is to add the asic_type check which is missing after drm-next branch rebase.
> >>
> >> BR,
> >> Xiaojie
> >> ________________________________
> >> From: Yuan, Xiaojie <Xiaojie.Yuan at amd.com>
> >> Sent: Thursday, September 26, 2019 4:08 PM
> >> To: amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
> >> Cc: alexdeucher at gmail.com <alexdeucher at gmail.com>; Yuan, Xiaojie <Xiaojie.Yuan at amd.com>
> >> Subject: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi
> >>
> >> Fixes: 767acabdac81 ("drm/amd/powerplay: add baco smu reset function for smu11")
> >> Signed-off-by: Xiaojie Yuan <xiaojie.yuan at amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> index cb3f61873baa..dc2e68e019eb 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> @@ -292,6 +292,7 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
> >>
> >>           if (!adev->mman.buffer_funcs_enabled ||
> >>               !adev->ib_pool_ready ||
> >> +           !(adev->asic_type >= CHIP_NAVI10 && adev->asic_type <= CHIP_NAVI12) ||
> >>               adev->in_gpu_reset) {
> >>                   gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB_0, 0);
> >>                   mutex_unlock(&adev->mman.gtt_window_lock);
> >> --
> >> 2.20.1
> >>
> >>
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx at lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >>
> >>
>


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190926/8b61f963/attachment-0001.html>


More information about the amd-gfx mailing list