[PATCH] drm/amd: Add a workaround for GFX11 systems that fail to flush TLB

Thu Dec 14 20:04:00 UTC 2023

On 2023-12-14 10:06, Alex Deucher wrote:
> On Thu, Dec 14, 2023 at 9:24 AM Liu, Shaoyun <Shaoyun.Liu at amd.com> wrote:
>> [AMD Official Use Only - General]
>>
>> The gmc flush tlb function is used on both  baremetal and sriov.   But the  function  amdgpu_virt_kiq_reg_write_reg_wait is defined in amdgpu_virt.c with  name  'virt'  make it appear as a SRIOV only function, this sounds confusion . Will it make more sense to move the function out of amdgpu_virt.c file and  rename it as amdgpu_kig_reg_write_reg_wait ?
>>
>> Another thing I'm not sure is inside amdgpu_virt_kiq_reg_write_reg_wait , has below logic :
>>          if (adev->mes.ring.sched.ready) {
>>                  amdgpu_mes_reg_write_reg_wait(adev, reg0, reg1,
>>                                                ref, mask);
>>                  return;
>>          }
>> On MES enabled situation , it will always call to mes queue to do the register write and  wait .  Shouldn't this OP been directly send to kiq itself ?  The ring for kiq and  mes is different ,  driver should use kiq(adev->gfx.kiq[0].ring) for these register read/write or wait operation  and  mes ( adev->mes.ring) for add/remove queues  etc.
>>
> I understand why it is needed for SR-IOV.  Is there a reason to use
> the MES or KIQ for TLB invalidation rather than the register method on
> bare metal?  It looks like the register method is never used anymore.
>   Seems like we should either, make the KIQ/MES method SR-IOV only, or
> drop the register method and just always use KIQ/MES.

It can be useful as a fallback or workaround for FW issues. We also use 
it when running without HWS (amdgpu.sched_policy=2). This is not the 
production code path, but can be useful for triaging on GPUs without MES.

Regards,
   Felix

>
> Alex
>
>
>> Regards
>> Shaoyun.liu
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Christian König
>> Sent: Thursday, December 14, 2023 4:22 AM
>> To: Alex Deucher <alexdeucher at gmail.com>; Limonciello, Mario <Mario.Limonciello at amd.com>
>> Cc: Huang, Tim <Tim.Huang at amd.com>; amd-gfx at lists.freedesktop.org; Koenig, Christian <Christian.Koenig at amd.com>; stable at vger.kernel.org
>> Subject: Re: [PATCH] drm/amd: Add a workaround for GFX11 systems that fail to flush TLB
>>
>> Am 13.12.23 um 20:44 schrieb Alex Deucher:
>>> On Wed, Dec 13, 2023 at 2:32 PM Mario Limonciello
>>> <mario.limonciello at amd.com> wrote:
>>>> On 12/13/2023 13:12, Mario Limonciello wrote:
>>>>> On 12/13/2023 13:07, Alex Deucher wrote:
>>>>>> On Wed, Dec 13, 2023 at 1:00 PM Mario Limonciello
>>>>>> <mario.limonciello at amd.com> wrote:
>>>>>>> Some systems with MP1 13.0.4 or 13.0.11 have a firmware bug that
>>>>>>> causes the first MES packet after resume to fail. This packet is
>>>>>>> used to flush the TLB when GART is enabled.
>>>>>>>
>>>>>>> This issue is fixed in newer firmware, but as OEMs may not roll
>>>>>>> this out to the field, introduce a workaround that will retry the
>>>>>>> flush when detecting running on an older firmware and decrease
>>>>>>> relevant error messages to debug while workaround is in use.
>>>>>>>
>>>>>>> Cc: stable at vger.kernel.org # 6.1+
>>>>>>> Cc: Tim Huang <Tim.Huang at amd.com>
>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3045
>>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 10 ++++++++--
>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  2 ++
>>>>>>>     drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 17 ++++++++++++++++-
>>>>>>>     drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  |  8 ++++++--
>>>>>>>     4 files changed, 32 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>>>>>>> index 9ddbf1494326..6ce3f6e6b6de 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>>>>>>> @@ -836,8 +836,14 @@ int amdgpu_mes_reg_write_reg_wait(struct
>>>>>>> amdgpu_device *adev,
>>>>>>>            }
>>>>>>>
>>>>>>>            r = adev->mes.funcs->misc_op(&adev->mes, &op_input);
>>>>>>> -       if (r)
>>>>>>> -               DRM_ERROR("failed to reg_write_reg_wait\n");
>>>>>>> +       if (r) {
>>>>>>> +               const char *msg = "failed to
>>>>>>> + reg_write_reg_wait\n";
>>>>>>> +
>>>>>>> +               if (adev->mes.suspend_workaround)
>>>>>>> +                       DRM_DEBUG(msg);
>>>>>>> +               else
>>>>>>> +                       DRM_ERROR(msg);
>>>>>>> +       }
>>>>>>>
>>>>>>>     error:
>>>>>>>            return r;
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
>>>>>>> index a27b424ffe00..90f2bba3b12b 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
>>>>>>> @@ -135,6 +135,8 @@ struct amdgpu_mes {
>>>>>>>
>>>>>>>            /* ip specific functions */
>>>>>>>            const struct amdgpu_mes_funcs   *funcs;
>>>>>>> +
>>>>>>> +       bool                            suspend_workaround;
>>>>>>>     };
>>>>>>>
>>>>>>>     struct amdgpu_mes_process {
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>>> index 23d7b548d13f..e810c7bb3156 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>>> @@ -889,7 +889,11 @@ static int gmc_v11_0_gart_enable(struct
>>>>>>> amdgpu_device *adev)
>>>>>>>                    false : true;
>>>>>>>
>>>>>>>            adev->mmhub.funcs->set_fault_enable_default(adev, value);
>>>>>>> -       gmc_v11_0_flush_gpu_tlb(adev, 0, AMDGPU_MMHUB0(0), 0);
>>>>>>> +
>>>>>>> +       do {
>>>>>>> +               gmc_v11_0_flush_gpu_tlb(adev, 0, AMDGPU_MMHUB0(0), 0);
>>>>>>> +               adev->mes.suspend_workaround = false;
>>>>>>> +       } while (adev->mes.suspend_workaround);
>>>>>> Shouldn't this be something like:
>>>>>>
>>>>>>> +       do {
>>>>>>> +               gmc_v11_0_flush_gpu_tlb(adev, 0, AMDGPU_MMHUB0(0), 0);
>>>>>>> +               adev->mes.suspend_workaround = false;
>>>>>>> +               gmc_v11_0_flush_gpu_tlb(adev, 0, AMDGPU_MMHUB0(0), 0);
>>>>>>> +       } while (adev->mes.suspend_workaround);
>>>>>> If we actually need the flush.  Maybe a better approach would be to
>>>>>> check if we are in s0ix in
>>>>> Ah you're right; I had shifted this around to keep less stateful
>>>>> variables and push them up the stack from when I first made it and
>>>>> that logic is wrong now.
>>>>>
>>>>> I don't think the one you suggested is right either; it's going to
>>>>> apply twice on ASICs that only need it once.
>>>>>
>>>>> I guess pending on what Christian comments on below I'll respin to
>>>>> logic that only calls twice on resume for these ASICs.
>>>> One more comment.  Tim and I both did an experiment for this
>>>> (skipping the flush) separately.  The problem isn't the flush itself,
>>>> rather it's the first MES packet after exiting GFXOFF.
>> Well that's an ugly one. Can that happen every time GFXOFF kicks in?
>>
>>>> So it seems that it pushes off the issue to the next thing which is a
>>>> ring buffer test:
>>>>
>>>> [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on
>>>> comp_1.0.0 (-110).
>>>> [drm:process_one_work] *ERROR* ib ring test failed (-110).
>>>>
>>>> So maybe a better workaround is a "dummy" command that is only sent
>>>> for the broken firmware that we don't care about the outcome and discard errors.
>>>>
>>>> Then the workaround doesn't need to get as entangled with correct flow.
>>> Yeah. Something like that seems cleaner.  Just a question of where to
>>> put it since we skip GC and MES for s0ix.  Probably somewhere in
>>> gmc_v11_0_resume() before gmc_v11_0_gart_enable().  Maybe add a new
>>> mes callback.
>> Please try to keep it completely outside of the TLB invalidation and VM flush handling.
>>
>> Regards,
>> Christian.
>>
>>> Alex
>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c in gmc_v11_0_flush_gpu_tlb():
>>>>>> index 23d7b548d13f..bd6d9953a80e 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>> @@ -227,7 +227,8 @@ static void gmc_v11_0_flush_gpu_tlb(struct
>>>>>> amdgpu_device *adev, uint32_t vmid,
>>>>>>             * Directly use kiq to do the vm invalidation instead
>>>>>>             */
>>>>>>            if ((adev->gfx.kiq[0].ring.sched.ready ||
>>>>>> adev->mes.ring.sched.ready) &&
>>>>>> -           (amdgpu_sriov_runtime(adev) || !amdgpu_sriov_vf(adev))) {
>>>>>> +           (amdgpu_sriov_runtime(adev) || !amdgpu_sriov_vf(adev)) ||
>>>>>> +           !adev->in_s0ix) {
>>>>>>                    amdgpu_virt_kiq_reg_write_reg_wait(adev, req,
>>>>>> ack, inv_req,
>>>>>>                                    1 << vmid, GET_INST(GC, 0));
>>>>>>                    return;
>>>>>>
>>>>>> @Christian Koenig is this logic correct?
>>>>>>
>>>>>>            /* For SRIOV run time, driver shouldn't access the
>>>>>> register through MMIO
>>>>>>             * Directly use kiq to do the vm invalidation instead
>>>>>>             */
>>>>>>            if ((adev->gfx.kiq[0].ring.sched.ready ||
>>>>>> adev->mes.ring.sched.ready) &&
>>>>>>                (amdgpu_sriov_runtime(adev) || !amdgpu_sriov_vf(adev))) {
>>>>>>                    amdgpu_virt_kiq_reg_write_reg_wait(adev, req,
>>>>>> ack, inv_req,
>>>>>>                                    1 << vmid, GET_INST(GC, 0));
>>>>>>                    return;
>>>>>>            }
>>>>>>
>>>>>> We basically always use the MES with that logic.  If that is the
>>>>>> case, we should just drop the rest of that function.  Shouldn't we
>>>>>> only use KIQ or MES for SR-IOV?  gmc v10 has similar logic which
>>>>>> also seems wrong.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>>>            DRM_INFO("PCIE GART of %uM enabled (table at 0x%016llX).\n",
>>>>>>>                     (unsigned int)(adev->gmc.gart_size >> 20), @@
>>>>>>> -960,6 +964,17 @@ static int gmc_v11_0_resume(void *handle)
>>>>>>>            int r;
>>>>>>>            struct amdgpu_device *adev = (struct amdgpu_device
>>>>>>> *)handle;
>>>>>>>
>>>>>>> +       switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
>>>>>>> +       case IP_VERSION(13, 0, 4):
>>>>>>> +       case IP_VERSION(13, 0, 11):
>>>>>>> +               /* avoid problems with first TLB flush after resume */
>>>>>>> +               if ((adev->pm.fw_version & 0x00FFFFFF) < 0x004c4900)
>>>>>>> +                       adev->mes.suspend_workaround = adev->in_s0ix;
>>>>>>> +               break;
>>>>>>> +       default:
>>>>>>> +               break;
>>>>>>> +       }
>>>>>>> +
>>>>>>>            r = gmc_v11_0_hw_init(adev);
>>>>>>>            if (r)
>>>>>>>                    return r;
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
>>>>>>> index 4dfec56e1b7f..84ab8c611e5e 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
>>>>>>> @@ -137,8 +137,12 @@ static int
>>>>>>> mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes,
>>>>>>>            r = amdgpu_fence_wait_polling(ring, ring->fence_drv.sync_seq,
>>>>>>>                          timeout);
>>>>>>>            if (r < 1) {
>>>>>>> -               DRM_ERROR("MES failed to response msg=%d\n",
>>>>>>> -                         x_pkt->header.opcode);
>>>>>>> +               if (mes->suspend_workaround)
>>>>>>> +                       DRM_DEBUG("MES failed to response msg=%d\n",
>>>>>>> +                                 x_pkt->header.opcode);
>>>>>>> +               else
>>>>>>> +                       DRM_ERROR("MES failed to response msg=%d\n",
>>>>>>> +                                 x_pkt->header.opcode);
>>>>>>>
>>>>>>>                    while (halt_if_hws_hang)
>>>>>>>                            schedule();
>>>>>>> --
>>>>>>> 2.34.1
>>>>>>>