[Patch V3] drm/amdgpu: Increase tlb flush timeout for sriov

Alex Deucher alexdeucher at gmail.com
Thu Aug 11 14:33:46 UTC 2022


On Thu, Aug 11, 2022 at 10:16 AM Liu, Shaoyun <Shaoyun.Liu at amd.com> wrote:
>
> [AMD Official Use Only - General]
>
> From a HW point of view, the maximum VF number can reach 16 instead of 12. Although currently no product will use all 16 VFs together, I'm not sure about the future.
> You can add my Acked-by. I will let Alex & Christian decide whether to accept this change.
>

I'll let you two sort out the timeout, but the rest of the patch looks
good to me.
Acked-by: Alex Deucher <alexander.deucher at amd.com>

> Regards
> Shaoyun.liu
>
>
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Dusica Milinkovic
> Sent: Thursday, August 11, 2022 6:01 AM
> To: amd-gfx at lists.freedesktop.org
> Cc: Milinkovic, Dusica <Dusica.Milinkovic at amd.com>
> Subject: [Patch V3] drm/amdgpu: Increase tlb flush timeout for sriov
>
> [Why]
> While running a benchmark (Luxmark) on multiple VFs, a KIQ timeout error was observed.
> It happens because all of the VFs do the TLB invalidation at the same time.
> Although each VF has its own invalidate register set, on the hardware side the invalidate requests are queued for execution.
>
> [How]
> With 12 VFs, increase the timeout to 12 * 100 ms.
>
> Signed-off-by: Dusica Milinkovic <Dusica.Milinkovic at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h    | 2 +-
>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 3 ++-
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 3 ++-
>  3 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 5a639c857bd0..79bb6fd83094 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -320,7 +320,7 @@ enum amdgpu_kiq_irq {
>         AMDGPU_CP_KIQ_IRQ_DRIVER0 = 0,
>         AMDGPU_CP_KIQ_IRQ_LAST
>  };
> -
> +#define SRIOV_USEC_TIMEOUT  1200000 /* wait 12 * 100ms for SRIOV */
>  #define MAX_KIQ_REG_WAIT       5000 /* in usecs, 5ms */
>  #define MAX_KIQ_REG_BAILOUT_INTERVAL   5 /* in msecs, 5ms */
>  #define MAX_KIQ_REG_TRY 1000
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index 9ae8cdaa033e..f513e2c2e964 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -419,6 +419,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>         uint32_t seq;
>         uint16_t queried_pasid;
>         bool ret;
> +       u32 usec_timeout = amdgpu_sriov_vf(adev) ? SRIOV_USEC_TIMEOUT : adev->usec_timeout;
>         struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
>         struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>
> @@ -437,7 +438,7 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>
>                 amdgpu_ring_commit(ring);
>                 spin_unlock(&adev->gfx.kiq.ring_lock);
> -               r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
> +               r = amdgpu_fence_wait_polling(ring, seq, usec_timeout);
>                 if (r < 1) {
>                         dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
>                         return -ETIME;
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index ab89d91975ab..4603653916f5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -896,6 +896,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>         uint32_t seq;
>         uint16_t queried_pasid;
>         bool ret;
> +       u32 usec_timeout = amdgpu_sriov_vf(adev) ? SRIOV_USEC_TIMEOUT : adev->usec_timeout;
>         struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
>         struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>
> @@ -935,7 +936,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
>
>                 amdgpu_ring_commit(ring);
>                 spin_unlock(&adev->gfx.kiq.ring_lock);
> -               r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
> +               r = amdgpu_fence_wait_polling(ring, seq, usec_timeout);
>                 if (r < 1) {
>                         dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
>                         up_read(&adev->reset_domain->sem);
> --
> 2.25.1
>
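For reference, the timeout arithmetic discussed in this thread can be sketched as a small standalone C fragment. This is only an illustration under stated assumptions, not driver code: SRIOV_USEC_TIMEOUT and its value come from the patch, while PER_VF_FLUSH_USEC, scaled_flush_timeout_usec() and pick_flush_timeout_usec() are hypothetical names introduced here. The second helper mirrors the ternary added to the two flush_gpu_tlb_pasid functions, with the amdgpu_sriov_vf(adev) result and adev->usec_timeout passed in as plain parameters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the patch: 12 * 100 ms, expressed in microseconds. */
#define SRIOV_USEC_TIMEOUT 1200000u

/* Hypothetical constant: the 100 ms per-VF budget from the [How] note. */
#define PER_VF_FLUSH_USEC 100000u

/*
 * Hypothetical helper: the timeout if it were scaled by the number of
 * VFs instead of being fixed at 12 VFs.
 */
static uint32_t scaled_flush_timeout_usec(uint32_t num_vfs)
{
	return num_vfs * PER_VF_FLUSH_USEC;
}

/*
 * Hypothetical helper mirroring the selection in the patch: use the
 * longer SRIOV timeout on a VF, otherwise the caller's default.
 */
static uint32_t pick_flush_timeout_usec(bool is_sriov_vf, uint32_t default_usec)
{
	return is_sriov_vf ? SRIOV_USEC_TIMEOUT : default_usec;
}
```

With 12 VFs the scaled value equals SRIOV_USEC_TIMEOUT (1200000 usec); with the hardware maximum of 16 VFs mentioned above it would be 1600000 usec, which is the gap the timeout discussion is about.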

