[PATCH 19/22] drm/amdgpu: adjust timeout for ib_ring_tests

Mon Feb 26 15:55:50 UTC 2018

On Mon, Feb 26, 2018 at 12:18 AM, Monk Liu <Monk.Liu at amd.com> wrote:
> issue:
> sometimes the GFX/MM IB test hits a timeout under SR-IOV. The root cause
> is that the engine doesn't come back soon enough, so the current
> IB test is considered timed out.
>
> fix:
> for the SR-IOV GFX IB test, the wait time needs to be extended a lot during
> SR-IOV runtime mode, since the test can't really begin before the GFX
> engine comes back.
>
> for the SR-IOV MM IB test, more time is always needed, since MM scheduling
> is not tied to the GFX engine; it is controlled by the h/w MM
> scheduler, so in both runtime and exclusive mode the MM IB test
> always needs more time.
>
> Change-Id: I0342371bc073656476ad850e1f5d9a021846dc8c
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 4709d13..d6776286 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -316,14 +316,42 @@ int amdgpu_ib_ring_tests(struct amdgpu_device *adev)
>  {
>         unsigned i;
>         int r, ret = 0;
> +       long tmo_gfx, tmo_mm;
> +
> +       tmo_mm = tmo_gfx = AMDGPU_IB_TEST_TIMEOUT;
> +       if (amdgpu_sriov_vf(adev)) {
> +               /* On the hypervisor side, MM engines are not scheduled together
> +                * with the CP and SDMA engines, so even in exclusive mode an MM
> +                * engine could still be running on another VF. The IB test timeout
> +                * for MM engines under SR-IOV should therefore be set to a long time.
> +                */
> +               tmo_mm = 8 * AMDGPU_IB_TEST_TIMEOUT; /* 8 sec should be enough for MM to come back to this VF */
> +       }
> +
> +       if (amdgpu_sriov_runtime(adev)) {
> +               /* The CP & SDMA engines are scheduled together, so the
> +                * timeout needs to be wide enough to cover the time spent
> +                * waiting for them to come back, under RUNTIME only.
> +                */
> +               tmo_gfx = 8 * AMDGPU_IB_TEST_TIMEOUT;
> +       }
> +
> +       adev->accel_working = true;

This change to adev->accel_working seems unrelated to the timeout adjustment.

>
>         for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>                 struct amdgpu_ring *ring = adev->rings[i];
> +               long tmo;
>
>                 if (!ring || !ring->ready)
>                         continue;
>
> -               r = amdgpu_ring_test_ib(ring, AMDGPU_IB_TEST_TIMEOUT);
> +               /* MM engines need more time */
> +               if (ring->idx > 11)

Please check the ring type here rather than the idx, since the idx may vary
based on the number of IPs on the SoC.
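
Something along these lines, as a rough standalone sketch rather than the
actual change. The enum, the struct and both helper names below are
stand-ins made up for illustration, not the real amdgpu definitions; the
driver has its own ring type enum, and the set of MM ring types to match
would need to be checked against it.

```c
#include <stdbool.h>

/* Stand-in ring types for illustration; the driver has its own enum. */
enum ring_type {
	RING_TYPE_GFX,
	RING_TYPE_COMPUTE,
	RING_TYPE_SDMA,
	RING_TYPE_UVD,
	RING_TYPE_VCE,
};

/* Hypothetical minimal ring: only the field this sketch needs. */
struct ring {
	enum ring_type type;
};

/* True for multimedia (MM) engines, independent of the ring index. */
static bool ring_is_mm(const struct ring *ring)
{
	switch (ring->type) {
	case RING_TYPE_UVD:
	case RING_TYPE_VCE:
		return true;
	default:
		return false;
	}
}

/* Pick the IB test timeout by ring type rather than by ring->idx. */
static long ring_ib_test_timeout(const struct ring *ring,
				 long tmo_gfx, long tmo_mm)
{
	return ring_is_mm(ring) ? tmo_mm : tmo_gfx;
}
```

The loop would then call something like ring_ib_test_timeout(ring, tmo_gfx,
tmo_mm) instead of comparing ring->idx against 11, so the result no longer
depends on how many rings a given SoC registers.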

Alex

> +                       tmo = tmo_mm;
> +               else
> +                       tmo = tmo_gfx;
> +
> +               r = amdgpu_ring_test_ib(ring, tmo);
>                 if (r) {
>                         ring->ready = false;
>
> --
> 2.7.4