[PATCH 5/5] drm/amdgpu: reduce reset time
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Mon Jul 25 21:17:43 UTC 2022
On 2022-07-22 03:34, Victor Zhao wrote:
> In multi container use case, reset time is important, so skip ring
> tests and cp halt wait during ip suspending for reset as they are
> going to fail and cost more time on reset
Why are they failing in this case ? Skipping ring tests is not the best
idea as you
loose important indicator of system's sanity. Is there any way to make
them work ?
>
> Signed-off-by: Victor Zhao <Victor.Zhao at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 26 +++++++++++++++----------
> 2 files changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 222d3d7ea076..f872495ccc3a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -477,7 +477,7 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev)
> kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.compute_ring[i],
> RESET_QUEUES, 0, 0);
>
> - if (adev->gfx.kiq.ring.sched.ready)
> + if (adev->gfx.kiq.ring.sched.ready && !amdgpu_in_reset(adev))
> r = amdgpu_ring_test_helper(kiq_ring);
> spin_unlock(&adev->gfx.kiq.ring_lock);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index fafbad3cf08d..9ae29023e38f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -5971,16 +5971,19 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable)
> WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp);
> }
>
> - for (i = 0; i < adev->usec_timeout; i++) {
> - if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
> - break;
> - udelay(1);
> - }
> -
> - if (i >= adev->usec_timeout)
> - DRM_ERROR("failed to %s cp gfx\n", enable ? "unhalt" : "halt");
> + if (!amdgpu_in_reset(adev)) {
> + for (i = 0; i < adev->usec_timeout; i++) {
> + if (RREG32_SOC15(GC, 0, mmCP_STAT) == 0)
> + break;
> + udelay(1);
> + }
>
> + if (i >= adev->usec_timeout)
> + DRM_ERROR("failed to %s cp gfx\n",
> + enable ? "unhalt" : "halt");
> + }
> return 0;
> +
> }
This change has impact beyond container case no ? We had no issue
with this code during regular reset cases so why we would give up on
this code
which confirms CP is idle ? What is the side effect of skipping this
during all GPU resets ?
Andrey
>
> static int gfx_v10_0_cp_gfx_load_pfp_microcode(struct amdgpu_device *adev)
> @@ -7569,8 +7572,10 @@ static int gfx_v10_0_kiq_disable_kgq(struct amdgpu_device *adev)
> for (i = 0; i < adev->gfx.num_gfx_rings; i++)
> kiq->pmf->kiq_unmap_queues(kiq_ring, &adev->gfx.gfx_ring[i],
> PREEMPT_QUEUES, 0, 0);
> -
> - return amdgpu_ring_test_helper(kiq_ring);
> + if (!amdgpu_in_reset(adev))
> + return amdgpu_ring_test_helper(kiq_ring);
> + else
> + return 0;
> }
> #endif
>
> @@ -7610,6 +7615,7 @@ static int gfx_v10_0_hw_fini(void *handle)
>
> return 0;
> }
> +
> gfx_v10_0_cp_enable(adev, false);
> gfx_v10_0_enable_gui_idle_interrupt(adev, false);
>
More information about the amd-gfx
mailing list