[PATCH 05/10] drm/amdgpu: reset hw jobs when gpu reset
Christian König
deathsimple at vodafone.de
Thu Jun 30 07:49:13 UTC 2016
On 30.06.2016 at 09:09, Chunming Zhou wrote:
> Change-Id: If673e1708b6207d70a26f64067dc1b0b24e868e7
> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 47 +++++++++++++-----------------
> 1 file changed, 20 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5c4691c..60b6dd0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1951,8 +1951,10 @@ int amdgpu_gpu_reset(struct amdgpu_device *adev)
> continue;
>
> kthread_park(ring->sched.thread);
> + amd_sched_hw_job_reset(&ring->sched);
> }
> -
> + /* after all hw jobs are reset, hw fence is meanless, so force_completion */
> + amdgpu_fence_driver_force_completion(adev);
> /* block TTM */
> resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
Unrelated to this change, but I just noticed that we should probably
block TTM before parking the scheduler.
Otherwise we could end up with this call waiting on the TTM workqueue
and the TTM workqueue waiting on the scheduler, which is already parked.
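In other words, something along these lines (only a rough sketch of the reordering, reusing the calls already present in this function, not a tested change):

```c
	/* Take the TTM delayed-workqueue lock first, so nothing in TTM can
	 * end up waiting on a scheduler thread that we have already parked.
	 */
	resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);

	/* Only then park the scheduler threads and reset the hw jobs. */
	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
		struct amdgpu_ring *ring = adev->rings[i];
		if (!ring)
			continue;
		kthread_park(ring->sched.thread);
		amd_sched_hw_job_reset(&ring->sched);
	}
```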
> /* store modesetting */
> @@ -1994,33 +1996,24 @@ retry:
> }
> /* restore scratch */
> amdgpu_atombios_scratch_regs_restore(adev);
> - if (0) {
> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> - struct amdgpu_ring *ring = adev->rings[i];
> - if (!ring)
> - continue;
> - kthread_unpark(ring->sched.thread);
> - amdgpu_ring_restore(ring, ring_sizes[i], ring_data[i]);
> - ring_sizes[i] = 0;
> - ring_data[i] = NULL;
> - }
>
> - r = amdgpu_ib_ring_tests(adev);
> - if (r) {
> - dev_err(adev->dev, "ib ring test failed (%d).\n", r);
> - if (saved) {
> - saved = false;
> - r = amdgpu_suspend(adev);
> - goto retry;
> - }
> - }
> - } else {
> - amdgpu_fence_driver_force_completion(adev);
> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> - if (adev->rings[i]) {
> - kthread_unpark(adev->rings[i]->sched.thread);
> - kfree(ring_data[i]);
> - }
> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> + struct amdgpu_ring *ring = adev->rings[i];
> + if (!ring)
> + continue;
> + amdgpu_ring_restore(ring, ring_sizes[i], ring_data[i]);
> + kthread_unpark(ring->sched.thread);
> + ring_sizes[i] = 0;
> + ring_data[i] = NULL;
> + }
> +
> + r = amdgpu_ib_ring_tests(adev);
> + if (r) {
> + dev_err(adev->dev, "ib ring test failed (%d).\n", r);
> + if (saved) {
> + saved = false;
> + r = amdgpu_suspend(adev);
> + goto retry;
Is it intentional that this enables the ring backup path again?
In addition to that, we should probably still react gracefully to a
failed GPU reset.
Regards,
Christian.
> }
> }
>