[PATCH 09/12] drm/amdgpu: recovery hw jobs when gpu reset
Christian König
deathsimple at vodafone.de
Fri Jul 1 09:30:57 UTC 2016
On 30.06.2016 at 11:34, Chunming Zhou wrote:
> Change-Id: If10da1e224d81a12fd4f8d760c48178adb9e82d0
> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index a3ca83f..0759c23 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2002,8 +2002,9 @@ retry:
> struct amdgpu_ring *ring = adev->rings[i];
> if (!ring)
> continue;
> + amd_sched_job_recovery(&ring->sched);
> kthread_unpark(ring->sched.thread);
> - amdgpu_ring_restore(ring, ring_sizes[i], ring_data[i]);
> + kfree(ring_data[i]);
> ring_sizes[i] = 0;
> ring_data[i] = NULL;
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index cced2f6..7393473 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -384,11 +384,11 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring,
> amdgpu_ring_emit_pipeline_sync(ring);
>
> if (ring->funcs->emit_vm_flush &&
> - pd_addr != AMDGPU_VM_NO_FLUSH) {
> + (pd_addr != AMDGPU_VM_NO_FLUSH || amdgpu_vm_is_gpu_reset(adev, id))) {
> struct fence *fence;
>
> trace_amdgpu_vm_flush(pd_addr, ring->idx, vm_id);
> - amdgpu_ring_emit_vm_flush(ring, vm_id, pd_addr);
> + amdgpu_ring_emit_vm_flush(ring, vm_id, id->pd_gpu_addr);
NAK, we need to handle this differently. The problem is that
id->pd_gpu_addr could already have been reset when you have more than one
submission to the same engine.
E.g. submission A1 uses VMID 1 and PD address A, and submission B1 uses
VMID 1 as well but PD address B. If we do it like this, we would use PD
address B for both submissions on restart.
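The A1/B1 case can be reproduced with a small standalone model. This is only a sketch: struct vm_id_state, struct submission, submit_all() and replay_from_id() are hypothetical stand-ins invented for illustration, not the real amdgpu structures or functions. The only assumption taken from the patch is that the replay reads the PD address from per-VMID state shared by all submissions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VMIDS 4 /* arbitrary size for this model */

/* Hypothetical stand-in for the per-VMID bookkeeping (not the real struct). */
struct vm_id_state {
	uint64_t pd_gpu_addr; /* last PD address recorded for this VMID */
};

/* Hypothetical stand-in for one queued submission. */
struct submission {
	unsigned vm_id;   /* VMID the job runs with */
	uint64_t pd_addr; /* page directory the job was built against */
};

/* Initial submission: every job records its PD address in the shared state. */
static void submit_all(struct vm_id_state *ids,
		       const struct submission *jobs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(jobs[i].vm_id < NUM_VMIDS);
		ids[jobs[i].vm_id].pd_gpu_addr = jobs[i].pd_addr;
	}
}

/* Replay after reset, modelled on the patch: the flush address is read
 * from the shared per-VMID state instead of from the job itself. */
static void replay_from_id(const struct vm_id_state *ids,
			   const struct submission *jobs, size_t n,
			   uint64_t *flushed)
{
	for (size_t i = 0; i < n; i++)
		flushed[i] = ids[jobs[i].vm_id].pd_gpu_addr;
}
```

With A1 = { 1, 0xA000 } and B1 = { 1, 0xB000 } submitted in that order, replay_from_id() reports 0xB000 for both jobs, so A1 would be restarted with B1's page directory.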
I suggest just dropping the AMDGPU_VM_NO_FLUSH special value and using a
boolean to signal that a flush is needed instead.
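A rough, untested sketch of how that could look, written as a standalone model rather than a driver change. All names here (struct job_vm_state, vm_needs_flush, struct flush_record, vm_flush_sketch()) are hypothetical. Keeping the PD address in per-job state is an assumption about how the boolean would be used, not something settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-job VM state; field names are illustrative only. */
struct job_vm_state {
	unsigned vm_id;
	uint64_t pd_addr;        /* always a real PD address, never a sentinel */
	bool     vm_needs_flush; /* replaces the pd_addr != AMDGPU_VM_NO_FLUSH test */
};

/* Records what would have been emitted to the ring. */
struct flush_record {
	bool     emitted;
	unsigned vm_id;
	uint64_t pd_addr;
};

/* Decide whether a flush is emitted. after_reset models the GPU reset
 * case from the patch, where every replayed job has to flush again. */
static struct flush_record vm_flush_sketch(const struct job_vm_state *job,
					   bool after_reset)
{
	struct flush_record rec = { false, 0, 0 };

	assert(job != NULL);
	if (job->vm_needs_flush || after_reset) {
		rec.emitted = true;
		rec.vm_id = job->vm_id;
		rec.pd_addr = job->pd_addr; /* the job's own address */
	}
	return rec;
}
```

Because the address is no longer overloaded as a "no flush" marker, each job can keep its real PD address, and a replay of A1 and B1 on VMID 1 would flush with address A and address B respectively.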
Regards,
Christian.
>
> r = amdgpu_fence_emit(ring, &fence);
> if (r)