[PATCH 07/12] drm/amdgpu/sriov:implement strict gpu reset

Nicolai Hähnle nhaehnle at gmail.com
Mon Oct 9 10:58:32 UTC 2017


On 30.09.2017 08:03, Monk Liu wrote:
> changes:
> 1) implement strict-mode SR-IOV GPU reset
> 2) always call sriov_gpu_reset_strict if the hypervisor notifies an FLR
> 3) in strict reset mode, set an error on all fences
> 4) change the fence_wait/cs_wait functions to return -ENODEV if the fence
> signaled with error == -ETIME
> 
> After a strict GPU reset we consider VRAM to be lost, and once VRAM is
> assumed lost there is little point in recovering shadow BOs, because
> textures/resources/shaders resident in VRAM cannot be recovered
> anyway.
> 
> Change-Id: I50d9b8b5185ba92f137f07c9deeac19d740d753b
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> ---
[snip]
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 9efbb33..122e2e1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2734,6 +2734,96 @@ static int amdgpu_recover_vram_from_shadow(struct amdgpu_device *adev,
>   }
>   
>   /**
> + * amdgpu_sriov_gpu_reset_strict - reset the asic under strict mode
> + *
> + * @adev: amdgpu device pointer
> + * @job: the job that triggered the hang
> + *
> + * Attempt to reset the GPU if it has hung (all ASICs, SR-IOV case).
> + * Returns 0 for success or an error on failure.
> + *
> + * This function rejects all processes/fences created before this reset
> + * and drops all jobs left unfinished by this reset.
> + *
> + * The application is responsible for re-opening the FD to re-create
> + * the VM page table and for recovering all resources as well.

Total NAK to this. It is *completely* infeasible from the UMD side, 
because multiple drivers can simultaneously use the same FD.
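
To make the shared-FD objection concrete, here is a toy C model. All names are hypothetical; it is not libdrm or kernel code. Per-FD state such as the BO handle table lives in one open file that every user of the FD shares. If one UMD re-opens the device after a reset, as the patch comment asks, the other UMDs still hold the old file, and the two can no longer exchange handles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not real DRM code: one "open file" owns the per-FD state
 * (GPU VM, BO handle table) that every user of that FD shares. */
struct drm_file_model {
    int id;          /* distinguishes files created by separate opens */
    int nr_handles;  /* BO handles created through this file */
};

/* A user-mode driver that was handed (a reference to) the same file. */
struct umd_model {
    struct drm_file_model *file;
    int bo_handle;   /* only meaningful within 'file' */
};

static int create_bo(struct umd_model *umd)
{
    assert(umd->file != NULL);
    umd->bo_handle = ++umd->file->nr_handles;
    return umd->bo_handle;
}

/* One UMD "re-opens the FD" after a reset. Only this UMD switches to the
 * fresh file; every other user keeps pointing at the old one. */
static void reopen(struct umd_model *umd, struct drm_file_model *fresh)
{
    umd->file = fresh;
    umd->bo_handle = 0;  /* its old handles do not exist in the new file */
}

/* Two UMDs can exchange BO handles only while they share one file. */
static bool can_share_handles(const struct umd_model *a,
                              const struct umd_model *b)
{
    return a->file == b->file;
}
```

For example, with `gl` and `vk` both created on the same file, calling `reopen(&gl, ...)` leaves `vk` on the old file, so `can_share_handles(&gl, &vk)` becomes false without `vk` ever being told.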

The KMD should just drop all previously submitted jobs and let the UMD 
worry about whether it wants to re-use buffer objects or not.
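
A minimal sketch of "just drop all previously submitted jobs", written as a userspace C model with hypothetical names: every unfinished job's fence is signaled with an error instead of being replayed, and the UMD learns about it when it waits. The choice of `-ECANCELED` is an assumption of this sketch; in the kernel the analogous step is calling `dma_fence_set_error()` before `dma_fence_signal()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of KMD job bookkeeping. */
struct fence_model {
    bool signaled;
    int error;       /* 0 on success, negative errno if the job was dropped */
};

struct job_model {
    struct fence_model fence;
};

/* On reset, drop every job that has not completed: signal its fence with
 * an error rather than resubmitting it. Completed jobs are left alone.
 * Returns the number of jobs dropped. */
static size_t drop_unfinished_jobs(struct job_model *jobs, size_t n)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        if (!jobs[i].fence.signaled) {
            jobs[i].fence.error = -ECANCELED;  /* assumed error code */
            jobs[i].fence.signaled = true;
            dropped++;
        }
    }
    return dropped;
}

/* What the UMD sees when it waits on a fence: 0 or the recorded error. */
static int wait_fence(const struct fence_model *f)
{
    assert(f->signaled);  /* the model only waits on signaled fences */
    return f->error;
}
```

Whether to re-upload, re-use, or discard the buffer objects referenced by a failed job is then left entirely to the UMD.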

The VM page table can then be rebuilt transparently based on whatever BO 
lists are used as new submissions are made after the reset.
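
The lazy page-table rebuild could look roughly like the following toy model. The names are hypothetical, and a real driver would write PTEs where the comment says so. A reset bumps a generation counter, and each new submission re-maps only the BOs in its BO list whose mapping predates the last reset, so BOs that are never used again are never re-mapped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a VM mapping is valid only if it was written after
 * the most recent reset. Generation 0 means "never mapped". */
struct vm_model {
    unsigned reset_gen;   /* bumped by every GPU reset; starts at 1 */
    unsigned pte_writes;  /* how many BO mappings were (re)written */
};

struct bo_model {
    unsigned mapped_gen;  /* vm reset_gen at the time this BO was mapped */
};

static void gpu_reset(struct vm_model *vm)
{
    vm->reset_gen++;      /* all existing mappings are now considered lost */
}

/* Called at submission time with the new job's BO list: re-map only the
 * BOs whose mapping predates the last reset. */
static void validate_bo_list(struct vm_model *vm, struct bo_model **list,
                             size_t n)
{
    assert(list != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (list[i]->mapped_gen != vm->reset_gen) {
            /* a real driver would write the PTEs for this BO here */
            list[i]->mapped_gen = vm->reset_gen;
            vm->pte_writes++;
        }
    }
}
```

Validating the same BO list twice in a row causes no extra page-table writes; after `gpu_reset()`, only the BOs named by the next submission are re-mapped.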

Cheers,
Nicolai
-- 
Learn how the world really is,
but never forget how it ought to be.


More information about the amd-gfx mailing list