[PATCH v3] drm/amdgpu: reset vm state machine after gpu reset(vram lost)
Christian König
christian.koenig at amd.com
Wed Jul 24 06:46:59 UTC 2024
On 24.07.24 at 05:00, ZhenGuo Yin wrote:
> [Why]
> The page tables of a compute VM in VRAM will be lost after a GPU reset.
> They won't be restored since a compute VM has no shadows.
>
> [How]
> Use the upper 32 bits of vm->generation to record vram_lost_counter.
> Reset the VM state machine when vm->generation is not equal to
> the new generation token.
>
> v2: Check vm->generation instead of calling drm_sched_entity_error
> in amdgpu_vm_validate.
> v3: Use new generation token instead of vram_lost_counter for check.
>
> Signed-off-by: ZhenGuo Yin <zhenguo.yin at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3abfa66d72a2..6c6170f0f318 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -434,7 +434,7 @@ uint64_t amdgpu_vm_generation(struct amdgpu_device *adev, struct amdgpu_vm *vm)
> if (!vm)
> return result;
>
> - result += vm->generation;
> + result += (vm->generation & 0xFFFFFFFFULL);
Please use the lower_32_bits() macro here.
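To illustrate why the mask matters once vm->generation stores the full token, here is a minimal userspace sketch of the scheme. It assumes, as the commit message describes, that the upper 32 bits carry the VRAM-lost counter. The demo_* structures and functions are hypothetical stand-ins rather than the driver code, and lower_32_bits() is redefined locally because the kernel macro is not available in userspace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's lower_32_bits() macro. */
#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))

/* Hypothetical, simplified stand-ins for the driver structures. */
struct demo_device {
	uint32_t vram_lost_counter;	/* bumped when a reset loses VRAM */
};

struct demo_vm {
	uint64_t generation;		/* last token this VM was validated against */
	bool entity_error;		/* stands in for drm_sched_entity_error() */
	unsigned int resets;		/* demo only: counts state machine resets */
};

/* Build the token: VRAM-lost counter on top, per-VM counter below. */
static uint64_t demo_vm_generation(const struct demo_device *dev,
				   const struct demo_vm *vm)
{
	uint64_t result = (uint64_t)dev->vram_lost_counter << 32;

	if (!vm)
		return result;

	/*
	 * vm->generation already holds a full token, so only its lower
	 * half may be added; otherwise the old upper half would be
	 * counted a second time.
	 */
	result += lower_32_bits(vm->generation);

	/* Add one if the page tables will be re-generated on next CS */
	if (vm->entity_error)
		++result;

	return result;
}

/* Returns true when the VM state had to be reset. */
static bool demo_vm_validate(const struct demo_device *dev,
			     struct demo_vm *vm)
{
	uint64_t new_generation = demo_vm_generation(dev, vm);

	if (vm->generation == new_generation)
		return false;

	vm->generation = new_generation;
	vm->entity_error = false;	/* models re-initialising the entities */
	++vm->resets;
	return true;
}
```

In this sketch a VM initialised with demo_vm_generation(dev, NULL) validates without a reset until either vram_lost_counter changes or the entity reports an error. After one reset the token matches again, so a second validate call is a no-op.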
With that fixed the patch is Reviewed-by: Christian König
<christian.koenig at amd.com>
Thanks, and sorry that I didn't initially get what the actual problem
here is,
Christian.
> /* Add one if the page tables will be re-generated on next CS */
> if (drm_sched_entity_error(&vm->delayed))
> ++result;
> @@ -463,13 +463,14 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> int (*validate)(void *p, struct amdgpu_bo *bo),
> void *param)
> {
> + uint64_t new_vm_generation = amdgpu_vm_generation(adev, vm);
> struct amdgpu_vm_bo_base *bo_base;
> struct amdgpu_bo *shadow;
> struct amdgpu_bo *bo;
> int r;
>
> - if (drm_sched_entity_error(&vm->delayed)) {
> - ++vm->generation;
> + if (vm->generation != new_vm_generation) {
> + vm->generation = new_vm_generation;
> amdgpu_vm_bo_reset_state_machine(vm);
> amdgpu_vm_fini_entities(vm);
> r = amdgpu_vm_init_entities(adev, vm);
> @@ -2439,7 +2440,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> vm->last_update = dma_fence_get_stub();
> vm->last_unlocked = dma_fence_get_stub();
> vm->last_tlb_flush = dma_fence_get_stub();
> - vm->generation = 0;
> + vm->generation = amdgpu_vm_generation(adev, NULL);
>
> mutex_init(&vm->eviction_lock);
> vm->evicting = false;