[PATCH] drm/amdgpu: workaround for TLB seq race
Philip Yang
yangp at amd.com
Thu Nov 3 21:18:05 UTC 2022
On 2022-11-02 10:58, Christian König wrote:
> It can happen that we query the sequence value before the callback
> had a chance to run.
>
> Work around that by grabbing the fence lock and releasing it again.
> Should be replaced by hw handling soon.
kfd_flush_tlb is always called after waiting for the map/unmap to GPU
fence to signal, which means the callback has already executed and the
sequence has already been increased if a TLB flush is needed, so there
is no such race from KFD.
I am not sure, but the race does seem to exist when amdgpu grabs the vm
and schedules a job.
Acked-by: Philip Yang <Philip.Yang at amd.com>
> Signed-off-by: Christian König <christian.koenig at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 9ecb7f663e19..e51a46c9582b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -485,6 +485,21 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m);
>   */
>  static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
>  {
> +	unsigned long flags;
> +	spinlock_t *lock;
> +
> +	/*
> +	 * Work around to stop racing between the fence signaling and handling
> +	 * the cb. The lock is static after initially setting it up, just make
> +	 * sure that the dma_fence structure isn't freed up.
> +	 */
> +	rcu_read_lock();
> +	lock = vm->last_tlb_flush->lock;
> +	rcu_read_unlock();
> +
> +	spin_lock_irqsave(lock, flags);
> +	spin_unlock_irqrestore(lock, flags);
> +
>  	return atomic64_read(&vm->tlb_seq);
>  }
>
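
For anyone following along, the empty lock/unlock pair in the patch can be
sketched in user space roughly as below. This is only an illustration under
assumptions, not the amdgpu code: struct demo_vm, demo_fence_signal() and
demo_tlb_seq() are hypothetical names, a pthread mutex stands in for the
fence spinlock, and the "callback" is reduced to a single counter increment.
The point it tries to show is that the signaler sets the signaled flag and
runs the callback under one lock, so a reader that takes and drops that lock
cannot read the sequence while a callback is still in flight.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the VM plus its last TLB flush fence. */
struct demo_vm {
	pthread_mutex_t fence_lock;   /* plays the role of the fence lock */
	atomic_bool fence_signaled;   /* may be checked without the lock */
	atomic_uint_fast64_t tlb_seq; /* bumped by the fence callback */
};

static void demo_vm_init(struct demo_vm *vm)
{
	int rc = pthread_mutex_init(&vm->fence_lock, NULL);

	assert(rc == 0);
	(void)rc;
	atomic_init(&vm->fence_signaled, false);
	atomic_init(&vm->tlb_seq, 0);
}

/*
 * Signal path: the flag becomes visible first, then the callback runs,
 * both while the lock is held. A lockless observer can therefore see
 * "signaled" before the sequence has been increased.
 */
static void demo_fence_signal(struct demo_vm *vm)
{
	pthread_mutex_lock(&vm->fence_lock);
	atomic_store(&vm->fence_signaled, true);
	atomic_fetch_add(&vm->tlb_seq, 1); /* the "callback" */
	pthread_mutex_unlock(&vm->fence_lock);
}

/*
 * Read path: taking and immediately releasing the lock waits for any
 * signaler that is still inside its critical section, so the read below
 * happens after that signaler's callback has finished.
 */
static uint64_t demo_tlb_seq(struct demo_vm *vm)
{
	pthread_mutex_lock(&vm->fence_lock);
	pthread_mutex_unlock(&vm->fence_lock);

	return atomic_load(&vm->tlb_seq);
}
```

If demo_fence_signal() runs on another thread, a caller that has already
observed fence_signaled as true and then calls demo_tlb_seq() blocks until
the signaler leaves its critical section, which is the ordering the
workaround appears to rely on.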