Please feel free to use:
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>

Regards,
Shashank

________________________________
From: Christian König <ckoenig.leichtzumerken@gmail.com>
Sent: Monday, February 26, 2024 3:45 PM
To: Sharma, Shashank <Shashank.Sharma@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Koenig, Christian <Christian.Koenig@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Bhardwaj, Rajneesh <Rajneesh.Bhardwaj@amd.com>
Subject: Re: [PATCH v3 1/3] drm/amdgpu: replace TLB seq callback with HW seq

On 23.02.24 at 14:42, Shashank Sharma wrote:
> From: Christian König <christian.koenig@amd.com>
>
> The callback we installed for the SDMA update was actually pretty
> horrible. Since we now have seq64, use that one and HW seq writes
> instead.
>
> V2: (Shashank)
>   - rebased on amd-drm-staging-next
>   - changed amdgpu_seq64_gpu_addr
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Felix Kuehling <felix.kuehling@amd.com>
> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
> Signed-off-by: Christian König <christian.koenig@amd.com>

Shashank, can I get an rb on this patch so that I can push it to
amd-staging-drm-next?

The patch was basically just meant to double-check that the seq64 code
works as intended.

Thanks,
Christian.

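In short, with this change the TLB sequence check reduces to reading a
seq64 slot that the commit path bumps directly (CPU write or SDMA
user-fence write), instead of incrementing an atomic from a dma_fence
callback. A minimal standalone sketch of that pattern, using simplified
stand-in types rather than the actual amdgpu structures:

/* Sketch only: fake_vm stands in for struct amdgpu_vm. */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct fake_vm {
	_Atomic uint64_t tlb_seq;    /* stand-in for vm->tlb_seq */
	uint64_t *tlb_seq_cpu_addr;  /* stand-in for the seq64 CPU mapping */
};

/* Commit path: bump the sequence and write it through the seq64 slot
 * (the SDMA path lets the HW user fence do this write instead). */
static void commit_update(struct fake_vm *vm, int needs_flush)
{
	if (needs_flush)
		*vm->tlb_seq_cpu_addr = atomic_fetch_add(&vm->tlb_seq, 1) + 1;
}

/* Read path: amdgpu_vm_tlb_seq() now reduces to a plain load. */
static uint64_t read_tlb_seq(const struct fake_vm *vm)
{
	return *(volatile const uint64_t *)vm->tlb_seq_cpu_addr;
}

int main(void)
{
	uint64_t slot = 0;
	struct fake_vm vm = { .tlb_seq = 0, .tlb_seq_cpu_addr = &slot };
	uint64_t seen = read_tlb_seq(&vm);  /* sequence cached at CS time */

	commit_update(&vm, 1);              /* update that freed a page table */
	if (read_tlb_seq(&vm) != seen)
		printf("TLB flush needed before the next submission\n");
	return 0;
}
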
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c   | 14 ++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h   |  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      | 79 ++++-----------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      | 27 ++-----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c  |  3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  5 ++
>   7 files changed, 42 insertions(+), 89 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> index 3d0d56087d41..300dc79fa2b9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> @@ -199,6 +199,20 @@ void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
>                __clear_bit(bit_pos, adev->seq64.used);
>   }
>
> +/**
> + * amdgpu_seq64_gpu_addr - Calculate GPU addr from va
> + *
> + * @adev: amdgpu_device pointer
> + * @va: virtual address in client address space
> + *
> + * Calculate the GART address for a VA.
> + */
> +u64 amdgpu_seq64_gpu_addr(struct amdgpu_device *adev, u64 va)
> +{
> +     return va - amdgpu_seq64_get_va_base(adev) +
> +             amdgpu_bo_gpu_offset(adev->seq64.sbo);
> +}
> +
>   /**
>    * amdgpu_seq64_fini - Cleanup seq64 driver
>    *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
> index 4203b2ab318d..63e8ac0a2057 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
> @@ -43,6 +43,7 @@ void amdgpu_seq64_free(struct amdgpu_device *adev, u64 gpu_addr);
>   int amdgpu_seq64_map(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>                     struct amdgpu_bo_va **bo_va);
>   void amdgpu_seq64_unmap(struct amdgpu_device *adev, struct amdgpu_fpriv *fpriv);
> +u64 amdgpu_seq64_gpu_addr(struct amdgpu_device *adev, u64 va);
>
>   #endif
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index ed4a8c5d26d7..0960e0a665d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -111,21 +111,6 @@ struct amdgpu_prt_cb {
>        struct dma_fence_cb cb;
>   };
>
> -/**
> - * struct amdgpu_vm_tlb_seq_struct - Helper to increment the TLB flush sequence
> - */
> -struct amdgpu_vm_tlb_seq_struct {
> -     /**
> -      * @vm: pointer to the amdgpu_vm structure to set the fence sequence on
> -      */
> -     struct amdgpu_vm *vm;
> -
> -     /**
> -      * @cb: callback
> -      */
> -     struct dma_fence_cb cb;
> -};
> -
>   /**
>    * amdgpu_vm_set_pasid - manage pasid and vm ptr mapping
>    *
> @@ -862,23 +847,6 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
>        return r;
>   }
>
> -/**
> - * amdgpu_vm_tlb_seq_cb - make sure to increment tlb sequence
> - * @fence: unused
> - * @cb: the callback structure
> - *
> - * Increments the tlb sequence to make sure that future CS execute a VM flush.
> - */
> -static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
> -                              struct dma_fence_cb *cb)
> -{
> -     struct amdgpu_vm_tlb_seq_struct *tlb_cb;
> -
> -     tlb_cb = container_of(cb, typeof(*tlb_cb), cb);
> -     atomic64_inc(&tlb_cb->vm->tlb_seq);
> -     kfree(tlb_cb);
> -}
> -
>   /**
>    * amdgpu_vm_update_range - update a range in the vm page table
>    *
> @@ -911,7 +879,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>                           struct dma_fence **fence)
>   {
>        struct amdgpu_vm_update_params params;
> -     struct amdgpu_vm_tlb_seq_struct *tlb_cb;
>        struct amdgpu_res_cursor cursor;
>        enum amdgpu_sync_mode sync_mode;
>        int r, idx;
> @@ -919,12 +886,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>        if (!drm_dev_enter(adev_to_drm(adev), &idx))
>                return -ENODEV;
>
> -     tlb_cb = kmalloc(sizeof(*tlb_cb), GFP_KERNEL);
> -     if (!tlb_cb) {
> -             r = -ENOMEM;
> -             goto error_unlock;
> -     }
> -
>        /* Vega20+XGMI where PTEs get inadvertently cached in L2 texture cache,
>         * heavy-weight flush TLB unconditionally.
>         */
> @@ -942,6 +903,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>        params.immediate = immediate;
>        params.pages_addr = pages_addr;
>        params.unlocked = unlocked;
> +     params.needs_flush = flush_tlb;
>        params.allow_override = allow_override;
>
>        /* Implicitly sync to command submissions in the same VM before
> @@ -955,7 +917,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>        amdgpu_vm_eviction_lock(vm);
>        if (vm->evicting) {
>                r = -EBUSY;
> -             goto error_free;
> +             goto error_unlock;
>        }
>
>        if (!unlocked && !dma_fence_is_signaled(vm->last_unlocked)) {
> @@ -968,7 +930,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>
>        r = vm->update_funcs->prepare(&params, resv, sync_mode);
>        if (r)
> -             goto error_free;
> +             goto error_unlock;
>
>        amdgpu_res_first(pages_addr ? NULL : res, offset,
>                         (last - start + 1) * AMDGPU_GPU_PAGE_SIZE, &cursor);
> @@ -1018,7 +980,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>                tmp = start + num_entries;
>                r = amdgpu_vm_ptes_update(&params, start, tmp, addr, flags);
>                if (r)
> -                     goto error_free;
> +                     goto error_unlock;
>
>                amdgpu_res_next(&cursor, num_entries * AMDGPU_GPU_PAGE_SIZE);
>                start = tmp;
> @@ -1026,22 +988,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>
>        r = vm->update_funcs->commit(&params, fence);
>
> -     if (flush_tlb || params.table_freed) {
> -             tlb_cb->vm = vm;
> -             if (fence && *fence &&
> -                 !dma_fence_add_callback(*fence, &tlb_cb->cb,
> -                                        amdgpu_vm_tlb_seq_cb)) {
> -                     dma_fence_put(vm->last_tlb_flush);
> -                     vm->last_tlb_flush = dma_fence_get(*fence);
> -             } else {
> -                     amdgpu_vm_tlb_seq_cb(NULL, &tlb_cb->cb);
> -             }
> -             tlb_cb = NULL;
> -     }
> -
> -error_free:
> -     kfree(tlb_cb);
> -
>   error_unlock:
>        amdgpu_vm_eviction_unlock(vm);
>        drm_dev_exit(idx);
> @@ -2260,10 +2206,14 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>        INIT_WORK(&vm->pt_free_work, amdgpu_vm_pt_free_work);
>        INIT_KFIFO(vm->faults);
>
> -     r = amdgpu_vm_init_entities(adev, vm);
> +     r = amdgpu_seq64_alloc(adev, &vm->tlb_seq_va, &vm->tlb_seq_cpu_addr);
>        if (r)
>                return r;
>
> +     r = amdgpu_vm_init_entities(adev, vm);
> +     if (r)
> +             goto error_free_seq;
> +
>        vm->pte_support_ats = false;
>        vm->is_compute_context = false;
>
> @@ -2283,7 +2233,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>
>        vm->last_update = dma_fence_get_stub();
>        vm->last_unlocked = dma_fence_get_stub();
> -     vm->last_tlb_flush = dma_fence_get_stub();
>        vm->generation = 0;
>
>        mutex_init(&vm->eviction_lock);
> @@ -2322,10 +2271,11 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>        amdgpu_bo_unref(&root_bo);
>
>   error_free_delayed:
> -     dma_fence_put(vm->last_tlb_flush);
>        dma_fence_put(vm->last_unlocked);
>        amdgpu_vm_fini_entities(vm);
>
> +error_free_seq:
> +     amdgpu_seq64_free(adev, vm->tlb_seq_va);
>        return r;
>   }
>
> @@ -2441,7 +2391,6 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>        struct amdgpu_bo_va_mapping *mapping, *tmp;
>        bool prt_fini_needed = !!adev->gmc.gmc_funcs->set_prt;
>        struct amdgpu_bo *root;
> -     unsigned long flags;
>        int i;
>
>        amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);
> @@ -2453,11 +2402,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>        amdgpu_vm_set_pasid(adev, vm, 0);
>        dma_fence_wait(vm->last_unlocked, false);
>        dma_fence_put(vm->last_unlocked);
> -     dma_fence_wait(vm->last_tlb_flush, false);
> -     /* Make sure that all fence callbacks have completed */
> -     spin_lock_irqsave(vm->last_tlb_flush->lock, flags);
> -     spin_unlock_irqrestore(vm->last_tlb_flush->lock, flags);
> -     dma_fence_put(vm->last_tlb_flush);
> +     amdgpu_seq64_free(adev, vm->tlb_seq_va);
>
>        list_for_each_entry_safe(mapping, tmp, &vm->freed, list) {
>                if (mapping->flags & AMDGPU_PTE_PRT && prt_fini_needed) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 666698a57192..ac9380afcb69 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -247,9 +247,9 @@ struct amdgpu_vm_update_params {
>        unsigned int num_dw_left;
>
>        /**
> -      * @table_freed: return true if page table is freed when updating
> +      * @needs_flush: true whenever we need to invalidate the TLB
>         */
> -     bool table_freed;
> +     bool needs_flush;
>
>        /**
>         * @allow_override: true for memory that is not uncached: allows MTYPE
> @@ -328,9 +328,11 @@ struct amdgpu_vm {
>        struct drm_sched_entity immediate;
>        struct drm_sched_entity delayed;
>
> -     /* Last finished delayed update */
> +     /* Sequence number indicating necessary TLB flush */
>        atomic64_t              tlb_seq;
> -     struct dma_fence        *last_tlb_flush;
> +     uint64_t                tlb_seq_va;
> +     uint64_t                *tlb_seq_cpu_addr;
> +
>        atomic64_t              kfd_last_flushed_seq;
>
>        /* How many times we had to re-generate the page tables */
> @@ -549,22 +551,7 @@ int amdgpu_vm_pt_map_tables(struct amdgpu_device *adev, struct amdgpu_vm *vm);
>    */
>   static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
>   {
> -     unsigned long flags;
> -     spinlock_t *lock;
> -
> -     /*
> -      * Workaround to stop racing between the fence signaling and handling
> -      * the cb. The lock is static after initially setting it up, just make
> -      * sure that the dma_fence structure isn't freed up.
> -      */
> -     rcu_read_lock();
> -     lock = vm->last_tlb_flush->lock;
> -     rcu_read_unlock();
> -
> -     spin_lock_irqsave(lock, flags);
> -     spin_unlock_irqrestore(lock, flags);
> -
> -     return atomic64_read(&vm->tlb_seq);
> +     return READ_ONCE(*vm->tlb_seq_cpu_addr);
>   }
>
>   /*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> index 6e31621452de..0f8fd0acef7b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> @@ -108,7 +108,8 @@ static int amdgpu_vm_cpu_update(struct amdgpu_vm_update_params *p,
>   static int amdgpu_vm_cpu_commit(struct amdgpu_vm_update_params *p,
>                                struct dma_fence **fence)
>   {
> -     /* Flush HDP */
> +     if (p->needs_flush)
> +             *p->vm->tlb_seq_cpu_addr = atomic64_inc_return(&p->vm->tlb_seq);
>        mb();
>        amdgpu_device_flush_hdp(p->adev, NULL);
>        return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> index a160265ddc07..95dc0afdaffb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> @@ -1056,7 +1056,7 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
>                        while (cursor.pfn < frag_start) {
>                                /* Make sure previous mapping is freed */
>                                if (cursor.entry->bo) {
> -                                     params->table_freed = true;
> +                                     params->needs_flush = true;
>                                        amdgpu_vm_pt_free_dfs(adev, params->vm,
>                                                              &cursor,
>                                                              params->unlocked);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index 349416e176a1..91cc3121fd4b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -126,6 +126,11 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>
>        WARN_ON(ib->length_dw == 0);
>        amdgpu_ring_pad_ib(ring, ib);
> +     if (p->needs_flush) {
> +             p->job->uf_addr = amdgpu_seq64_gpu_addr(p->adev,
> +                                                     p->vm->tlb_seq_va);
> +             p->job->uf_sequence = atomic64_inc_return(&p->vm->tlb_seq);
> +     }
>        WARN_ON(ib->length_dw > p->num_dw_left);
>        f = amdgpu_job_submit(p->job);
>