[PATCH 2/2] drm/amdgpu: bind GTT on demand

Wed Sep 7 16:07:34 UTC 2016

Yeah, you save yourself the GTT update, which always uses the CPU.
Instead you write the page table entries to IBs using the CPU. Will be
interesting to see if that's a net speed-up or slow-down, if the
difference is significant at all.

The patch looks good to me. Thanks for tackling this. I didn't review it
too thoroughly, and if you forgot to map to GART somewhere important, I
would probably have missed it.

Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>

Once we pull this change into the KFD branch, we can probably revert our
ridiculous GART size without losing the ability to map lots of system
memory to GPUVM. :)

Regards,
  Felix

On 16-09-07 04:18 AM, Christian König wrote:
> Am 07.09.2016 um 09:18 schrieb zhoucm1:
>>
>>
>> On 2016年09月06日 18:48, Christian König wrote:
>>> Am 06.09.2016 um 11:53 schrieb zhoucm1:
>>>>
>>>>
>>>> On 2016年09月06日 17:41, Christian König wrote:
>>>>> From: Christian König <christian.koenig at amd.com>
>>>>>
>>>>> We don't really need the GTT table any more most of the time.
>>>> Why? I thought GTT bo is always needed to be bound when GPU is
>>>> trying to access it, doesn't it?
>>>
>>> We only need it to be bound when we try to access it from the system
>>> VM (UVD/VCE, ring buffers, fences etc...).
>>>
>>> The VMs for each process can work without it and it is just overhead
>>> to bind it all the time.
>> Yes, I got your mean, copying pte from GART to VM confused me, I
>> thought binding GTT table is a must, that's wrong.
>> Go though vm code again, pte can be created by amdgpu_vm_map_gart if
>> no GTT pte source when we don't bind it at all under VM process, am I
>> right now?
>
> Yes exactly. I still need to measure the performance impact of that,
> adding the copy from the GTT made quite a difference when I introduced
> that.
>
> I hope that doing the write when the update the VM is still faster
> than doing the GTT write.
>
> Regards,
> Christian.
>
>>
>> Regards,
>> David Zhou
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>>
>>>> Regards,
>>>> David Zhou
>>>>>   So bind it
>>>>> only on demand.
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 34
>>>>> ++++++++++++++++++++++++++++--
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  7 ++++++
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 33
>>>>> ++++++++++++++++++++++++++---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h    |  3 +++
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c    |  4 ++++
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c    |  6 +++++-
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     |  3 ++-
>>>>>   8 files changed, 84 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> index 9d39fa8..28d4a67 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> @@ -2521,6 +2521,7 @@ static inline void amdgpu_acpi_fini(struct
>>>>> amdgpu_device *adev) { }
>>>>>   struct amdgpu_bo_va_mapping *
>>>>>   amdgpu_cs_find_mapping(struct amdgpu_cs_parser *parser,
>>>>>                  uint64_t addr, struct amdgpu_bo **bo);
>>>>> +int amdgpu_cs_sysvm_access_required(struct amdgpu_cs_parser
>>>>> *parser);
>>>>>     #if defined(CONFIG_DRM_AMD_DAL)
>>>>>   int amdgpu_dm_display_resume(struct amdgpu_device *adev );
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> index 20a1962..e069978 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> @@ -640,8 +640,12 @@ static int amdgpu_cs_parser_bos(struct
>>>>> amdgpu_cs_parser *p,
>>>>>           }
>>>>>       }
>>>>>   -    if (p->uf_entry.robj)
>>>>> -        p->job->uf_addr += amdgpu_bo_gpu_offset(p->uf_entry.robj);
>>>>> +    if (!r && p->uf_entry.robj) {
>>>>> +        struct amdgpu_bo *uf = p->uf_entry.robj;
>>>>> +
>>>>> +        r = amdgpu_ttm_bind(uf->tbo.ttm, &uf->tbo.mem);
>>>>> +        p->job->uf_addr += amdgpu_bo_gpu_offset(uf);
>>>>> +    }
>>>>>     error_validate:
>>>>>       if (r) {
>>>>> @@ -1156,3 +1160,29 @@ amdgpu_cs_find_mapping(struct
>>>>> amdgpu_cs_parser *parser,
>>>>>         return NULL;
>>>>>   }
>>>>> +
>>>>> +/**
>>>>> + * amdgpu_cs_sysvm_access_required - make BOs accessible by the
>>>>> system VM
>>>>> + *
>>>>> + * @parser: command submission parser context
>>>>> + *
>>>>> + * Helper for UVD/VCE VM emulation, make sure BOs are accessible
>>>>> by the system VM.
>>>>> + */
>>>>> +int amdgpu_cs_sysvm_access_required(struct amdgpu_cs_parser *parser)
>>>>> +{
>>>>> +    unsigned i;
>>>>> +    int r;
>>>>> +
>>>>> +    if (!parser->bo_list)
>>>>> +        return 0;
>>>>> +
>>>>> +    for (i = 0; i < parser->bo_list->num_entries; i++) {
>>>>> +        struct amdgpu_bo *bo = parser->bo_list->array[i].robj;
>>>>> +
>>>>> +        r = amdgpu_ttm_bind(bo->tbo.ttm, &bo->tbo.mem);
>>>>> +        if (unlikely(r))
>>>>> +            return r;
>>>>> +    }
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> index b17734e..dc73f11 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> @@ -646,6 +646,11 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo
>>>>> *bo, u32 domain,
>>>>>           dev_err(bo->adev->dev, "%p pin failed\n", bo);
>>>>>           goto error;
>>>>>       }
>>>>> +    r = amdgpu_ttm_bind(bo->tbo.ttm, &bo->tbo.mem);
>>>>> +    if (unlikely(r)) {
>>>>> +        dev_err(bo->adev->dev, "%p bind failed\n", bo);
>>>>> +        goto error;
>>>>> +    }
>>>>>         bo->pin_count = 1;
>>>>>       if (gpu_addr != NULL)
>>>>> @@ -918,6 +923,8 @@ void amdgpu_bo_fence(struct amdgpu_bo *bo,
>>>>> struct fence *fence,
>>>>>   u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
>>>>>   {
>>>>>       WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_SYSTEM);
>>>>> +    WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_TT &&
>>>>> +             !amdgpu_ttm_is_bound(bo->tbo.ttm));
>>>>> WARN_ON_ONCE(!ww_mutex_is_locked(&bo->tbo.resv->lock) &&
>>>>>                !bo->pin_count);
>>>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> index 4337b3a..c294aa9 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> @@ -256,8 +256,12 @@ static int amdgpu_move_blit(struct
>>>>> ttm_buffer_object *bo,
>>>>>       new_start = (u64)new_mem->start << PAGE_SHIFT;
>>>>>         switch (old_mem->mem_type) {
>>>>> -    case TTM_PL_VRAM:
>>>>>       case TTM_PL_TT:
>>>>> +        r = amdgpu_ttm_bind(bo->ttm, old_mem);
>>>>> +        if (r)
>>>>> +            return r;
>>>>> +
>>>>> +    case TTM_PL_VRAM:
>>>>>           old_start += bo->bdev->man[old_mem->mem_type].gpu_offset;
>>>>>           break;
>>>>>       default:
>>>>> @@ -265,8 +269,12 @@ static int amdgpu_move_blit(struct
>>>>> ttm_buffer_object *bo,
>>>>>           return -EINVAL;
>>>>>       }
>>>>>       switch (new_mem->mem_type) {
>>>>> -    case TTM_PL_VRAM:
>>>>>       case TTM_PL_TT:
>>>>> +        r = amdgpu_ttm_bind(bo->ttm, new_mem);
>>>>> +        if (r)
>>>>> +            return r;
>>>>> +
>>>>> +    case TTM_PL_VRAM:
>>>>>           new_start += bo->bdev->man[new_mem->mem_type].gpu_offset;
>>>>>           break;
>>>>>       default:
>>>>> @@ -639,7 +647,6 @@ static int amdgpu_ttm_backend_bind(struct
>>>>> ttm_tt *ttm,
>>>>>                      struct ttm_mem_reg *bo_mem)
>>>>>   {
>>>>>       struct amdgpu_ttm_tt *gtt = (void*)ttm;
>>>>> -    uint32_t flags = amdgpu_ttm_tt_pte_flags(gtt->adev, ttm,
>>>>> bo_mem);
>>>>>       int r;
>>>>>         if (gtt->userptr) {
>>>>> @@ -660,6 +667,26 @@ static int amdgpu_ttm_backend_bind(struct
>>>>> ttm_tt *ttm,
>>>>>           bo_mem->mem_type == AMDGPU_PL_OA)
>>>>>           return -EINVAL;
>>>>>   +    return 0;
>>>>> +}
>>>>> +
>>>>> +bool amdgpu_ttm_is_bound(struct ttm_tt *ttm)
>>>>> +{
>>>>> +    struct amdgpu_ttm_tt *gtt = (void *)ttm;
>>>>> +
>>>>> +    return gtt && !list_empty(&gtt->list);
>>>>> +}
>>>>> +
>>>>> +int amdgpu_ttm_bind(struct ttm_tt *ttm, struct ttm_mem_reg *bo_mem)
>>>>> +{
>>>>> +    struct amdgpu_ttm_tt *gtt = (void *)ttm;
>>>>> +    uint32_t flags;
>>>>> +    int r;
>>>>> +
>>>>> +    if (!ttm || amdgpu_ttm_is_bound(ttm))
>>>>> +        return 0;
>>>>> +
>>>>> +    flags = amdgpu_ttm_tt_pte_flags(gtt->adev, ttm, bo_mem);
>>>>>       r = amdgpu_gart_bind(gtt->adev, gtt->offset, ttm->num_pages,
>>>>>           ttm->pages, gtt->ttm.dma_address, flags);
>>>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> index 72f6bfc..214bae9 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>>>> @@ -77,4 +77,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
>>>>>               struct fence **fence);
>>>>>     int amdgpu_mmap(struct file *filp, struct vm_area_struct *vma);
>>>>> +bool amdgpu_ttm_is_bound(struct ttm_tt *ttm);
>>>>> +int amdgpu_ttm_bind(struct ttm_tt *ttm, struct ttm_mem_reg *bo_mem);
>>>>> +
>>>>>   #endif
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>>>>> index 5888e8a..fab7bb1 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>>>>> @@ -891,6 +891,10 @@ int amdgpu_uvd_ring_parse_cs(struct
>>>>> amdgpu_cs_parser *parser, uint32_t ib_idx)
>>>>>           return -EINVAL;
>>>>>       }
>>>>>   +    r = amdgpu_cs_sysvm_access_required(parser);
>>>>> +    if (r)
>>>>> +        return r;
>>>>> +
>>>>>       ctx.parser = parser;
>>>>>       ctx.buf_sizes = buf_sizes;
>>>>>       ctx.ib_idx = ib_idx;
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>> index 9b71d6c..b0c6702 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
>>>>> @@ -634,7 +634,11 @@ int amdgpu_vce_ring_parse_cs(struct
>>>>> amdgpu_cs_parser *p, uint32_t ib_idx)
>>>>>       uint32_t allocated = 0;
>>>>>       uint32_t tmp, handle = 0;
>>>>>       uint32_t *size = &tmp;
>>>>> -    int i, r = 0, idx = 0;
>>>>> +    int i, r, idx = 0;
>>>>> +
>>>>> +    r = amdgpu_cs_sysvm_access_required(p);
>>>>> +    if (r)
>>>>> +        return r;
>>>>>         while (idx < ib->length_dw) {
>>>>>           uint32_t len = amdgpu_get_ib_value(p, ib_idx, idx);
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> index 7660f82..a549abd 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>> @@ -1162,7 +1162,8 @@ int amdgpu_vm_bo_update(struct amdgpu_device
>>>>> *adev,
>>>>>       }
>>>>>         flags = amdgpu_ttm_tt_pte_flags(adev, bo_va->bo->tbo.ttm,
>>>>> mem);
>>>>> -    gtt_flags = (adev == bo_va->bo->adev) ? flags : 0;
>>>>> +    gtt_flags = (amdgpu_ttm_is_bound(bo_va->bo->tbo.ttm) &&
>>>>> +        adev == bo_va->bo->adev) ? flags : 0;
>>>>>         spin_lock(&vm->status_lock);
>>>>>       if (!list_empty(&bo_va->vm_status))
>>>>
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx