[RFC PATCH 1/2] drm/amdgpu: add direct ib pool

Felix Kuehling felix.kuehling at amd.com
Thu Mar 26 14:36:41 UTC 2020


On 2020-03-26 3:11, Pan, Xinhui wrote:
>
>> On 26 Mar 2020, at 14:51, Koenig, Christian <Christian.Koenig at amd.com> wrote:
>>
>>
>>
>> On 26.03.2020 07:45, "Pan, Xinhui" <Xinhui.Pan at amd.com> wrote:
>>
>>
>>> On 26 Mar 2020, at 14:36, Koenig, Christian <Christian.Koenig at amd.com> wrote:
>>>
>>>
>>>
>>> On 26.03.2020 07:15, "Pan, Xinhui" <Xinhui.Pan at amd.com> wrote:
>>>
>>>
>>>> On 26 Mar 2020, at 13:38, Koenig, Christian <Christian.Koenig at amd.com> wrote:
>>>>
>>>> Yeah, that's been on my TODO list for quite a while as well.
>>>>
>>>> But we actually need three IB pools: one very small one for the IB tests, one for direct VM updates, and one for the rest.
>>>>
>>>> So please make the pool a parameter to ib_get() and not the hack you have below.
>>> Yep, I will make the IB pool a parameter.
>>>
>>> The IB tests for gfx need many IBs, so PAGE_SIZE is still not enough for that pool.
>>> But the default IB pool size is 2MB now, just one huge page, and today we have memory in TB,
>>> so there is no need for a different size for the IB test pool.
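>>>
>>> As a rough sketch of the parameter approach, something like the below
>>> (the enum, the ib_pools array and the names here are placeholders I am
>>> assuming, not a settled API):
>>>
>>> enum amdgpu_ib_pool_type {
>>>         AMDGPU_IB_POOL_NORMAL,  /* regular submissions via the scheduler */
>>>         AMDGPU_IB_POOL_DIRECT,  /* direct submission, e.g. VM updates */
>>>         AMDGPU_IB_POOL_IB_TEST, /* small pool reserved for IB tests */
>>>         AMDGPU_IB_POOL_MAX
>>> };
>>>
>>> /* hypothetical array in amdgpu_device replacing the single ring_tmp_bo */
>>> struct amdgpu_sa_manager        ib_pools[AMDGPU_IB_POOL_MAX];
>>>
>>> int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>                   unsigned size, enum amdgpu_ib_pool_type pool_type,
>>>                   struct amdgpu_ib *ib)
>>> {
>>>         int r;
>>>
>>>         if (size) {
>>>                 /* allocate from the pool the caller asked for */
>>>                 r = amdgpu_sa_bo_new(&adev->ib_pools[pool_type],
>>>                                      &ib->sa_bo, size, 256);
>>>                 if (r) {
>>>                         dev_err(adev->dev, "failed to get a new IB (%d)\n", r);
>>>                         return r;
>>>                 }
>>>
>>>                 ib->ptr = amdgpu_sa_bo_cpu_addr(ib->sa_bo);
>>>                 if (!vm)
>>>                         ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo);
>>>         }
>>>
>>>         return 0;
>>> }
>>>
>>> That keeps amdgpu_ib_get()'s vm argument a real pointer and avoids
>>> tagging it with low bits.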
>>>
>>> 2MB is probably a bit much and we don't have huge page optimisation for kernel allocations at the moment anyway. Keep in mind that we have only limited space in the GART.
>> The GART table is just 512MB.
>> Do you mean every entry in the GART table points to just one 4KB page, so we need 512 GART table entries for one 2MB huge page?
>>
>> Yes, we need 512 * 4KB entries for a 2MB page in GART. The table for the system VM is flat because of hardware restrictions.
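>>
>> (Worked out: 2 MiB / 4 KiB = 512 entries; assuming 8-byte PTEs, that is
>> another 4 KiB of GART table consumed per 2 MiB mapped.)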
>>
> oh yes, 512 entries.
>
>> IIRC we tried 256MB for the GART initially and in general we try to keep that as small as possible because it eats up visible VRAM space.
> That is true. But our roadmap says that we are a Linux ML team.
> I cannot imagine customers using servers with a small PCI BAR. Well, they do care about cost, but I would prefer that they use the newer products. :)
>
> Anyway, I will choose the smallest workable default value for now.

For Linux ML we want a small GART size. We used to waste lots of memory 
for a big GART that was unnecessary and took away valuable memory from 
ML applications. The GART is only used for kernel mode mappings, most of 
which tend to be short-lived. Statically allocating a big GART is not 
necessary.

The IB pools are statically allocated and long-lived. So if we need to 
increase the GART size to meet that demand, so be it. But we should not 
waste memory for IBs and associated GART space without a good reason.
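
For scale, using the numbers from this thread: the existing pool is 2 MiB, 
i.e. AMDGPU_IB_POOL_SIZE * 64 KiB, which would put AMDGPU_IB_POOL_SIZE at 
32. Duplicating it as posted pins another 2 MiB of GTT, or 512 more GART 
entries, for the device's lifetime, whereas the 4 * PAGE_SIZE direct pool 
suggested above would pin only 16 KiB, i.e. four entries on a 4 KiB page 
kernel.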

Regards,
   Felix


>
>> Christian.
>>
>>
>>> Maybe make this 4*PAGE_SIZE for the new IB pool for now and test if that works or not.
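>>>
>>> As a rough, untested sketch, the new pool's init in
>>> amdgpu_ib_pool_init() would then become:
>>>
>>>         r = amdgpu_sa_bo_manager_init(adev, &adev->ring_tmp_bo_direct,
>>>                                       4 * PAGE_SIZE,
>>>                                       AMDGPU_GPU_PAGE_SIZE,
>>>                                       AMDGPU_GEM_DOMAIN_GTT);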
>>>
>>> Christian.
>>>
>>>
>>>
>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> On 26.03.2020 03:02, "Pan, Xinhui" <Xinhui.Pan at amd.com> wrote:
>>>> Add another IB pool for direct submission.
>>>> Any job that schedules IBs without depending on the GPU scheduler
>>>> should use this pool first.
>>>>
>>>> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 13 +++++++++++--
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 +++++++-
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 ++-
>>>>   5 files changed, 22 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index 7dd74253e7b6..c01423ffb8ed 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -849,6 +849,7 @@ struct amdgpu_device {
>>>>           struct amdgpu_ring              *rings[AMDGPU_MAX_RINGS];
>>>>           bool                            ib_pool_ready;
>>>>           struct amdgpu_sa_manager        ring_tmp_bo;
>>>> +       struct amdgpu_sa_manager        ring_tmp_bo_direct;
>>>>   
>>>>           /* interrupts */
>>>>           struct amdgpu_irq               irq;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> index 8304d0c87899..28be4efb3d5b 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> @@ -920,7 +920,7 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
>>>>                   parser->entity = entity;
>>>>   
>>>>                   ring = to_amdgpu_ring(entity->rq->sched);
>>>> -               r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
>>>> +               r =  amdgpu_ib_get(adev, (struct amdgpu_vm *)((unsigned long)vm | 0x1), ring->funcs->parse_cs ?
>>>>                                      chunk_ib->ib_bytes : 0, ib);
>>>>                   if (r) {
>>>>                           DRM_ERROR("Failed to get ib !\n");
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>> index bece01f1cf09..f2e08c372d57 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>> @@ -66,7 +66,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>           int r;
>>>>   
>>>>           if (size) {
>>>> -               r = amdgpu_sa_bo_new(&adev->ring_tmp_bo,
>>>> +               r = amdgpu_sa_bo_new(vm ? &adev->ring_tmp_bo : &adev->ring_tmp_bo_direct,
>>>>                                         &ib->sa_bo, size, 256);
>>>>                   if (r) {
>>>>                           dev_err(adev->dev, "failed to get a new IB (%d)\n", r);
>>>> @@ -75,7 +75,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>   
>>>>                   ib->ptr = amdgpu_sa_bo_cpu_addr(ib->sa_bo);
>>>>   
>>>> -               if (!vm)
>>>> +               if (!((unsigned long)vm & ~0x1))
>>>>                           ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo);
>>>>           }
>>>>   
>>>> @@ -310,6 +310,14 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev)
>>>>                   return r;
>>>>           }
>>>>   
>>>> +       r = amdgpu_sa_bo_manager_init(adev, &adev->ring_tmp_bo_direct,
>>>> +                                     AMDGPU_IB_POOL_SIZE*64*1024,
>>>> +                                     AMDGPU_GPU_PAGE_SIZE,
>>>> +                                     AMDGPU_GEM_DOMAIN_GTT);
>>>> +       if (r) {
>>>> +               amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo);
>>>> +               return r;
>>>> +       }
>>>>           adev->ib_pool_ready = true;
>>>>   
>>>>           return 0;
>>>> @@ -327,6 +335,7 @@ void amdgpu_ib_pool_fini(struct amdgpu_device *adev)
>>>>   {
>>>>           if (adev->ib_pool_ready) {
>>>>                   amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo);
>>>> +               amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo_direct);
>>>>                   adev->ib_pool_ready = false;
>>>>           }
>>>>   }
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> index 4981e443a884..6a63826c6760 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> @@ -88,6 +88,12 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>>>>   
>>>>   int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
>>>>                                struct amdgpu_job **job)
>>>> +{
>>>> +       return amdgpu_job_alloc_with_ib_direct(adev, size, job, 0);
>>>> +}
>>>> +
>>>> +int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
>>>> +                            struct amdgpu_job **job, int direct)
>>>>   {
>>>>           int r;
>>>>   
>>>> @@ -95,7 +101,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
>>>>           if (r)
>>>>                   return r;
>>>>   
>>>> -       r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
>>>> +       r = amdgpu_ib_get(adev, direct ? NULL : 0x1, size, &(*job)->ibs[0]);
>>>>           if (r)
>>>>                   kfree(*job);
>>>>   
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>>> index 2e2110dddb76..be9dd72b9912 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>>> @@ -67,7 +67,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>>>>                        struct amdgpu_job **job, struct amdgpu_vm *vm);
>>>>   int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
>>>>                                struct amdgpu_job **job);
>>>> -
>>>> +int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
>>>> +                            struct amdgpu_job **job, int direct);
>>>>   void amdgpu_job_free_resources(struct amdgpu_job *job);
>>>>   void amdgpu_job_free(struct amdgpu_job *job);
>>>>   int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
>>>> -- 
>>>> 2.17.1
>>>
>>

