[PATCH 6/6] drm/amdgpu: use more than 64KB fragment size if possible

Jay Cornwall jay at jcornwall.me
Tue Aug 9 16:45:19 UTC 2016


On 2016-08-09 11:35, Christian König wrote:
> Am 09.08.2016 um 17:49 schrieb Jay Cornwall:
>> On 2016-08-09 07:52, Christian König wrote:
>>> From: Christian König <christian.koenig at amd.com>
>>> 
>>> We align to 64KB, but when userspace aligns even more we can easily 
>>> use more.
>>> 
>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++++----
>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index e6c030b..88f4109 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -817,13 +817,13 @@ static void amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params *params,
>>>       * allocation size to the fragment size.
>>>       */
>>> 
>>> -    /* SI and newer are optimized for 64KB */
>>> -    uint64_t frag_flags = AMDGPU_PTE_FRAG(AMDGPU_LOG2_PAGES_PER_FRAG);
>>> -    uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
>>> +    const uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
>>> 
>>>      uint64_t frag_start = ALIGN(start, frag_align);
>>>      uint64_t frag_end = end & ~(frag_align - 1);
>>> 
>>> +    uint32_t frag;
>>> +
>>>      /* system pages are non continuously */
>>>      if (params->src || params->pages_addr || !(flags & AMDGPU_PTE_VALID) ||
>>>          (frag_start >= frag_end)) {
>>> @@ -832,6 +832,10 @@ static void amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params *params,
>>>          return;
>>>      }
>>> 
>>> +    /* use more than 64KB fragment size if possible */
>>> +    frag = lower_32_bits(frag_start | frag_end);
>>> +    frag = likely(frag) ? __ffs(frag) : 31;
>>> +
>>>      /* handle the 4K area at the beginning */
>>>      if (start != frag_start) {
>>>          amdgpu_vm_update_ptes(params, vm, start, frag_start,
>>> @@ -841,7 +845,7 @@ static void amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params *params,
>>> 
>>>      /* handle the area in the middle */
>>>      amdgpu_vm_update_ptes(params, vm, frag_start, frag_end, dst,
>>> -                  flags | frag_flags);
>>> +                  flags | AMDGPU_PTE_FRAG(frag));
>>> 
>>>      /* handle the 4K area at the end */
>>>      if (frag_end != end) {
>> 
>> Would this change not direct larger fragments away from the BigK TLB 
>> partition?
>> 
>> My understanding was VM_L2_CNTL3.L2_CACHE_BIGK_FRAGMENT_SIZE is an 
>> exact match and not a minimum size. I can't find any immediate 
>> documentation on that topic to confirm.
> 
> Yeah I was questioning that myself as well, especially since you wrote
> in the initial patch that SI and later are optimized for 64K.

The 64K figure came from VM documentation. It was otherwise unqualified,
but my guess is it was based on VidMM's page size (64K), the average
gaming working-set size, and the number of BigK entries. Apparently it's
still a good choice, as we haven't changed it since.
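
For reference, here is a minimal standalone sketch (not the kernel code;
the helper name frag_from_range is made up) of the alignment math the
patch uses: the fragment field is the index of the lowest set bit of
(frag_start | frag_end), i.e. the largest power-of-two page count that
both ends of the middle range are aligned to.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustration only: the fragment field is the largest power-of-two
     * number of 4KB pages that both the aligned start and end PFNs are
     * multiples of, i.e. the lowest set bit of (frag_start | frag_end).
     */
    static uint32_t frag_from_range(uint64_t frag_start, uint64_t frag_end)
    {
        uint32_t low = (uint32_t)(frag_start | frag_end);

        /* __ffs() is undefined for 0, hence the patch's fallback to 31 */
        return low ? (uint32_t)__builtin_ctz(low) : 31;
    }

    int main(void)
    {
        /* A mapping whose middle range runs from PFN 256 (1MB) to
         * PFN 512 (2MB): 256 | 512 = 768, lowest set bit is 2^8,
         * so 256 pages = 1MB fragments instead of the fixed 64KB. */
        printf("frag = %u\n", frag_from_range(256, 512));
        return 0;
    }

With the old code that same middle range would always have been written
with AMDGPU_PTE_FRAG(4), i.e. fixed 64KB fragments, regardless of how
well aligned the mapping actually was.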

> So I tested it on Tonga and Polaris10 and it seems to work as
> expected, e.g. a 1MB fragment size really results in not reading the
> other page table entries as soon as it is cached.
> 
> But I'm not sure how exactly this partitioning of the L2 works and
> what effect it should have.

OK. As long as there's no regression on e.g. Heaven, where I benchmarked
the original change at a gain of several percent, it should be fine.

-- 
Jay Cornwall

