[PATCH 3/3] drm/amdgpu: Fix multi-level page table bugs for large BOs

Wed Mar 29 05:58:26 UTC 2017

Hi Felix,

Thanks for your illustration with patience.
I got your meaning then, and thanks to fix that.

 > I think you're right, there are some extra high bits in saddr and eaddr, but
 > they get discarded by the modulo-division in the next recursion level. For
 > added clarity the modulo division (or masking of high bits) could be moved up
 > to the caller.

It may be more clear if modulo-division called before the caller.
Anyway,
Reviewed-by: Junwei Zhang <Jerry.Zhang at amd.com>

On 03/29/2017 11:24 AM, Kuehling, Felix wrote:
> Hi Jerry,
>
> Let me clarify my understanding of the problem and the intention of the fix.
>
> Address range per page: 4KB
> Address range per PT (level 3): 2MB
> Address range per PD (level 2): 1GB
> Address range per PD (level 1): 512GB
> Address range per PD (level 0): 256TB
>
> Imagine for example a mapping that spans multiple level 2 page directories. Say
> from 0.5GB to 2.5GB.
>
> At level 0:
> from = 0
> to = 0
>
> At level 1:
> from = 0
> to = 2
>
> So for level 2 amdgpu_vm_alloc_levels gets called 3 times (pt_idx = [0..2]).
> Without my change, it gets called with the same saddr and eaddr 3 times. So it
> calculates the same "from" and "to" each time:
> from = 256 % 512 = 256
> to = 767 % 512 = 255
>
> Obviously that doesn't work. What we really want is this (from..to in level 2
> when called for pt_idx values from level 1):
>
> For pt_idx=0 we want to fill the second half of the level 2 page directory with
> page tables:
> from = 256
> to = 511
>
> For pt_idx=1 we need to fill the entire level 2 page directories with page tables:
> from = 0
> to = 511
>
> For pt_idx=2 we want to fill the first half of the level 2 page directory with
> page tables:
> from = 0
> to = 255
>
> This creates a contiguous range of page tables across the three level 2 page
> directories that are spanned by the allocation. My change achieves that.
>
> I think you're right, there are some extra high bits in saddr and eaddr, but
> they get discarded by the modulo-division in the next recursion level. For
> added clarity the modulo division (or masking of high bits) could be moved up
> to the caller.
>
> Regards,
>    Felix
>
>
> --
> F e l i x   K u e h l i n g
> SMTS Software Development Engineer | Vertical Workstation/Compute
> 1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
> (O) +1(289)695-1597
>     _     _   _   _____   _____
>    / \   | \ / | |  _  \  \ _  |
>   / A \  | \M/ | | |D) )  /|_| |
> /_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com
> -------------------------------------------------------------------------------
> *From:* Zhang, Jerry
> *Sent:* Tuesday, March 28, 2017 10:54:03 PM
> *To:* Kuehling, Felix; amd-gfx at lists.freedesktop.org; Koenig, Christian; Zhou,
> David(ChunMing)
> *Cc:* Deucher, Alexander; Russell, Kent
> *Subject:* Re: [PATCH 3/3] drm/amdgpu: Fix multi-level page table bugs for
> large BOs
> On 03/29/2017 09:00 AM, Felix Kuehling wrote:
>> Fix the start/end address calculation for address ranges that span
>> multiple page directories in amdgpu_vm_alloc_levels.
>>
>> Add WARN_ONs if page tables aren't found. Otherwise the page table
>> update would just fail silently.
>>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 818747f..44aa039 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -278,6 +278,8 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
>>        from = (saddr >> shift) % amdgpu_vm_num_entries(adev, level);
>>        to = (eaddr >> shift) % amdgpu_vm_num_entries(adev, level);
>>
>> +     WARN_ON(from > to);
>> +
>>        if (to > parent->last_entry_used)
>>                parent->last_entry_used = to;
>>
>> @@ -312,8 +314,11 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
>>                }
>>
>>                if (level < adev->vm_manager.num_level) {
>> -                     r = amdgpu_vm_alloc_levels(adev, vm, entry, saddr,
>> -                                                eaddr, level);
>> +                     uint64_t sub_saddr = (pt_idx == from) ? saddr : 0;
>> +                     uint64_t sub_eaddr = (pt_idx == to) ? eaddr :
>> +                             ((1 << shift) - 1);
>> +                     r = amdgpu_vm_alloc_levels(adev, vm, entry, sub_saddr,
>> +                                                sub_eaddr, level);
>
> Do you mean to create a full sub-pt if pt_idx != from and != to?
> I didn't catch it up clear.
>
> In my perspective, we may need to clear the saddr and eaddr upper level bit
> before going on to allocate the next level PT.
> Otherwise, the number of next PT entry will be incorrect, more than num_entries
> as the upper level PT number calculated in.
>
> e.g.
> there is a address contains
> PD + PT1 + PT2 + PT3
>
> for the first round, we could get correct "from" and "to" address with right
> "shift"(3 blocks), then walk over the address space in PD range.
>
> Then for the 2nd round, we cannot get the expected "from" and "to" address with
> "shift"(2 blocks), as it walks over the address space in PD + PT1 range.
> While we'd like it's in range PT1 indeed.
>
> The fault way goes on with later round.
>
> Perhaps, that's the root cause about the issue you face.
>
> Jerry.
>
>>                        if (r)
>>                                return r;
>>                }
>> @@ -957,9 +962,11 @@ static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
>>                entry = &entry->entries[idx];
>>        }
>>
>> -     if (level)
>> +     if (WARN_ON(level))
>>                return NULL;
>>
>> +     WARN_ON(!entry->bo);
>> +
>>        return entry->bo;
>>   }
>>
>>