[PATCH] drm/amdgpu: Fix multi-level page table bugs for large BOs v3

Felix Kuehling felix.kuehling at amd.com
Wed Mar 29 18:07:55 UTC 2017


On 17-03-29 01:40 PM, Christian König wrote:
> Am 29.03.2017 um 19:22 schrieb Felix Kuehling:
>> Fix the start/end address calculation for address ranges that span
>> multiple page directories in amdgpu_vm_alloc_levels.
>>
>> Add error messages if page tables aren't found. Otherwise the page
>> table update would just fail silently.
>>
>> v2:
>>   * Change WARN_ON to WARN_ON_ONCE
>>   * Move masking of high address bits to caller
>>   * Add range-check for "from" and "to"
>> v3:
>>   * Replace WARN_ON_ONCE in get_pt with pr_err in caller
>>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>
> Mhm, I still don't like that we modify saddr/eaddr here because I
> wanted to resolve the recursion sooner or later.
>
> But anyway that is work for a future patch, this one is Reviewed-by:
> Christian König <christian.koenig at amd.com>.

Thank you, I appreciate it. I realize it's probably not perfect. But I'm
just looking at the intention of the original code and trying to make it
work correctly for KFD so we can continue integrating amdgpu changes
(and important fixes, such as VM fault handling) into the KFD branch.
That's also why I'm a bit in a rush. This was actually blocking KFD.

We take a different approach from hybrid. We do a nightly git-merge from
amd-staging-4.9 and running it through our automated pre-submission
tests. If the merge doesn't have conflicts that process is completely
automated (except for someone pressing the Submit button in the end). If
the testing fails, we have to block integrations until the problem is
fixed upstream. So I'm fixing the problem upstream. ;)

Regards,
  Felix

>
> Regards,
> Christian.
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 ++++++++++++++++++------
>>   1 file changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 818747f..c11d15e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -275,13 +275,18 @@ static int amdgpu_vm_alloc_levels(struct
>> amdgpu_device *adev,
>>           memset(parent->entries, 0 , sizeof(struct amdgpu_vm_pt));
>>       }
>>   -    from = (saddr >> shift) % amdgpu_vm_num_entries(adev, level);
>> -    to = (eaddr >> shift) % amdgpu_vm_num_entries(adev, level);
>> +    from = saddr >> shift;
>> +    to = eaddr >> shift;
>> +    if (from >= amdgpu_vm_num_entries(adev, level) ||
>> +        to >= amdgpu_vm_num_entries(adev, level))
>> +        return -EINVAL;
>>         if (to > parent->last_entry_used)
>>           parent->last_entry_used = to;
>>         ++level;
>> +    saddr = saddr & ((1 << shift) - 1);
>> +    eaddr = eaddr & ((1 << shift) - 1);
>>         /* walk over the address space and allocate the page tables */
>>       for (pt_idx = from; pt_idx <= to; ++pt_idx) {
>> @@ -312,8 +317,11 @@ static int amdgpu_vm_alloc_levels(struct
>> amdgpu_device *adev,
>>           }
>>             if (level < adev->vm_manager.num_level) {
>> -            r = amdgpu_vm_alloc_levels(adev, vm, entry, saddr,
>> -                           eaddr, level);
>> +            uint64_t sub_saddr = (pt_idx == from) ? saddr : 0;
>> +            uint64_t sub_eaddr = (pt_idx == to) ? eaddr :
>> +                ((1 << shift) - 1);
>> +            r = amdgpu_vm_alloc_levels(adev, vm, entry, sub_saddr,
>> +                           sub_eaddr, level);
>>               if (r)
>>                   return r;
>>           }
>> @@ -990,8 +998,10 @@ static void amdgpu_vm_update_ptes(struct
>> amdgpu_pte_update_params *params,
>>       /* initialize the variables */
>>       addr = start;
>>       pt = amdgpu_vm_get_pt(params, addr);
>> -    if (!pt)
>> +    if (!pt) {
>> +        pr_err("PT not found, aborting update_ptes\n");
>>           return;
>> +    }
>>         if (params->shadow) {
>>           if (!pt->shadow)
>> @@ -1015,8 +1025,10 @@ static void amdgpu_vm_update_ptes(struct
>> amdgpu_pte_update_params *params,
>>       /* walk over the address space and update the page tables */
>>       while (addr < end) {
>>           pt = amdgpu_vm_get_pt(params, addr);
>> -        if (!pt)
>> +        if (!pt) {
>> +            pr_err("PT not found, aborting update_ptes\n");
>>               return;
>> +        }
>>             if (params->shadow) {
>>               if (!pt->shadow)
>
>



More information about the amd-gfx mailing list