[PATCH 3/3] drm/amdgpu: Fix multi-level page table bugs for large BOs
Felix Kuehling
felix.kuehling at amd.com
Wed Mar 29 14:32:16 UTC 2017
On 17-03-29 02:52 AM, Christian König wrote:
> Am 29.03.2017 um 07:58 schrieb Zhang, Jerry (Junwei):
>> Hi Felix,
>>
>> Thanks for your patient explanation.
>> I understand your meaning now, and thanks for fixing that.
>>
>> > I think you're right, there are some extra high bits in saddr and
>> eaddr, but
>> > they get discarded by the modulo-division in the next recursion
>> level. For
>> > added clarity the modulo division (or masking of high bits) could
>> be moved up
>> > to the caller.
>>
>> It may be clearer if the modulo-division is done in the caller.
>
> Yeah, indeed that isn't easy to grasp at first glance.
OK, I'll fix that.
>
> Please clean that up and also remove all those WARN_ON(), we don't
> want to spam the system log with backtraces like this.
I disagree. When you see the backtraces from WARN_ONs in
amdgpu_vm_get_pt, it means the driver is trying to update a page table
that hasn't been allocated. Without those WARN_ONs there is no warning
of any kind: amdgpu_vm_update_pte just fails silently (it's a void
function).
Therefore I wouldn't call those warnings spam. Given that the
multi-level VM page table allocation code is non-trivial, I think this
is a good sanity check to have, and it is really helpful to see in a
log when users report random VM faults.
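The `if (WARN_ON(level))` pattern in the patch works because WARN_ON()
evaluates to its condition, so one expression both logs and guards the
early return. A rough userspace stand-in, illustrative only (the real
kernel macro in include/asm-generic/bug.h also dumps a full backtrace):

```c
#include <stdio.h>

/* Userspace stand-in for the kernel's WARN_ON(): logs the failed
 * condition with its location and evaluates to the condition itself,
 * so it can guard an early return as in "if (WARN_ON(level)) return NULL;".
 * Uses a GCC statement expression, as kernel code commonly does. */
#define WARN_ON(cond) ({                                          \
	int __warned = !!(cond);                                  \
	if (__warned)                                             \
		fprintf(stderr, "WARNING at %s:%d: %s\n",         \
			__FILE__, __LINE__, #cond);               \
	__warned;                                                 \
})
```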
Regards,
Felix
>
> Regards,
> Christian.
>
>> Anyway,
>> Reviewed-by: Junwei Zhang <Jerry.Zhang at amd.com>
>>
>>
>> On 03/29/2017 11:24 AM, Kuehling, Felix wrote:
>>> Hi Jerry,
>>>
>>> Let me clarify my understanding of the problem and the intention of
>>> the fix.
>>>
>>> Address range per page: 4KB
>>> Address range per PT (level 3): 2MB
>>> Address range per PD (level 2): 1GB
>>> Address range per PD (level 1): 512GB
>>> Address range per PD (level 0): 256TB
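The table above follows from 4KB pages (12 address bits) and 512
entries (9 bits) per level. A minimal sketch, assuming the per-level
shift is 12 + 9 * (4 - level) as implied by the listed ranges (the
helper name is illustrative, not an amdgpu function):

```c
#include <stdint.h>

/* Address range covered by one table at the given level, assuming 4KB
 * pages and 512-entry tables at every level:
 * level 3 -> 2MB, level 2 -> 1GB, level 1 -> 512GB, level 0 -> 256TB. */
static uint64_t level_coverage(int level)
{
	return 1ULL << (12 + 9 * (4 - level));
}
```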
>>>
>>> Imagine for example a mapping that spans multiple level 2 page
>>> directories. Say
>>> from 0.5GB to 2.5GB.
>>>
>>> At level 0:
>>> from = 0
>>> to = 0
>>>
>>> At level 1:
>>> from = 0
>>> to = 2
>>>
>>> So for level 2 amdgpu_vm_alloc_levels gets called 3 times (pt_idx =
>>> [0..2]).
>>> Without my change, it gets called with the same saddr and eaddr all
>>> 3 times, so it calculates the same "from" and "to" each time:
>>> from = 256 % 512 = 256
>>> to = 1279 % 512 = 255
>>> Obviously that doesn't work. What we really want is this (from..to
>>> in level 2
>>> when called for pt_idx values from level 1):
>>>
>>> For pt_idx=0 we want to fill the second half of the level 2 page
>>> directory with
>>> page tables:
>>> from = 256
>>> to = 511
>>>
>>> For pt_idx=1 we need to fill the entire level 2 page directory
>>> with page tables:
>>> from = 0
>>> to = 511
>>>
>>> For pt_idx=2 we want to fill the first half of the level 2 page
>>> directory with
>>> page tables:
>>> from = 0
>>> to = 255
>>>
>>> This creates a contiguous range of page tables across the three
>>> level 2 page
>>> directories that are spanned by the allocation. My change achieves
>>> that.
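Under the same assumptions (512 entries per level, 1GB per level-1
entry, 2MB level-3 granularity), the clamping in the patch can be
sketched and checked against the three (from, to) pairs above. The
names here are illustrative, not the exact kernel code:

```c
#include <stdint.h>

#define NUM_ENTRIES 512ULL

/* Stand-in for "(addr >> shift) % amdgpu_vm_num_entries(adev, level)". */
static uint64_t pt_index(uint64_t addr, unsigned int shift)
{
	return (addr >> shift) % NUM_ENTRIES;
}

/* Clamp the child address range as the patch does: only the first
 * child (pt_idx == from) keeps the real start address and only the
 * last child (pt_idx == to) keeps the real end address; every child
 * in between gets the full 0..(1 << shift) - 1 range. High bits are
 * discarded by the modulo-division at the next level. */
static void child_range(unsigned int pt_idx, unsigned int from,
			unsigned int to, uint64_t saddr, uint64_t eaddr,
			unsigned int shift,
			uint64_t *sub_saddr, uint64_t *sub_eaddr)
{
	*sub_saddr = (pt_idx == from) ? saddr : 0;
	*sub_eaddr = (pt_idx == to) ? eaddr : ((1ULL << shift) - 1);
}
```

For the 0.5GB..2.5GB example (level-1 shift 30, level-2 index shift
21), this yields (256, 511), (0, 511) and (0, 255) for pt_idx 0..2,
the contiguous range of page tables described above.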
>>>
>>> I think you're right, there are some extra high bits in saddr and
>>> eaddr, but
>>> they get discarded by the modulo-division in the next recursion
>>> level. For
>>> added clarity the modulo division (or masking of high bits) could be
>>> moved up
>>> to the caller.
>>>
>>> Regards,
>>> Felix
>>>
>>>
>>> --
>>> F e l i x K u e h l i n g
>>> SMTS Software Development Engineer | Vertical Workstation/Compute
>>> 1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
>>> (O) +1(289)695-1597
>>> -------------------------------------------------------------------------------
>>>
>>> *From:* Zhang, Jerry
>>> *Sent:* Tuesday, March 28, 2017 10:54:03 PM
>>> *To:* Kuehling, Felix; amd-gfx at lists.freedesktop.org; Koenig,
>>> Christian; Zhou,
>>> David(ChunMing)
>>> *Cc:* Deucher, Alexander; Russell, Kent
>>> *Subject:* Re: [PATCH 3/3] drm/amdgpu: Fix multi-level page table
>>> bugs for
>>> large BOs
>>> On 03/29/2017 09:00 AM, Felix Kuehling wrote:
>>>> Fix the start/end address calculation for address ranges that span
>>>> multiple page directories in amdgpu_vm_alloc_levels.
>>>>
>>>> Add WARN_ONs if page tables aren't found. Otherwise the page table
>>>> update would just fail silently.
>>>>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++++++---
>>>> 1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 818747f..44aa039 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -278,6 +278,8 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
>>>>  	from = (saddr >> shift) % amdgpu_vm_num_entries(adev, level);
>>>>  	to = (eaddr >> shift) % amdgpu_vm_num_entries(adev, level);
>>>>  
>>>> +	WARN_ON(from > to);
>>>> +
>>>>  	if (to > parent->last_entry_used)
>>>>  		parent->last_entry_used = to;
>>>>  
>>>> @@ -312,8 +314,11 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
>>>>  	}
>>>>  
>>>>  	if (level < adev->vm_manager.num_level) {
>>>> -		r = amdgpu_vm_alloc_levels(adev, vm, entry, saddr,
>>>> -					   eaddr, level);
>>>> +		uint64_t sub_saddr = (pt_idx == from) ? saddr : 0;
>>>> +		uint64_t sub_eaddr = (pt_idx == to) ? eaddr :
>>>> +			((1 << shift) - 1);
>>>> +		r = amdgpu_vm_alloc_levels(adev, vm, entry, sub_saddr,
>>>> +					   sub_eaddr, level);
>>>
>>> Do you mean to create a full sub-PT if pt_idx != from and
>>> pt_idx != to? I didn't quite catch that.
>>>
>>> From my perspective, we may need to clear the upper-level bits of
>>> saddr and eaddr before going on to allocate the next-level PT.
>>> Otherwise the entry index computed for the next level will be
>>> incorrect, larger than num_entries, because the upper-level PT bits
>>> are still counted in.
>>>
>>> e.g.
>>> an address is composed of the index fields
>>> PD + PT1 + PT2 + PT3
>>>
>>> For the first round, we get the correct "from" and "to" with the
>>> right "shift" (3 blocks), then walk over the address space in the
>>> PD range.
>>>
>>> Then for the 2nd round, we cannot get the expected "from" and "to"
>>> with "shift" (2 blocks), as it walks over the address space in the
>>> PD + PT1 range, while we'd like it to be in the PT1 range only.
>>>
>>> The same fault carries on through the later rounds.
>>>
>>> Perhaps that's the root cause of the issue you are facing.
>>>
>>> Jerry.
>>>
>>>>  		if (r)
>>>>  			return r;
>>>>  	}
>>>> @@ -957,9 +962,11 @@ static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
>>>>  		entry = &entry->entries[idx];
>>>>  	}
>>>>  
>>>> -	if (level)
>>>> +	if (WARN_ON(level))
>>>>  		return NULL;
>>>>  
>>>> +	WARN_ON(!entry->bo);
>>>> +
>>>>  	return entry->bo;
>>>>  }
>>>>
>>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>