[PATCH] drm/amdgpu: remove PT BOs when unmapping

Thu Oct 31 14:08:52 UTC 2019

I could try it on my carrizo/polaris setup.  Is there a test procedure I 
could folllow to trigger the changed code paths?

Tom

On 2019-10-31 6:41 a.m., Koenig, Christian wrote:
> Just tested this and amdgpu_vm_update_ptes() indeed works as expected.
>
> When you free at least a 2MB the lowest level of page tables is freed 
> up again.
>
> BTW: What hardware have you tested this on? On gfx8 and older it is 
> expected that page tables are never freed.
>
> Regards,
> Christian.
>
> Am 30.10.19 um 19:11 schrieb Christian König:
>> Then I don't see how this patch actually changes anything.
>>
>> Could only be a bug in amdgpu_vm_update_ptes(). Going to investigate 
>> this, but I won't have time to look into the ticket in detail.
>>
>> Regards,
>> Christian.
>>
>> Am 30.10.19 um 19:00 schrieb Huang, JinHuiEric:
>>>
>>> Actually I do prevent to remove in-use pts by this:
>>>
>>> +               r = amdgpu_vm_remove_ptes(adev, vm,
>>> +                               (mapping->start + 0x1ff) & (~0x1ffll),
>>> +                               (mapping->last + 1) & (~0x1ffll));
>>>
>>> Which is only removing aligned page table for 2M. And I have tested 
>>> it at least on KFD tests without anything broken.
>>>
>>> By the way, I am not familiar with memory staff. This patch is the 
>>> best I can do for now. Could you take a look at the Jira ticket 
>>> SWDEV-201443 ? and find the better solution. Thanks!
>>>
>>> Regards,
>>>
>>> Eric
>>>
>>> On 2019-10-30 1:57 p.m., Christian König wrote:
>>>> One thing I've forgotten:
>>>>
>>>> What you could maybe do to improve the situation is to join 
>>>> adjacent ranges in amdgpu_vm_clear_freed(), but I'm not sure how 
>>>> the chances are that the ranges are freed all together.
>>>>
>>>> The only other alternative I can see would be to check the mappings 
>>>> of a range in amdgpu_update_ptes() and see if you could walk the 
>>>> tree up if the valid flag is not set and there are no mappings left 
>>>> for a page table.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 30.10.19 um 18:42 schrieb Koenig, Christian:
>>>>>> The vaild flag doesn't take effect in this function.
>>>>> That's irrelevant.
>>>>>
>>>>> See what amdgpu_vm_update_ptes() does is to first determine the 
>>>>> fragment size:
>>>>>> amdgpu_vm_fragment(params, frag_start, end, flags, &frag, &frag_end);
>>>>>
>>>>> Then we walk down the tree:
>>>>>>         amdgpu_vm_pt_start(adev, params->vm, start, &cursor);
>>>>>>         while (cursor.pfn < end) {
>>>>>
>>>>> And make sure that the page tables covering the address range are 
>>>>> actually allocated:
>>>>>>                 r = amdgpu_vm_alloc_pts(params->adev, params->vm, 
>>>>>> &cursor);
>>>>>
>>>>> Then we update the tables with the flags and addresses and free up 
>>>>> subsequent tables in the case of huge pages or freed up areas:
>>>>>>                         /* Free all child entries */
>>>>>>                         while (cursor.pfn < frag_start) {
>>>>>> amdgpu_vm_free_pts(adev, params->vm, &cursor);
>>>>>> amdgpu_vm_pt_next(adev, &cursor);
>>>>>>                         }
>>>>>
>>>>> This is the maximum you can free, cause all other page tables are 
>>>>> not completely covered by the range and so potentially still in use.
>>>>>
>>>>> And I have the strong suspicion that this is what your patch is 
>>>>> actually doing wrong. In other words you are also freeing page 
>>>>> tables which are only partially covered by the range and so 
>>>>> potentially still in use.
>>>>>
>>>>> Since we don't have any tracking how many entries in a page table 
>>>>> are currently valid and how many are invalid we actually can't 
>>>>> implement what you are trying to do here. So the patch is 
>>>>> definitely somehow broken.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 30.10.19 um 17:55 schrieb Huang, JinHuiEric:
>>>>>>
>>>>>> The vaild flag doesn't take effect in this function. 
>>>>>> amdgpu_vm_alloc_pts() is always executed that only depended on 
>>>>>> "cursor.pfn < end". The valid flag has only been checked on here 
>>>>>> for asic below GMC v9:
>>>>>>
>>>>>> if (adev->asic_type < CHIP_VEGA10 &&
>>>>>>             (flags & AMDGPU_PTE_VALID))...
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>> On 2019-10-30 12:30 p.m., Koenig, Christian wrote:
>>>>>>>
>>>>>>>
>>>>>>> Am 30.10.2019 17:19 schrieb "Huang, JinHuiEric" 
>>>>>>> <JinHuiEric.Huang at amd.com>:
>>>>>>>
>>>>>>>     I tested it that it saves a lot of vram on KFD big buffer
>>>>>>>     stress test. I think there are two reasons:
>>>>>>>
>>>>>>>     1. Calling amdgpu_vm_update_ptes() during unmapping will
>>>>>>>     allocate unnecessary pts, because there is no flag to
>>>>>>>     determine if the VA is mapping or unmapping in function
>>>>>>>     amdgpu_vm_update_ptes(). It saves the most of memory.
>>>>>>>
>>>>>>> That's not correct. The valid flag is used for this.
>>>>>>>
>>>>>>>     2. Intentionally removing those unmapping pts is logical
>>>>>>>     expectation, although it is not removing so much pts.
>>>>>>>
>>>>>>> Well I actually don't see a change to what update_ptes is doing 
>>>>>>> and have the strong suspicion that the patch is simply broken.
>>>>>>>
>>>>>>> You either free page tables which are potentially still in use 
>>>>>>> or update_pte doesn't free page tables when the valid but is not 
>>>>>>> set.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     Regards,
>>>>>>>
>>>>>>>     Eric
>>>>>>>
>>>>>>>     On 2019-10-30 11:57 a.m., Koenig, Christian wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         Am 30.10.2019 16:47 schrieb "Kuehling, Felix"
>>>>>>>         <Felix.Kuehling at amd.com> <mailto:Felix.Kuehling at amd.com>:
>>>>>>>
>>>>>>>             On 2019-10-30 9:52 a.m., Christian König wrote:
>>>>>>>             > Am 29.10.19 um 21:06 schrieb Huang, JinHuiEric:
>>>>>>>             >> The issue is PT BOs are not freed when unmapping VA,
>>>>>>>             >> which causes vram usage accumulated is huge in some
>>>>>>>             >> memory stress test, such as kfd big buffer stress
>>>>>>>             test.
>>>>>>>             >> Function amdgpu_vm_bo_update_mapping() is called
>>>>>>>             by both
>>>>>>>             >> amdgpu_vm_bo_update() and
>>>>>>>             amdgpu_vm_clear_freed(). The
>>>>>>>             >> solution is replacing
>>>>>>>             amdgpu_vm_bo_update_mapping() in
>>>>>>>             >> amdgpu_vm_clear_freed() with removing PT BOs function
>>>>>>>             >> to save vram usage.
>>>>>>>             >
>>>>>>>             > NAK, that is intentional behavior.
>>>>>>>             >
>>>>>>>             > Otherwise we can run into out of memory situations
>>>>>>>             when page tables
>>>>>>>             > need to be allocated again under stress.
>>>>>>>
>>>>>>>             That's a bit arbitrary and inconsistent. We are
>>>>>>>             freeing page tables in
>>>>>>>             other situations, when a mapping uses huge pages in
>>>>>>>             amdgpu_vm_update_ptes. Why not when a mapping is
>>>>>>>             destroyed completely?
>>>>>>>
>>>>>>>             I'm actually a bit surprised that the huge-page
>>>>>>>             handling in
>>>>>>>             amdgpu_vm_update_ptes isn't kicking in to free up
>>>>>>>             lower-level page
>>>>>>>             tables when a BO is unmapped.
>>>>>>>
>>>>>>>
>>>>>>>         Well it does free the lower level, and that is already
>>>>>>>         causing problems (that's why I added the reserved space).
>>>>>>>
>>>>>>>         What we don't do is freeing the higher levels.
>>>>>>>
>>>>>>>         E.g. when you free a 2MB BO we free the lowest level, if
>>>>>>>         we free a 1GB BO we free the two lowest levels etc...
>>>>>>>
>>>>>>>         The problem with freeing the higher levels is that you
>>>>>>>         don't know who is also using this. E.g. we would need to
>>>>>>>         check all entries when we unmap one.
>>>>>>>
>>>>>>>         It's simply not worth it for a maximum saving of 2MB per VM.
>>>>>>>
>>>>>>>         Writing this I'm actually wondering how you ended up in
>>>>>>>         this issue? There shouldn't be much savings from this.
>>>>>>>
>>>>>>>         Regards,
>>>>>>>         Christian.
>>>>>>>
>>>>>>>
>>>>>>>             Regards,
>>>>>>>                Felix
>>>>>>>
>>>>>>>
>>>>>>>             >
>>>>>>>             > Regards,
>>>>>>>             > Christian.
>>>>>>>             >
>>>>>>>             >>
>>>>>>>             >> Change-Id: Ic24e35bff8ca85265b418a642373f189d972a924
>>>>>>>             >> Signed-off-by: Eric Huang
>>>>>>>             <JinhuiEric.Huang at amd.com>
>>>>>>>             <mailto:JinhuiEric.Huang at amd.com>
>>>>>>>             >> ---
>>>>>>>             >> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 56
>>>>>>>             >> +++++++++++++++++++++++++++++-----
>>>>>>>             >>   1 file changed, 48 insertions(+), 8 deletions(-)
>>>>>>>             >>
>>>>>>>             >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>             >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>             >> index 0f4c3b2..8a480c7 100644
>>>>>>>             >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>             >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>             >> @@ -1930,6 +1930,51 @@ static void
>>>>>>>             amdgpu_vm_prt_fini(struct
>>>>>>>             >> amdgpu_device *adev, struct amdgpu_vm *vm)
>>>>>>>             >>   }
>>>>>>>             >>     /**
>>>>>>>             >> + * amdgpu_vm_remove_ptes - free PT BOs
>>>>>>>             >> + *
>>>>>>>             >> + * @adev: amdgpu device structure
>>>>>>>             >> + * @vm: amdgpu vm structure
>>>>>>>             >> + * @start: start of mapped range
>>>>>>>             >> + * @end: end of mapped entry
>>>>>>>             >> + *
>>>>>>>             >> + * Free the page table level.
>>>>>>>             >> + */
>>>>>>>             >> +static int amdgpu_vm_remove_ptes(struct
>>>>>>>             amdgpu_device *adev,
>>>>>>>             >> + struct amdgpu_vm *vm, uint64_t start, uint64_t end)
>>>>>>>             >> +{
>>>>>>>             >> +    struct amdgpu_vm_pt_cursor cursor;
>>>>>>>             >> +    unsigned shift, num_entries;
>>>>>>>             >> +
>>>>>>>             >> + amdgpu_vm_pt_start(adev, vm, start, &cursor);
>>>>>>>             >> +    while (cursor.level < AMDGPU_VM_PTB) {
>>>>>>>             >> +        if (!amdgpu_vm_pt_descendant(adev, &cursor))
>>>>>>>             >> + return -ENOENT;
>>>>>>>             >> +    }
>>>>>>>             >> +
>>>>>>>             >> +    while (cursor.pfn < end) {
>>>>>>>             >> + amdgpu_vm_free_table(cursor.entry);
>>>>>>>             >> + num_entries = amdgpu_vm_num_entries(adev,
>>>>>>>             cursor.level - 1);
>>>>>>>             >> +
>>>>>>>             >> +        if (cursor.entry !=
>>>>>>>             &cursor.parent->entries[num_entries - 1]) {
>>>>>>>             >> + /* Next ptb entry */
>>>>>>>             >> + shift = amdgpu_vm_level_shift(adev,
>>>>>>>             cursor.level - 1);
>>>>>>>             >> + cursor.pfn += 1ULL << shift;
>>>>>>>             >> + cursor.pfn &= ~((1ULL << shift) - 1);
>>>>>>>             >> + cursor.entry++;
>>>>>>>             >> +        } else {
>>>>>>>             >> + /* Next ptb entry in next pd0 entry */
>>>>>>>             >> + amdgpu_vm_pt_ancestor(&cursor);
>>>>>>>             >> + shift = amdgpu_vm_level_shift(adev,
>>>>>>>             cursor.level - 1);
>>>>>>>             >> + cursor.pfn += 1ULL << shift;
>>>>>>>             >> + cursor.pfn &= ~((1ULL << shift) - 1);
>>>>>>>             >> + amdgpu_vm_pt_descendant(adev, &cursor);
>>>>>>>             >> +        }
>>>>>>>             >> +    }
>>>>>>>             >> +
>>>>>>>             >> +    return 0;
>>>>>>>             >> +}
>>>>>>>             >> +
>>>>>>>             >> +/**
>>>>>>>             >>    * amdgpu_vm_clear_freed - clear freed BOs in
>>>>>>>             the PT
>>>>>>>             >>    *
>>>>>>>             >>    * @adev: amdgpu_device pointer
>>>>>>>             >> @@ -1949,7 +1994,6 @@ int
>>>>>>>             amdgpu_vm_clear_freed(struct amdgpu_device
>>>>>>>             >> *adev,
>>>>>>>             >>                 struct dma_fence **fence)
>>>>>>>             >>   {
>>>>>>>             >>       struct amdgpu_bo_va_mapping *mapping;
>>>>>>>             >> -    uint64_t init_pte_value = 0;
>>>>>>>             >>       struct dma_fence *f = NULL;
>>>>>>>             >>       int r;
>>>>>>>             >>   @@ -1958,13 +2002,10 @@ int
>>>>>>>             amdgpu_vm_clear_freed(struct
>>>>>>>             >> amdgpu_device *adev,
>>>>>>>             >> struct amdgpu_bo_va_mapping, list);
>>>>>>>             >> list_del(&mapping->list);
>>>>>>>             >>   -        if (vm->pte_support_ats &&
>>>>>>>             >> - mapping->start < AMDGPU_GMC_HOLE_START)
>>>>>>>             >> - init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
>>>>>>>             >> +        r = amdgpu_vm_remove_ptes(adev, vm,
>>>>>>>             >> + (mapping->start + 0x1ff) & (~0x1ffll),
>>>>>>>             >> + (mapping->last + 1) & (~0x1ffll));
>>>>>>>             >>   -        r = amdgpu_vm_bo_update_mapping(adev,
>>>>>>>             vm, false, NULL,
>>>>>>>             >> - mapping->start, mapping->last,
>>>>>>>             >> - init_pte_value, 0, NULL, &f);
>>>>>>>             >> amdgpu_vm_free_mapping(adev, vm, mapping, f);
>>>>>>>             >>           if (r) {
>>>>>>>             >> dma_fence_put(f);
>>>>>>>             >> @@ -1980,7 +2021,6 @@ int
>>>>>>>             amdgpu_vm_clear_freed(struct amdgpu_device
>>>>>>>             >> *adev,
>>>>>>>             >>       }
>>>>>>>             >> return 0;
>>>>>>>             >> -
>>>>>>>             >>   }
>>>>>>>             >>     /**
>>>>>>>             >
>>>>>>>             > _______________________________________________
>>>>>>>             > amd-gfx mailing list
>>>>>>>             > amd-gfx at lists.freedesktop.org
>>>>>>>             <mailto:amd-gfx at lists.freedesktop.org>
>>>>>>>             > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx