[PATCH] drm/amdgpu: remove PT BOs when unmapping

Wed Oct 30 17:57:06 UTC 2019

One thing I've forgotten:

What you could maybe do to improve the situation is to join adjacent 
ranges in amdgpu_vm_clear_freed(), but I'm not sure how the chances are 
that the ranges are freed all together.

The only other alternative I can see would be to check the mappings of a 
range in amdgpu_update_ptes() and see if you could walk the tree up if 
the valid flag is not set and there are no mappings left for a page table.

Regards,
Christian.

Am 30.10.19 um 18:42 schrieb Koenig, Christian:
>> The vaild flag doesn't take effect in this function.
> That's irrelevant.
>
> See what amdgpu_vm_update_ptes() does is to first determine the 
> fragment size:
>> amdgpu_vm_fragment(params, frag_start, end, flags, &frag, &frag_end);
>
> Then we walk down the tree:
>>         amdgpu_vm_pt_start(adev, params->vm, start, &cursor);
>>         while (cursor.pfn < end) {
>
> And make sure that the page tables covering the address range are 
> actually allocated:
>>                 r = amdgpu_vm_alloc_pts(params->adev, params->vm, 
>> &cursor);
>
> Then we update the tables with the flags and addresses and free up 
> subsequent tables in the case of huge pages or freed up areas:
>>                         /* Free all child entries */
>>                         while (cursor.pfn < frag_start) {
>>                                 amdgpu_vm_free_pts(adev, params->vm, 
>> &cursor);
>>                                 amdgpu_vm_pt_next(adev, &cursor);
>>                         }
>
> This is the maximum you can free, cause all other page tables are not 
> completely covered by the range and so potentially still in use.
>
> And I have the strong suspicion that this is what your patch is 
> actually doing wrong. In other words you are also freeing page tables 
> which are only partially covered by the range and so potentially still 
> in use.
>
> Since we don't have any tracking how many entries in a page table are 
> currently valid and how many are invalid we actually can't implement 
> what you are trying to do here. So the patch is definitely somehow broken.
>
> Regards,
> Christian.
>
> Am 30.10.19 um 17:55 schrieb Huang, JinHuiEric:
>>
>> The vaild flag doesn't take effect in this function. 
>> amdgpu_vm_alloc_pts() is always executed that only depended on 
>> "cursor.pfn < end". The valid flag has only been checked on here for 
>> asic below GMC v9:
>>
>> if (adev->asic_type < CHIP_VEGA10 &&
>>             (flags & AMDGPU_PTE_VALID))...
>>
>> Regards,
>>
>> Eric
>>
>> On 2019-10-30 12:30 p.m., Koenig, Christian wrote:
>>>
>>>
>>> Am 30.10.2019 17:19 schrieb "Huang, JinHuiEric" 
>>> <JinHuiEric.Huang at amd.com>:
>>>
>>>     I tested it that it saves a lot of vram on KFD big buffer stress
>>>     test. I think there are two reasons:
>>>
>>>     1. Calling amdgpu_vm_update_ptes() during unmapping will
>>>     allocate unnecessary pts, because there is no flag to determine
>>>     if the VA is mapping or unmapping in function
>>>     amdgpu_vm_update_ptes(). It saves the most of memory.
>>>
>>> That's not correct. The valid flag is used for this.
>>>
>>>     2. Intentionally removing those unmapping pts is logical
>>>     expectation, although it is not removing so much pts.
>>>
>>> Well I actually don't see a change to what update_ptes is doing and 
>>> have the strong suspicion that the patch is simply broken.
>>>
>>> You either free page tables which are potentially still in use or 
>>> update_pte doesn't free page tables when the valid but is not set.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>
>>>     Regards,
>>>
>>>     Eric
>>>
>>>     On 2019-10-30 11:57 a.m., Koenig, Christian wrote:
>>>
>>>
>>>
>>>         Am 30.10.2019 16:47 schrieb "Kuehling, Felix"
>>>         <Felix.Kuehling at amd.com> <mailto:Felix.Kuehling at amd.com>:
>>>
>>>             On 2019-10-30 9:52 a.m., Christian König wrote:
>>>             > Am 29.10.19 um 21:06 schrieb Huang, JinHuiEric:
>>>             >> The issue is PT BOs are not freed when unmapping VA,
>>>             >> which causes vram usage accumulated is huge in some
>>>             >> memory stress test, such as kfd big buffer stress test.
>>>             >> Function amdgpu_vm_bo_update_mapping() is called by both
>>>             >> amdgpu_vm_bo_update() and amdgpu_vm_clear_freed(). The
>>>             >> solution is replacing amdgpu_vm_bo_update_mapping() in
>>>             >> amdgpu_vm_clear_freed() with removing PT BOs function
>>>             >> to save vram usage.
>>>             >
>>>             > NAK, that is intentional behavior.
>>>             >
>>>             > Otherwise we can run into out of memory situations
>>>             when page tables
>>>             > need to be allocated again under stress.
>>>
>>>             That's a bit arbitrary and inconsistent. We are freeing
>>>             page tables in
>>>             other situations, when a mapping uses huge pages in
>>>             amdgpu_vm_update_ptes. Why not when a mapping is
>>>             destroyed completely?
>>>
>>>             I'm actually a bit surprised that the huge-page handling in
>>>             amdgpu_vm_update_ptes isn't kicking in to free up
>>>             lower-level page
>>>             tables when a BO is unmapped.
>>>
>>>
>>>         Well it does free the lower level, and that is already
>>>         causing problems (that's why I added the reserved space).
>>>
>>>         What we don't do is freeing the higher levels.
>>>
>>>         E.g. when you free a 2MB BO we free the lowest level, if we
>>>         free a 1GB BO we free the two lowest levels etc...
>>>
>>>         The problem with freeing the higher levels is that you don't
>>>         know who is also using this. E.g. we would need to check all
>>>         entries when we unmap one.
>>>
>>>         It's simply not worth it for a maximum saving of 2MB per VM.
>>>
>>>         Writing this I'm actually wondering how you ended up in this
>>>         issue? There shouldn't be much savings from this.
>>>
>>>         Regards,
>>>         Christian.
>>>
>>>
>>>             Regards,
>>>                Felix
>>>
>>>
>>>             >
>>>             > Regards,
>>>             > Christian.
>>>             >
>>>             >>
>>>             >> Change-Id: Ic24e35bff8ca85265b418a642373f189d972a924
>>>             >> Signed-off-by: Eric Huang <JinhuiEric.Huang at amd.com>
>>>             <mailto:JinhuiEric.Huang at amd.com>
>>>             >> ---
>>>             >> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 56
>>>             >> +++++++++++++++++++++++++++++-----
>>>             >>   1 file changed, 48 insertions(+), 8 deletions(-)
>>>             >>
>>>             >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>             >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>             >> index 0f4c3b2..8a480c7 100644
>>>             >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>             >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>             >> @@ -1930,6 +1930,51 @@ static void
>>>             amdgpu_vm_prt_fini(struct
>>>             >> amdgpu_device *adev, struct amdgpu_vm *vm)
>>>             >>   }
>>>             >>     /**
>>>             >> + * amdgpu_vm_remove_ptes - free PT BOs
>>>             >> + *
>>>             >> + * @adev: amdgpu device structure
>>>             >> + * @vm: amdgpu vm structure
>>>             >> + * @start: start of mapped range
>>>             >> + * @end: end of mapped entry
>>>             >> + *
>>>             >> + * Free the page table level.
>>>             >> + */
>>>             >> +static int amdgpu_vm_remove_ptes(struct
>>>             amdgpu_device *adev,
>>>             >> +        struct amdgpu_vm *vm, uint64_t start,
>>>             uint64_t end)
>>>             >> +{
>>>             >> +    struct amdgpu_vm_pt_cursor cursor;
>>>             >> +    unsigned shift, num_entries;
>>>             >> +
>>>             >> + amdgpu_vm_pt_start(adev, vm, start, &cursor);
>>>             >> +    while (cursor.level < AMDGPU_VM_PTB) {
>>>             >> +        if (!amdgpu_vm_pt_descendant(adev, &cursor))
>>>             >> +            return -ENOENT;
>>>             >> +    }
>>>             >> +
>>>             >> +    while (cursor.pfn < end) {
>>>             >> + amdgpu_vm_free_table(cursor.entry);
>>>             >> +        num_entries = amdgpu_vm_num_entries(adev,
>>>             cursor.level - 1);
>>>             >> +
>>>             >> +        if (cursor.entry !=
>>>             &cursor.parent->entries[num_entries - 1]) {
>>>             >> +            /* Next ptb entry */
>>>             >> +            shift = amdgpu_vm_level_shift(adev,
>>>             cursor.level - 1);
>>>             >> + cursor.pfn += 1ULL << shift;
>>>             >> + cursor.pfn &= ~((1ULL << shift) - 1);
>>>             >> + cursor.entry++;
>>>             >> +        } else {
>>>             >> +            /* Next ptb entry in next pd0 entry */
>>>             >> + amdgpu_vm_pt_ancestor(&cursor);
>>>             >> +            shift = amdgpu_vm_level_shift(adev,
>>>             cursor.level - 1);
>>>             >> + cursor.pfn += 1ULL << shift;
>>>             >> + cursor.pfn &= ~((1ULL << shift) - 1);
>>>             >> + amdgpu_vm_pt_descendant(adev, &cursor);
>>>             >> +        }
>>>             >> +    }
>>>             >> +
>>>             >> +    return 0;
>>>             >> +}
>>>             >> +
>>>             >> +/**
>>>             >>    * amdgpu_vm_clear_freed - clear freed BOs in the PT
>>>             >>    *
>>>             >>    * @adev: amdgpu_device pointer
>>>             >> @@ -1949,7 +1994,6 @@ int
>>>             amdgpu_vm_clear_freed(struct amdgpu_device
>>>             >> *adev,
>>>             >> struct dma_fence **fence)
>>>             >>   {
>>>             >>       struct amdgpu_bo_va_mapping *mapping;
>>>             >> -    uint64_t init_pte_value = 0;
>>>             >>       struct dma_fence *f = NULL;
>>>             >>       int r;
>>>             >>   @@ -1958,13 +2002,10 @@ int
>>>             amdgpu_vm_clear_freed(struct
>>>             >> amdgpu_device *adev,
>>>             >>               struct amdgpu_bo_va_mapping, list);
>>>             >> list_del(&mapping->list);
>>>             >>   -        if (vm->pte_support_ats &&
>>>             >> - mapping->start < AMDGPU_GMC_HOLE_START)
>>>             >> - init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
>>>             >> +        r = amdgpu_vm_remove_ptes(adev, vm,
>>>             >> + (mapping->start + 0x1ff) & (~0x1ffll),
>>>             >> + (mapping->last + 1) & (~0x1ffll));
>>>             >>   -        r = amdgpu_vm_bo_update_mapping(adev, vm,
>>>             false, NULL,
>>>             >> - mapping->start, mapping->last,
>>>             >> - init_pte_value, 0, NULL, &f);
>>>             >> amdgpu_vm_free_mapping(adev, vm, mapping, f);
>>>             >>           if (r) {
>>>             >> dma_fence_put(f);
>>>             >> @@ -1980,7 +2021,6 @@ int
>>>             amdgpu_vm_clear_freed(struct amdgpu_device
>>>             >> *adev,
>>>             >>       }
>>>             >>         return 0;
>>>             >> -
>>>             >>   }
>>>             >>     /**
>>>             >
>>>             > _______________________________________________
>>>             > amd-gfx mailing list
>>>             > amd-gfx at lists.freedesktop.org
>>>             <mailto:amd-gfx at lists.freedesktop.org>
>>>             > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>
>>>
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20191030/fd74f1a9/attachment-0001.html>