[PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3

Koenig, Christian Christian.Koenig at amd.com
Mon May 27 10:51:06 UTC 2019


Am 24.05.19 um 23:34 schrieb Kuehling, Felix:
> On 2019-05-23 5:06 a.m., Christian König wrote:
>> [CAUTION: External Email]
>>
>> Leaving BOs on the LRU is harmless. We always did this for VM page table
>> and per VM BOs.
>>
>> The key point is that BOs which couldn't be reserved can't be evicted.
>> So what happened is that an application used basically all of VRAM
>> during CS and because of this X server couldn't pin a BO for scanout.
>>
>> Now we keep the BOs on the LRU and modify TTM to block for the CS to
>> complete, which in turn allows the X server to pin its BO for scanout.
>
> OK, let me rephrase that to make sure I understand it correctly. I think
> the point is that eviction candidates come from an LRU list, so leaving
> things on the LRU makes more BOs available for eviction and avoids OOM
> situations. To take advantage of that, patch 6 adds the ability to wait
> for reserved BOs when there is nothing easier to evict.
>
> ROCm applications like to use lots of memory. So it probably makes sense
> for us to stop removing our BOs from the LRU as well while we
> mass-validate our BOs in amdgpu_amdkfd_gpuvm_restore_process_bos.

Well that would allow concurrent calls of 
amdgpu_amdkfd_gpuvm_restore_process_bos() to wait for each other.

If that's what you want then yeah that certainly makes sense.

Regards,
Christian.

>
> Regards,
>     Felix
>
>
>> Christian.
>>
>> Am 22.05.19 um 21:43 schrieb Kuehling, Felix:
>>> Can you explain how this avoids OOM situations? When is it safe to leave
>>> a reserved BO on the LRU list? Could we do the same thing in
>>> amdgpu_amdkfd_gpuvm.c? And if we did, what would be the expected side
>>> effects or consequences?
>>>
>>> Thanks,
>>>      Felix
>>>
>>> On 2019-05-22 8:59 a.m., Christian König wrote:
>>>> [CAUTION: External Email]
>>>>
>>>> This avoids OOM situations when we have lots of threads
>>>> submitting at the same time.
>>>>
>>>> v3: apply this to the whole driver, not just CS
>>>>
>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> ---
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 2 +-
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c    | 2 +-
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ++--
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +-
>>>>     4 files changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> index 20f2955d2a55..3e2da24cd17a 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct
>>>> amdgpu_cs_parser *p,
>>>>            }
>>>>
>>>>            r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
>>>> -                                  &duplicates, true);
>>>> +                                  &duplicates, false);
>>>>            if (unlikely(r != 0)) {
>>>>                    if (r != -ERESTARTSYS)
>>>>                            DRM_ERROR("ttm_eu_reserve_buffers
>>>> failed.\n");
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> index 06f83cac0d3a..f660628e6af9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> @@ -79,7 +79,7 @@ int amdgpu_map_static_csa(struct amdgpu_device
>>>> *adev, struct amdgpu_vm *vm,
>>>>            list_add(&csa_tv.head, &list);
>>>>            amdgpu_vm_get_pd_bo(vm, &list, &pd);
>>>>
>>>> -       r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, true);
>>>> +       r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, false);
>>>>            if (r) {
>>>>                    DRM_ERROR("failed to reserve CSA,PD BOs:
>>>> err=%d\n", r);
>>>>                    return r;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> index d513a5ad03dd..ed25a4e14404 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> @@ -171,7 +171,7 @@ void amdgpu_gem_object_close(struct
>>>> drm_gem_object *obj,
>>>>
>>>>            amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);
>>>>
>>>> -       r = ttm_eu_reserve_buffers(&ticket, &list, false,
>>>> &duplicates, true);
>>>> +       r = ttm_eu_reserve_buffers(&ticket, &list, false,
>>>> &duplicates, false);
>>>>            if (r) {
>>>>                    dev_err(adev->dev, "leaking bo va because "
>>>>                            "we fail to reserve bo (%d)\n", r);
>>>> @@ -608,7 +608,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev,
>>>> void *data,
>>>>
>>>>            amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);
>>>>
>>>> -       r = ttm_eu_reserve_buffers(&ticket, &list, true,
>>>> &duplicates, true);
>>>> +       r = ttm_eu_reserve_buffers(&ticket, &list, true,
>>>> &duplicates, false);
>>>>            if (r)
>>>>                    goto error_unref;
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> index c430e8259038..d60593cc436e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> @@ -155,7 +155,7 @@ static inline int amdgpu_bo_reserve(struct
>>>> amdgpu_bo *bo, bool no_intr)
>>>>            struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>>>            int r;
>>>>
>>>> -       r = ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
>>>> +       r = __ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
>>>>            if (unlikely(r != 0)) {
>>>>                    if (r != -ERESTARTSYS)
>>>>                            dev_err(adev->dev, "%p reserve failed\n",
>>>> bo);
>>>> -- 
>>>> 2.17.1
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



More information about the amd-gfx mailing list