[PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3
Koenig, Christian
Christian.Koenig at amd.com
Mon May 27 10:51:06 UTC 2019
Am 24.05.19 um 23:34 schrieb Kuehling, Felix:
> On 2019-05-23 5:06 a.m., Christian König wrote:
>> [CAUTION: External Email]
>>
>> Leaving BOs on the LRU is harmless. We always did this for VM page table
>> and per VM BOs.
>>
>> The key point is that BOs which couldn't be reserved can't be evicted.
>> So what happened is that an application used basically all of VRAM
>> during CS and because of this X server couldn't pin a BO for scanout.
>>
>> Now we keep the BOs on the LRU and modify TTM to block for the CS to
>> complete, which in turn allows the X server to pin its BO for scanout.
>
> OK, let me rephrase that to make sure I understand it correctly. I think
> the point is that eviction candidates come from an LRU list, so leaving
> things on the LRU makes more BOs available for eviction and avoids OOM
> situations. To take advantage of that, patch 6 adds the ability to wait
> for reserved BOs when there is nothing easier to evict.
>
> ROCm applications like to use lots of memory. So it probably makes sense
> for us to stop removing our BOs from the LRU as well while we
> mass-validate our BOs in amdgpu_amdkfd_gpuvm_restore_process_bos.
Well that would allow concurrent calls of
amdgpu_amdkfd_gpuvm_restore_process_bos() to wait for each other.
If that's what you want then yeah that certainly makes sense.
Regards,
Christian.
>
> Regards,
> Felix
>
>
>> Christian.
>>
>> Am 22.05.19 um 21:43 schrieb Kuehling, Felix:
>>> Can you explain how this avoids OOM situations? When is it safe to leave
>>> a reserved BO on the LRU list? Could we do the same thing in
>>> amdgpu_amdkfd_gpuvm.c? And if we did, what would be the expected side
>>> effects or consequences?
>>>
>>> Thanks,
>>> Felix
>>>
>>> On 2019-05-22 8:59 a.m., Christian König wrote:
>>>> [CAUTION: External Email]
>>>>
>>>> This avoids OOM situations when we have lots of threads
>>>> submitting at the same time.
>>>>
>>>> v3: apply this to the whole driver, not just CS
>>>>
>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 2 +-
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 ++--
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +-
>>>> 4 files changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> index 20f2955d2a55..3e2da24cd17a 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct
>>>> amdgpu_cs_parser *p,
>>>> }
>>>>
>>>> r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
>>>> - &duplicates, true);
>>>> + &duplicates, false);
>>>> if (unlikely(r != 0)) {
>>>> if (r != -ERESTARTSYS)
>>>> DRM_ERROR("ttm_eu_reserve_buffers
>>>> failed.\n");
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> index 06f83cac0d3a..f660628e6af9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>>>> @@ -79,7 +79,7 @@ int amdgpu_map_static_csa(struct amdgpu_device
>>>> *adev, struct amdgpu_vm *vm,
>>>> list_add(&csa_tv.head, &list);
>>>> amdgpu_vm_get_pd_bo(vm, &list, &pd);
>>>>
>>>> - r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, true);
>>>> + r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, false);
>>>> if (r) {
>>>> DRM_ERROR("failed to reserve CSA,PD BOs:
>>>> err=%d\n", r);
>>>> return r;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> index d513a5ad03dd..ed25a4e14404 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> @@ -171,7 +171,7 @@ void amdgpu_gem_object_close(struct
>>>> drm_gem_object *obj,
>>>>
>>>> amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);
>>>>
>>>> - r = ttm_eu_reserve_buffers(&ticket, &list, false,
>>>> &duplicates, true);
>>>> + r = ttm_eu_reserve_buffers(&ticket, &list, false,
>>>> &duplicates, false);
>>>> if (r) {
>>>> dev_err(adev->dev, "leaking bo va because "
>>>> "we fail to reserve bo (%d)\n", r);
>>>> @@ -608,7 +608,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev,
>>>> void *data,
>>>>
>>>> amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);
>>>>
>>>> - r = ttm_eu_reserve_buffers(&ticket, &list, true,
>>>> &duplicates, true);
>>>> + r = ttm_eu_reserve_buffers(&ticket, &list, true,
>>>> &duplicates, false);
>>>> if (r)
>>>> goto error_unref;
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> index c430e8259038..d60593cc436e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>>> @@ -155,7 +155,7 @@ static inline int amdgpu_bo_reserve(struct
>>>> amdgpu_bo *bo, bool no_intr)
>>>> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>>> int r;
>>>>
>>>> - r = ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
>>>> + r = __ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
>>>> if (unlikely(r != 0)) {
>>>> if (r != -ERESTARTSYS)
>>>> dev_err(adev->dev, "%p reserve failed\n",
>>>> bo);
>>>> --
>>>> 2.17.1
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
More information about the dri-devel
mailing list