[PATCH 0/5] prevent OOM triggered by TTM

Christian König christian.koenig at amd.com
Tue Feb 6 09:51:59 UTC 2018


Nice work, but a few comments.

First of all, you need to reorder the patches. The exceptions to the 
restriction should be added first, then the restriction itself. 
Otherwise we might break a setup in between the patches, and that is bad 
for bisecting.

Then make all values configurable; take a closer look at ttm_memory.c 
for how that is done. Just add attributes directly under the 
memory_accounting directory (see ttm_mem_global_init).

In addition, you can't put device-specific information (the no_retry 
flag) into ttm_mem_global; that structure is driver independent, so it 
won't work like this.

Move the new call out of ttm_mem_global_reserve() and into 
ttm_page_alloc.c or ttm_page_alloc_dma.c (but keep the function itself 
in ttm_memory.c). ttm_mem_global_reserve() is called for each page 
allocated and si_mem_available() is a bit too heavy for that.
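
To illustrate the point about call frequency, here is a rough userspace 
sketch, not actual TTM code. All names in it (probe_available_pages, 
request_fits, alloc_pages_checked) are made up for illustration, and the 
probe is only a stand-in for something expensive like si_mem_available(). 
The check runs once per allocation request, while the per-page loop does 
only cheap bookkeeping:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an expensive probe such as si_mem_available().
 * The counter records how often it is called. */
static unsigned long probe_calls;
static long fake_available_pages = 1000;

static long probe_available_pages(void)
{
	probe_calls++;
	return fake_available_pages;
}

/* Hypothetical helper: a single check covering the whole request. */
static bool request_fits(long min_free_pages)
{
	return probe_available_pages() >= min_free_pages;
}

/* Hypothetical allocation path: check once, then account page by page. */
static int alloc_pages_checked(size_t num_pages, long min_free_pages,
			       size_t *accounted)
{
	size_t i;

	assert(accounted != NULL);

	if (!request_fits(min_free_pages))
		return -1;	/* kernel code would return -ENOMEM here */

	for (i = 0; i < num_pages; i++)
		(*accounted)++;	/* cheap per-page bookkeeping only */

	return 0;
}
```

With this shape a request for 64 pages costs one probe instead of 64.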

Maybe rename TTM_OPT_FLAG_ALLOW_ALLOC_ANYWAY to something like 
_FORCE_ALLOCATION or _ALLOW_OOM.

And please also check whether a criterion like (si_mem_available() + 
get_nr_swap_pages()) < limit works as well. This way we would have only 
a single new limit.
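
Roughly what I have in mind, as a userspace sketch only. The struct and 
the function name are hypothetical; in the kernel the two values would 
come from si_mem_available() and get_nr_swap_pages(), and the limit 
would be the single configurable value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the two inputs to the criterion. */
struct mem_snapshot {
	long available_pages;	/* stand-in for si_mem_available() */
	long free_swap_pages;	/* stand-in for get_nr_swap_pages() */
};

/* Returns true when the allocation request should be rejected. */
static bool below_limit(const struct mem_snapshot *s, long limit_pages)
{
	assert(s != NULL);
	return s->available_pages + s->free_swap_pages < limit_pages;
}
```

For example, assuming 4 KiB pages, a 256MB limit is 65536 pages. A 
system with 1000 available pages and 2000 free swap pages would be 
rejected, while one with 100000 available pages and no free swap would 
still pass.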

Regards,
Christian.

Am 06.02.2018 um 10:04 schrieb Roger He:
> Currently the ttm code has no allocation limit, so it allows unlimited
> page allocation until OOM. Once swap space is full of swapped pages,
> system memory fills up with ttm pages, and then any memory allocation
> request triggers OOM.
>
>
> The following patches prevent OOM triggered by TTM.
> The basic idea is to check the free swap space first when allocating
> TTM pages. If it is less than the fixed limit, reject the allocation
> request. But there are two exceptions which should be allowed
> regardless of the zone memory accounting limit:
> a. page fault
>     ttm_mem_global_reserve is allowed when serving the page fault
>     routine, because the page fault routine has already grabbed system
>     memory, so allowing this exception is harmless. Otherwise, it will
>     trigger the OOM killer.
> b. suspend
>     Suspend should always be allowed to succeed.
>
>
> Finally, if bdev.no_retry is false (the default), the original behavior
> is kept without any change.
>
> Roger He (5):
>    drm/ttm: check if the free swap space is under limit 256MB
>    drm/ttm: keep original behavior except with flag no_retry
>    drm/ttm: use bit flag to replace allow_reserved_eviction in
>      ttm_operation_ctx
>    drm/ttm: add bit flag TTM_OPT_FLAG_ALLOW_ALLOC_ANYWAY
>    drm/ttm: add input parameter allow_allo_anyway for ttm_bo_evict_mm
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |  4 +--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 +--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 10 +++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     |  8 +++--
>   drivers/gpu/drm/nouveau/nouveau_drm.c       |  2 +-
>   drivers/gpu/drm/qxl/qxl_object.c            |  4 +--
>   drivers/gpu/drm/radeon/radeon_device.c      |  6 ++--
>   drivers/gpu/drm/radeon/radeon_object.c      |  6 ++--
>   drivers/gpu/drm/radeon/radeon_object.h      |  3 +-
>   drivers/gpu/drm/ttm/ttm_bo.c                | 19 +++++++----
>   drivers/gpu/drm/ttm/ttm_bo_vm.c             |  6 ++--
>   drivers/gpu/drm/ttm/ttm_memory.c            | 51 ++++++++++++++++++++++++++---
>   drivers/gpu/drm/ttm/ttm_page_alloc_dma.c    |  1 -
>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c         |  6 ++--
>   include/drm/ttm/ttm_bo_api.h                | 14 ++++++--
>   include/drm/ttm/ttm_memory.h                |  6 ++++
>   18 files changed, 111 insertions(+), 43 deletions(-)
>


