[PATCH 0/5] prevent OOM triggered by TTM
Christian König
christian.koenig at amd.com
Tue Feb 6 09:51:59 UTC 2018
Nice work, but a few comments.
First of all you need to reorder the patches. Adding the exceptions to
the restrictions should come first, then the restriction itself.
Otherwise we might break a setup in between the patches and that is bad
for bisecting.
Then make all values configurable, e.g. take a closer look at
ttm_memory.c. Just add attributes directly under the memory_accounting
directory (see ttm_mem_global_init).
In addition, you can't put device-specific information (the
no_retry flag) into ttm_mem_global; that structure is driver-independent,
so it won't work like this.
Move the new call out of ttm_mem_global_reserve() and make it from
ttm_page_alloc.c or ttm_page_alloc_dma.c instead (but keep the function
itself in ttm_memory.c). ttm_mem_global_reserve() is called for each page
allocated, and si_mem_available() is a bit too heavy for that.
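To illustrate the cost argument, here is a rough userspace model rather than kernel code. All names in it (mem_check_ok, reserve_page, populate, checks_done) are hypothetical stand-ins: mem_check_ok() plays the role of the si_mem_available()-based check, and reserve_page() the role of the cheap per-page accounting in ttm_mem_global_reserve().

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the expensive check runs (illustration only). */
static unsigned long checks_done;

/* Hypothetical stand-in for the heavy si_mem_available()-based check. */
static bool mem_check_ok(long avail_pages, long limit_pages)
{
    checks_done++;
    return avail_pages >= limit_pages;
}

/* Hypothetical stand-in for the cheap per-page accounting. */
static void reserve_page(long *accounted_pages)
{
    (*accounted_pages)++;
}

/*
 * Model of a populate path: run the expensive check once per request,
 * then account every page without repeating it.
 * Returns 0 on success, -1 where the kernel would return -ENOMEM.
 */
static int populate(long num_pages, long avail_pages, long limit_pages,
                    long *accounted_pages)
{
    assert(num_pages >= 0);

    if (!mem_check_ok(avail_pages, limit_pages))
        return -1;

    for (long i = 0; i < num_pages; i++)
        reserve_page(accounted_pages);

    return 0;
}
```

With this shape a 512-page request costs one check instead of 512, which is the point of hoisting the call into the page allocators.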
Maybe name TTM_OPT_FLAG_ALLOW_ALLOC_ANYWAY something like
_FORCE_ALLOCATION or _ALLOW_OOM.
And please also check whether a criterion like (si_mem_available() +
get_nr_swap_pages()) < limit works as well. That way we would have only
a single new limit.
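As a sketch of that criterion, again as a userspace model and not the actual patch: the function name ttm_alloc_allowed and its parameters are hypothetical, and in the kernel the two inputs would come from si_mem_available() and get_nr_swap_pages(), both of which report pages.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a TTM allocation may proceed.
 * avail_pages     - would come from si_mem_available() in the kernel
 * free_swap_pages - would come from get_nr_swap_pages() in the kernel
 * limit_pages     - the single configurable limit, in pages
 */
static bool ttm_alloc_allowed(long avail_pages, long free_swap_pages,
                              long limit_pages)
{
    assert(limit_pages >= 0);

    /* Reject when available memory plus free swap is below the limit. */
    return (avail_pages + free_swap_pages) >= limit_pages;
}
```

For example, assuming 4 KiB pages, a 256MB limit is 65536 pages; with 60000 available pages and 10000 free swap pages the allocation would pass, while 60000 available pages and no free swap would be rejected.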
Regards,
Christian.
On 06.02.2018 at 10:04, Roger He wrote:
> Currently the TTM code has no allocation limit, so it allows pages to be
> allocated without bound until OOM. Once swap space is full of swapped
> pages, system memory fills up with TTM pages, and then any memory
> allocation request will trigger OOM.
>
>
> The following patches prevent OOM triggered by TTM.
> The basic idea is to check the free swap space first when allocating TTM
> pages. If it is below the fixed limit, reject the allocation request.
> But there are two exceptions that should be allowed regardless of the
> zone memory accounting limit.
> a. page fault
> ttm_mem_global_reserve() called while serving the page fault routine:
> the page fault routine has already grabbed system memory, so allowing
> this exception is harmless. Otherwise, it will trigger the OOM killer.
> b. suspend
> Suspend should always be allowed to succeed.
>
>
> Finally, if bdev.no_retry is false (the default), the original behavior
> is kept without any change.
>
> Roger He (5):
> drm/ttm: check if the free swap space is under limit 256MB
> drm/ttm: keep original behavior except with flag no_retry
> drm/ttm: use bit flag to replace allow_reserved_eviction in
> ttm_operation_ctx
> drm/ttm: add bit flag TTM_OPT_FLAG_ALLOW_ALLOC_ANYWAY
> drm/ttm: add input parameter allow_allo_anyway for ttm_bo_evict_mm
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 +++--
> drivers/gpu/drm/nouveau/nouveau_drm.c | 2 +-
> drivers/gpu/drm/qxl/qxl_object.c | 4 +--
> drivers/gpu/drm/radeon/radeon_device.c | 6 ++--
> drivers/gpu/drm/radeon/radeon_object.c | 6 ++--
> drivers/gpu/drm/radeon/radeon_object.h | 3 +-
> drivers/gpu/drm/ttm/ttm_bo.c | 19 +++++++----
> drivers/gpu/drm/ttm/ttm_bo_vm.c | 6 ++--
> drivers/gpu/drm/ttm/ttm_memory.c | 51 ++++++++++++++++++++++++++---
> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 1 -
> drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 6 ++--
> include/drm/ttm/ttm_bo_api.h | 14 ++++++--
> include/drm/ttm/ttm_memory.h | 6 ++++
> 18 files changed, 111 insertions(+), 43 deletions(-)
>