[PATCH] [RFC]drm/ttm: fix scheduling balance
Christian König
christian.koenig at amd.com
Thu Jan 25 10:07:08 UTC 2018
Am 25.01.2018 um 10:59 schrieb Chunming Zhou:
> there is a scheduling balance issue about get node like:
> a. process A allocates full memory and use it for submission.
> b. process B tries to allocates memory, will wait for process A BO idle in eviction.
> c. process A completes the job, process B eviction will put process A BO node,
> but in the meantime, process C is comming to allocate BO, whill directly get node successfully, and do submission,
> process B will again wait for process C BO idle.
> d. repeat the above setps, process B could be delayed much more.
NAK, this is intended behavior please don't change it. If you start to
use a per domain mutex like this no concurrent allocation is possible
any more.
In other words when process B needs a lot of memory and blocks for that
memory to be evicted it is intended that process C which doesn't needs
so much memory can jump in and execute before B.
> add a mutex to gerantee the allocation sequence for same domain. But there is a possibility that
> visible vram could be evicted to invisilbe, the tricky is they are same domain manager, so which needs a special handling.
To avoid lock inversion you would need a global mutex here, otherwise
you can run into the following issue:
Eviction: VRAM domain mutex locked first then GTT mutex.
Swapping in: GTT mutex locked first, then VRAM mutex.
Regards,
Christian.
>
> Change-Id: I260e8eb704f7b4788b071d3f641f21b242912256
> Signed-off-by: Chunming Zhou <david1.zhou at amd.com>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 16 +++++++++++++++-
> include/drm/ttm/ttm_bo_api.h | 2 ++
> include/drm/ttm/ttm_bo_driver.h | 1 +
> 3 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index d33a6bb742a1..f465f1d38711 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -969,7 +969,11 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
> if (mem_type == TTM_PL_SYSTEM)
> break;
>
> + if (ctx->man != man)
> + mutex_lock(&man->node_mutex);
> ret = (*man->func->get_node)(man, bo, place, mem);
> + if (ctx->man != man)
> + mutex_unlock(&man->node_mutex);
> if (unlikely(ret))
> return ret;
>
> @@ -991,6 +995,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
>
> for (i = 0; i < placement->num_busy_placement; ++i) {
> const struct ttm_place *place = &placement->busy_placement[i];
> + struct ttm_operation_ctx busy_ctx;
>
> ret = ttm_mem_type_from_place(place, &mem_type);
> if (ret)
> @@ -1018,7 +1023,15 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
> return 0;
> }
>
> - ret = ttm_bo_mem_force_space(bo, mem_type, place, mem, ctx);
> + memcpy(&busy_ctx, ctx, sizeof(busy_ctx));
> + if (ctx->man != man) {
> + mutex_lock(&man->node_mutex);
> + busy_ctx.man = man;
> + }
> + ret = ttm_bo_mem_force_space(bo, mem_type, place, mem,
> + &busy_ctx);
> + if (ctx->man != man)
> + mutex_unlock(&man->node_mutex);
> if (ret == 0 && mem->mm_node) {
> mem->placement = cur_flags;
> return 0;
> @@ -1449,6 +1462,7 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
> man->io_reserve_fastpath = true;
> man->use_io_reserve_lru = false;
> mutex_init(&man->io_reserve_mutex);
> + mutex_init(&man->node_mutex);
> spin_lock_init(&man->move_lock);
> INIT_LIST_HEAD(&man->io_reserve_lru);
>
> diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
> index 2cd025c2abe7..b08629050215 100644
> --- a/include/drm/ttm/ttm_bo_api.h
> +++ b/include/drm/ttm/ttm_bo_api.h
> @@ -265,6 +265,7 @@ struct ttm_bo_kmap_obj {
> * @no_wait_gpu: Return immediately if the GPU is busy.
> * @allow_reserved_eviction: Allow eviction of reserved BOs.
> * @resv: Reservation object to allow reserved evictions with.
> + * @man: record ctx is under man->node_mutex.
> *
> * Context for TTM operations like changing buffer placement or general memory
> * allocation.
> @@ -275,6 +276,7 @@ struct ttm_operation_ctx {
> bool allow_reserved_eviction;
> struct reservation_object *resv;
> uint64_t bytes_moved;
> + struct ttm_mem_type_manager *man;
> };
>
> /**
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 9b417eb2df20..d85d6ee4f54d 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -288,6 +288,7 @@ struct ttm_mem_type_manager {
> uint32_t default_caching;
> const struct ttm_mem_type_manager_func *func;
> void *priv;
> + struct mutex node_mutex;
> struct mutex io_reserve_mutex;
> bool use_io_reserve_lru;
> bool io_reserve_fastpath;
More information about the amd-gfx
mailing list